
Sequence Analysis

How To: Detecting Chimeras in 16S rRNA Sanger Sequencing Reads



A typical chimeric sequence obtained from Pintail version 1.0

Detecting chimeric (or recombinant) sequences in a dataset is an important part of sequence analysis, especially for reconstructing deep phylogenies and for sequence similarity analyses. This article focuses on methods of chimera detection in high-quality 16S rRNA sequences from Sanger sequencing with good read length (>750 bp). Reads of this length are potential candidates for chimera formation. As culture-independent analyses of microbial diversity spread with high-throughput sequencing methods, the number of chimeric sequences published in the databases is also increasing rapidly. This is the era of metagenomics, or simply put, community DNA analysis, in which DNA from thousands of species is pooled and then analysed together. Pooling further increases the chances of chimera formation. Chimeras usually form during the polymerase chain reaction (PCR), but in rare cases they are genuine. It is therefore important to adopt methods that can clean sequence datasets of chimeras.

Several chimera-detection programs for 16S rRNA gene sequences have been released, namely Pintail, Mallard, and Bellerophon. The first two are available at http://www.bioinformatics-toolkit.org and the last at http://greengenes.lbl.gov/cgi-bin/nph-index.cgi. Pintail and Mallard detect chimeras and other anomalies in 16S rRNA genes based on the extent of pairwise percentage similarity between the query and related sequences. In a Pintail 1.0 analysis, each query sequence that could be a putative recombinant is compared one-on-one (one query against one subject) with each of a list of closely related sequences identified by BLAST searches. Because Pintail makes one-on-one query-subject comparisons, it is highly stringent. Mallard works differently: one sequence from within the dataset of query sequences is chosen at random as the subject, while the rest remain queries. A many-on-one (many queries against one subject) comparison follows, which is easier and takes less time than Pintail. Note, however, that Mallard is of limited use if the query sequences are too diverse or highly novel in the first place.
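The idea of comparing percentage similarity along a query-subject pair can be illustrated with a small Python sketch. This is not Pintail's or Mallard's actual algorithm; it is a simplified illustration that assumes the two sequences are already aligned to the same length, and the function names, window size, and spread cut-off are hypothetical choices made for the example. A chimera tends to match a related subject well over one stretch and poorly over another, so the per-window identity varies widely.

```python
def window_identity(query, subject, window=10, step=5):
    """Percent identity per window for two pre-aligned, equal-length sequences."""
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        s = subject[start:start + window]
        # Ignore alignment columns where both sequences carry a gap.
        pairs = [(a, b) for a, b in zip(q, s) if not (a == "-" and b == "-")]
        if not pairs:
            continue
        matches = sum(1 for a, b in pairs if a == b)
        profile.append((start, 100.0 * matches / len(pairs)))
    return profile


def looks_anomalous(profile, max_spread=25.0):
    """Flag a query whose identity to the subject varies strongly along its length.

    max_spread is an arbitrary illustrative cut-off, not a published value.
    """
    values = [pid for _, pid in profile]
    return bool(values) and (max(values) - min(values)) > max_spread


# Toy data: a chimera built from the first half of one parent
# and the second half of another.
parent_a = "ACGTACGTAC" * 4
parent_b = "TTGGCCAATT" * 4
chimera = parent_a[:20] + parent_b[20:]

profile = window_identity(chimera, parent_a)
print(profile)
print("anomalous:", looks_anomalous(profile))
```

In real use the windows would be far larger than ten bases, and the query would be checked against several related subjects rather than a single one.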

Another program for detecting chimeras in 16S rRNA genes, Bellerophon version 3.0 from Greengenes, is dedicated to 16S rRNA sequences. Here, the sequences must be submitted as a NAST (Nearest Alignment Space Termination Tool) formatted file. The NAST alignment server at Greengenes holds more than one million 16S rRNA sequence records. Upon submission of the NAST-formatted file, the server launches a local BLAST search for each query sequence against its 16S rRNA gene sequence library. It then checks for potential chimeras in each query-subject alignment, one-on-one. The outcome is a couple of Excel sheets emailed to the user, listing the query sequences, their best matches, and BLAST score values. The user can set a BLAST score threshold; sequences scoring below it are automatically excluded from chimera detection. Finally, the report states whether a potential breakpoint was found (essentially in Yes/No format). The program is user-friendly and particularly good for large datasets with high sequence diversity. Its only drawback is that a relatively novel sequence in the query batch receives a low score, because it is highly unrelated to the existing records, and thus risks being omitted. Hence, one has to be careful when using these programs, as sequence diversity could be lost, especially if the data come from an extreme site (with many novel sequences) or from an NGS project with long reads and good coverage, as with the PacBio platform. It is worth mentioning that while Pintail and Mallard can be applied to any DNA sequence data, Bellerophon is dedicated to 16S rRNA.
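One way to guard against losing novel sequences to the score threshold is to set low-scoring queries aside for manual review instead of discarding them. The Python sketch below assumes the best-hit table has already been exported from the emailed spreadsheets; the `BestHit` record, the `partition_by_score` function, and the sample identifiers and scores are all hypothetical and are not part of Bellerophon or Greengenes.

```python
from typing import NamedTuple


class BestHit(NamedTuple):
    """One row of a best-hit table (hypothetical layout)."""
    query_id: str
    subject_id: str
    score: float


def partition_by_score(hits, threshold):
    """Split hits into those passed to chimera checking and those set aside.

    Queries below the threshold are kept in a separate list so that
    potentially novel sequences can be inspected rather than silently lost.
    """
    checked, set_aside = [], []
    for hit in hits:
        if hit.score >= threshold:
            checked.append(hit)
        else:
            set_aside.append(hit)
    return checked, set_aside


# Made-up example values for illustration only.
hits = [
    BestHit("clone_01", "ref_A", 1450.0),
    BestHit("clone_02", "ref_B", 310.0),
    BestHit("clone_03", "ref_C", 1210.0),
]
checked, set_aside = partition_by_score(hits, threshold=1000.0)
print("checked:", [h.query_id for h in checked])
print("review manually:", [h.query_id for h in set_aside])
```

Sequences in the second list could then be examined with a one-on-one tool such as Pintail before deciding whether to drop them.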

Dr. Pant is a researcher with a keen interest in software-driven analysis of DNA/protein sequence data for taxonomic, phylogenetic, and other homology-based studies. He currently works on understanding microbial diversity using next-generation sequencing approaches and on the analysis of sequence and metagenomic datasets using computational biology approaches. He teaches undergraduates as an Assistant Professor at the University of Delhi.

