Aligning DNA reads against a local database using DIAMOND

DIAMOND is a program for high-throughput pairwise alignment of DNA reads and protein sequences against a protein reference database [1]. It is designed for fast analysis of large sequence datasets. In this article, we will build a local database of protein sequences and align DNA reads against that reference database.

Creating a reference database

Save all the protein sequences in a single FASTA file, for example ‘db.fa’. Then build a reference database from these sequences using the following command.

$ diamond makedb --in db.fa -d nr_db

Here, --in specifies the input FASTA file and -d sets the name of the output DIAMOND database file.
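As a minimal sketch of this step, the script below writes a tiny ‘db.fa’ and builds the database from it. The two protein records are arbitrary placeholder sequences for illustration only, and the script assumes the diamond binary is on your PATH (it skips the build otherwise).

```shell
#!/bin/sh
# Toy protein FASTA file. Both records are arbitrary placeholders,
# not entries taken from a real reference database.
cat > db.fa <<'EOF'
>prot1 hypothetical example protein
MKTAYIAKQRQISFVKSHFSRQL
>prot2 hypothetical example protein
MSDNGPQNQRNAPRITFGGPSDS
EOF

# Count the FASTA records that will go into the database.
n_seqs=$(grep -c '^>' db.fa)
echo "sequences in db.fa: $n_seqs"

# Build the database only if the diamond binary is available.
if command -v diamond >/dev/null 2>&1; then
    diamond makedb --in db.fa -d nr_db
    ls -l nr_db.dmnd
else
    echo "diamond not found on PATH; skipping makedb"
fi
```

Counting the header lines first is a cheap sanity check that the input is a FASTA file with the expected number of records before the database is built.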

If you wish to provide taxonomy features as well, then you can use the following arguments.

--taxonmap <file> maps NCBI protein accession numbers to taxon IDs. This is the gzip-compressed prot.accession2taxid file available from the NCBI taxonomy FTP site.

--taxonnodes <file> supplies the taxonomy tree. This is the nodes.dmp file from the NCBI taxonomy dump archive (taxdmp.zip).

--taxonnames <file> supplies the taxon names. This is the names.dmp file from the same NCBI taxonomy dump archive.
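The three taxonomy arguments can be combined in a single makedb call, roughly as sketched below. The local file names (prot.accession2taxid.gz, nodes.dmp, names.dmp) are assumptions about how you saved the NCBI downloads, so adjust them to your own paths. The script only runs DIAMOND when every input is present.

```shell
#!/bin/sh
# Assumed local file names for the NCBI taxonomy downloads (adjust as needed).
TAXONMAP=prot.accession2taxid.gz
TAXONNODES=nodes.dmp
TAXONNAMES=names.dmp

# Check that every input file exists before starting a long-running build.
missing=0
for f in db.fa "$TAXONMAP" "$TAXONNODES" "$TAXONNAMES"; do
    if [ ! -f "$f" ]; then
        echo "missing input: $f"
        missing=$((missing + 1))
    fi
done

# Build the taxonomy-aware database only when all inputs and diamond are present.
if [ "$missing" -eq 0 ] && command -v diamond >/dev/null 2>&1; then
    diamond makedb --in db.fa -d nr_db \
        --taxonmap "$TAXONMAP" \
        --taxonnodes "$TAXONNODES" \
        --taxonnames "$TAXONNAMES"
else
    echo "skipping makedb: $missing input file(s) missing or diamond not installed"
fi
```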

Now, the reference database is created as ‘nr_db.dmnd’.

Aligning DNA reads

Save all the DNA reads in a FASTA file, for example ‘dna_reads.fna’. Align the reads against the reference database using the ‘blastx’ module of DIAMOND, which translates each DNA read and compares it with the protein database. If you are aligning protein sequences instead, use ‘blastp’ in place of ‘blastx’.

$ diamond blastx -d nr_db -q dna_reads.fna -o aligned_reads.m8 --sensitive --outfmt 6

Here, --outfmt 6 selects the BLAST tabular format, which is also the default; --outfmt 0 would produce BLAST pairwise output instead. The sensitivity, output format, gap penalties, and other parameters can be changed through the command line options described in the DIAMOND documentation.
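The sketch below runs the alignment on a tiny input and then filters the tabular output. The two reads are arbitrary placeholder sequences, the e-value cutoff of 1e-5 and the file name ‘filtered_reads.m8’ are illustrative choices rather than recommendations from DIAMOND, and the script assumes that diamond is installed and that ‘nr_db.dmnd’ already exists (it skips the alignment otherwise).

```shell
#!/bin/sh
# Toy DNA reads. Both records are arbitrary placeholders for illustration.
cat > dna_reads.fna <<'EOF'
>read1 hypothetical example read
ATGAAAACCGCGTATATTGCGAAACAGCGTCAGATTAGCTTTGTGAAAAGCCATTTTAGCCGTCAGCTG
>read2 hypothetical example read
ATGAGCGATAACGGCCCGCAGAACCAGCGTAACGCGCCGCGTATTACCTTTGGCGGCCCGAGCGATAGC
EOF

# Count the reads that will be aligned.
n_reads=$(grep -c '^>' dna_reads.fna)
echo "reads in dna_reads.fna: $n_reads"

if command -v diamond >/dev/null 2>&1 && [ -f nr_db.dmnd ]; then
    # --outfmt 6 selects tabular output; the field names after it choose the columns.
    diamond blastx -d nr_db -q dna_reads.fna -o aligned_reads.m8 \
        --sensitive --outfmt 6 qseqid sseqid pident length evalue bitscore
    # Keep only hits whose e-value (column 5 in this layout) is at most 1e-5.
    awk -F '\t' '$5 + 0 <= 1e-5' aligned_reads.m8 > filtered_reads.m8
    echo "hits kept: $(wc -l < filtered_reads.m8)"
else
    echo "diamond or nr_db.dmnd not available; skipping alignment"
fi
```

Listing the fields explicitly keeps the column positions predictable, so the awk filter does not depend on the default twelve-column layout.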


References

  1. Buchfink, B., Xie, C., & Huson, D. H. (2015). Fast and sensitive protein alignment using DIAMOND. Nature Methods, 12(1), 59-60.

Dr. Muniba is a bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.
