Homology search against a local dataset using NCBI-BLAST+ command-line tool

Tariq Abdullah

The NCBI-BLAST+ [1] command-line tools offer multiple functions for working with large sequence datasets. Previously, we showed how to run BLAST against a local dataset of sequences. This article explains how to search a local database for sequences homologous to a query sequence and how to retrieve the top 100 hits from the search results.

For performing homology search against a local database, follow the steps given below:

1. Install NCBI-BLAST+ on Ubuntu

Open a terminal (Ctrl+Alt+T) and type the following command:

$ sudo apt-get install ncbi-blast+
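As a quick sanity check after installation, the sketch below reports whether the three BLAST+ programs used in this article are on the PATH. The function name check_blast_tools is only illustrative and is not part of BLAST+.

```shell
# check_blast_tools (illustrative name): report, one line per program,
# whether each BLAST+ binary used in this article is on the PATH.
check_blast_tools() {
    for tool in makeblastdb blastp blastdbcmd; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

check_blast_tools
```

If a program is reported as missing, re-check the installation step. Running blastp -version prints the installed version.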

2. Make BLAST database of your sequences

$ makeblastdb -in input.fasta -parse_seqids -dbtype prot -out blastdb

The details of these arguments are given in the previous article.

We have used -dbtype prot because we are demonstrating with protein sequences. If you are working with nucleotide sequences, set -dbtype nucl here and use blastn instead of blastp in the search step below.
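The choice between protein and nucleotide data changes two things: the -dbtype value and the search program. As a small sketch, the hypothetical helper below (print_blast_commands is not a BLAST+ command) prints the pair of commands for either case without running them. The file names are the ones used in this article.

```shell
# print_blast_commands TYPE (hypothetical helper): echo the database-building
# and search commands for TYPE = prot (blastp) or nucl (blastn).
# Nothing is executed; the commands are only printed for inspection.
print_blast_commands() {
    case "$1" in
        prot) prog=blastp ;;
        nucl) prog=blastn ;;
        *) echo "unknown sequence type: $1 (expected prot or nucl)" >&2
           return 1 ;;
    esac
    echo "makeblastdb -in input.fasta -parse_seqids -dbtype $1 -out blastdb"
    echo "$prog -query query.fasta -db blastdb -outfmt '6 sseqid' -max_target_seqs 100 -out homologousids.txt"
}

print_blast_commands nucl
```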

3. Perform homology search

$ blastp -query query.fasta -db blastdb -outfmt '6 sseqid' -max_target_seqs 100 -out homologousids.txt

Here, -query defines the input query sequence saved in the file ‘query.fasta’.

-db is the local BLAST database.

-outfmt defines the output format. ‘6 sseqid’ means Subject Seq-id in a tabular format.

-max_target_seqs sets the maximum number of hits (subject sequences) to report. Here it is set to 100, but you can choose any number.

-out defines the output filename.

This command produces a plain text file listing the sequence IDs of the homologous sequences found, one per line.
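In tabular output, BLAST writes one line per alignment (HSP), so a subject sequence that aligns to the query in more than one region can appear more than once in homologousids.txt. A minimal sketch for collapsing such repeats while keeping the original hit order is given here. The function name dedupe_ids and the output name homologousids.unique.txt are illustrative assumptions.

```shell
# dedupe_ids FILE (illustrative helper): print each line of FILE once,
# keeping the order in which lines first appear.
dedupe_ids() {
    awk '!seen[$0]++' "$1"
}

# Usage, assuming homologousids.txt was produced by the blastp command above.
if [ -f homologousids.txt ]; then
    dedupe_ids homologousids.txt > homologousids.unique.txt
fi
```

The deduplicated file can then be used in place of homologousids.txt in the next step.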

4. Extract sequences of those homologous sequence ids.

In this step, we will obtain the sequences of all homologous sequence ids from the constructed local database. This can be achieved by using the blastdbcmd binary of the NCBI-BLAST+ package.

$ blastdbcmd -db blastdb -entry_batch homologousids.txt -out homologseqs.fasta -outfmt %f

Here, -entry_batch names the input file for batch processing. Each entry must be on its own line and begin with a sequence ID, which may be followed by optional specifiers.

-outfmt %f means output in FASTA format.

There are several other output formats. For details, run blastdbcmd -help.

The output file (homologseqs.fasta) will contain the sequences of the top hits (up to 100) from the homology search.
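To confirm how many sequences were actually retrieved, one can count the FASTA header lines in the output. A small sketch, assuming the output was saved as homologseqs.fasta; count_fasta_records is an illustrative helper name:

```shell
# count_fasta_records FILE (illustrative helper): count the header lines,
# i.e. the lines starting with '>', which mark one sequence each.
count_fasta_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Usage, assuming homologseqs.fasta was written by blastdbcmd in step 4.
if [ -f homologseqs.fasta ]; then
    count_fasta_records homologseqs.fasta
fi
```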

References

  1. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., & Madden, T. L. (2009). BLAST+: architecture and applications. BMC Bioinformatics, 10(1), 421.

Tariq is founder of Bioinformatics Review and CEO at IQL Technologies. His areas of expertise include algorithm design, phylogenetics, MicroArray, Plant Systematics, and genome data analysis. If you have questions, reach out to him via his homepage.
