How to blast against a particular set of local sequences (local database)?


BLAST [1,2] is a local alignment tool widely used as a preliminary step in identifying gene or protein function. The NCBI BLAST+ command-line package offers several useful features, including building a BLAST database from a set of nucleotide or protein sequences, searching a query sequence against that database, and running an all-against-all search. This article explains the commands for each.

The NCBI BLAST+ package [3] is freely available and can be downloaded from the NCBI website. Packages are available for both Linux and Windows.

To search one or more query sequences against your local sequences, you first need to build a BLAST database from those sequences. Open a terminal and run the following commands.

1. Making BLAST database of local sequences

The input file must consist of sequences in FASTA format.

$ makeblastdb -in input.fasta -parse_seqids -dbtype prot -out blastdb

Here, -in is the input file, -dbtype is the molecule type (prot for protein or nucl for nucleotide sequences), and -out is the base name of the BLAST database to be created. -parse_seqids tells makeblastdb to parse the sequence IDs in the FASTA headers, which is useful later if you want to retrieve individual sequences from the database by ID. If the input file is in another directory, provide its full path.
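As a minimal sketch of this step, the snippet below counts the records in the FASTA file, builds the database, and then prints a summary of it with blastdbcmd -info. The file name input.fasta and the database name blastdb follow the command above; count_fasta_records is a hypothetical helper introduced only for this illustration, and the build is skipped if makeblastdb is not on your PATH.

```shell
# Hypothetical helper: count FASTA records (every header line starts with '>').
count_fasta_records() {
  grep -c '^>' "$1"
}

input="input.fasta"   # assumed input file name, as in the command above
db="blastdb"          # assumed database base name, as in the command above

# Only run if the BLAST+ tools are installed and the input file exists.
if command -v makeblastdb >/dev/null 2>&1 && [ -f "$input" ]; then
  echo "Records in $input: $(count_fasta_records "$input")"
  # Use -dbtype nucl instead of prot for nucleotide sequences.
  makeblastdb -in "$input" -parse_seqids -dbtype prot -out "$db"
  # Print a summary of the new database (title, number of sequences).
  blastdbcmd -db "$db" -info
fi
```

The sequence count reported by blastdbcmd should match the number of FASTA records; a mismatch usually points to a malformed input file.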

2. BLAST a single sequence against the local database

$ blastp -db blastdb -query seq.fasta -outfmt 0 -out result.txt -num_threads 4

Here, -db is the BLAST database created in the previous step, -query is a file containing the query sequence in FASTA format, -outfmt sets the output format (0 is the default pairwise format; run blastp -help for the full list), -out is the output file, and -num_threads is the number of CPU threads to use during the search. For nucleotide sequences, use blastn or another appropriate BLAST executable.
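For downstream parsing, the tabular format (-outfmt 6) is often more convenient than the pairwise report. The sketch below reruns the search with -outfmt 6 and an E-value cutoff, then lists the best hits by bit score. By default, this format has twelve tab-separated columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. The cutoff of 1e-5, the output name result.tsv, and the helper top_hits are assumptions made for illustration.

```shell
# Hypothetical helper: print the N best hits from a tabular (-outfmt 6) file,
# ranked by bit score (column 12), highest first.
top_hits() {
  LC_ALL=C sort -t "$(printf '\t')" -k12,12nr "$1" | head -n "$2"
}

# Only run the search if blastp is installed and the query file exists.
if command -v blastp >/dev/null 2>&1 && [ -f seq.fasta ]; then
  # -evalue 1e-5 is an illustrative threshold, not a recommendation.
  blastp -db blastdb -query seq.fasta -outfmt 6 -evalue 1e-5 \
         -out result.tsv -num_threads 4
fi

# Show the five highest-scoring hits, if a result file is present.
if [ -f result.tsv ]; then
  top_hits result.tsv 5
fi
```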

3. All-against-all BLAST

To BLAST every local sequence against all the others, use the same FASTA file from which the database was built as the query file.

$ blastp -db blastdb -query input.fasta -outfmt 0 -out result.txt -num_threads 4

As the command shows, the database is the local database created in the first step, and the query is the set of input sequences from which that database was built.
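In an all-against-all search, every sequence also matches itself, and these self-hits are usually not of interest. The sketch below assumes the search was run with the tabular format (-outfmt 6, written here to a hypothetical file all_vs_all.tsv), where column 1 is the query ID and column 2 is the subject ID. The helper drop_self_hits is likewise introduced only for this example.

```shell
# Hypothetical helper: remove rows where query ID (column 1) equals
# subject ID (column 2). Appending "" forces a string comparison in awk,
# so numeric-looking IDs such as 001 and 1 are not treated as equal.
drop_self_hits() {
  awk -F '\t' '($1 "") != ($2 "")' "$1"
}

# Only run the search if blastp is installed and the input file exists.
if command -v blastp >/dev/null 2>&1 && [ -f input.fasta ]; then
  blastp -db blastdb -query input.fasta -outfmt 6 \
         -out all_vs_all.tsv -num_threads 4
fi

# Write the filtered table, if a result file is present.
if [ -f all_vs_all.tsv ]; then
  drop_self_hits all_vs_all.tsv > all_vs_all.noself.tsv
fi
```

Note that each remaining pair can still appear twice (A against B and B against A), since both sequences act as query and subject.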

If you want to use the Windows version, run the same commands and provide the path to the executables. Installation will be covered in an upcoming article.

References

  1. Altschul, S. F. (2001). BLAST algorithm. eLS.
  2. Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17), 3389-3402.
  3. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., & Madden, T. L. (2009). BLAST+: architecture and applications. BMC Bioinformatics, 10(1), 421.

Muniba is a bioinformatician based at the South China University of Technology. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.