BLAST stands for Basic Local Alignment Search Tool. It is a local alignment tool that compares a query sequence against a database of sequences to find regions of similarity, for example between sequences from different species. In this article, we explain the different kinds of BLAST programs and how the BLAST algorithm works.
BLAST is a heuristic method: instead of computing the optimal alignment by full dynamic programming, it uses shortcuts that make it much faster and more efficient, at the cost of being somewhat less sensitive.
To BLAST a sequence, you need a query sequence and a target sequence or database. The query is the sequence whose similarity you want to assess, and the target is the sequence or database against which the query is aligned. BLAST returns its output as a hit table, sorted from the best match to the weakest, in which each row lists the accession number of a matched sequence along with its title, query coverage, sequence identity, score, and E-value in separate columns. The E-value, which is the number of hits with at least that score expected by chance, is used to assess the reliability of a match: the lower the E-value, the more significant the hit.
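To make the relationship between score and E-value concrete, here is a minimal sketch of the standard Karlin-Altschul relation E = K * m * n * exp(-lambda * S), where m is the query length and n is the database length. The function names and the values of lambda and K below are illustrative assumptions, not the values BLAST would report for any particular scoring system.

```python
import math

def bit_score(raw_score, lam, k):
    """Convert a raw alignment score into a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

# Illustrative parameters only (assumed, not taken from a real search).
LAMBDA, K = 0.267, 0.041

for s in (40, 80, 120):
    bits = bit_score(s, LAMBDA, K)
    e = e_value(s, query_len=250, db_len=5e7, lam=LAMBDA, k=K)
    print(f"raw score {s:>3}  bit score {bits:6.1f}  E-value {e:.2e}")
```

The E-value shrinks exponentially as the score grows and scales linearly with the size of the search space, which is why the same alignment is less significant against a larger database.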
BLAST provides different programs for aligning nucleotide and protein sequences. There are several of them, but the basic kinds of BLAST are as follows:
Blastn is a nucleotide-to-nucleotide blast where the query sequence is a nucleotide sequence and the target is also a nucleotide sequence or database.
Blastp is a protein-to-protein blast where the query sequence is a protein and the target sequence is also a protein.
In blastx, the query is a nucleotide sequence and the target is a protein sequence/database. First, the nucleotide query is translated into protein in all six reading frames (three on each strand), and then each translation is searched against the protein database.
In tblastn, the query is a protein and the target is a nucleotide sequence/database. Here, the protein query is searched against a nucleotide database that is translated into its corresponding proteins. The translation is done in all six reading frames, three on each strand, so the query is compared against six possible translations of every database sequence.
Tblastx compares a nucleotide query against a nucleotide database, but at the protein level. In other words, the nucleotide query and target sequences are translated into their corresponding protein sequences and then aligned together. Both the query and the target are translated in all 6 reading frames.
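The translated searches above rely on six-frame translation of a nucleotide sequence. The following is a minimal sketch of that step, assuming the standard genetic code and an unambiguous DNA alphabet; the function names are hypothetical and this is not how BLAST implements it internally.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frame_translation(dna):
    """Translate frames +1..+3 (forward strand) and -1..-3 (reverse strand)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

for name, protein in six_frame_translation("ATGGCCTAA").items():
    print(name, protein)
```

Each of the six translations is then treated as an ordinary protein sequence in the comparison, with `*` marking stop codons.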
Special kinds of BLASTs:
Megablast is very similar to blastn, but it is optimized for aligning long, highly similar sequences. A large number of large sequences can be aligned easily with megablast, and all the query sequences are concatenated into one large query sequence. It uses a greedy algorithm for the gapped alignment and a longer word size than blastn. Because of these features, megablast is faster than blastn but less sensitive, which makes it very useful when a large number of similar sequences are to be aligned in one go.
Discontiguous megablast is aimed at the opposite case: it is intended for more dissimilar sequences. It is used to find sequences that have diverged from the query, for example homologs of a gene in distant species. Instead of requiring an exact match over the whole initial word, it tolerates mismatches at some positions of the word, so the output includes hits that are less similar to the query than those megablast would find.
- PSI Blast
Position-Specific Iterated (PSI) BLAST is very sensitive and is usually used for protein similarity searches. The query sequence is first subjected to blastp, and the most similar hits are combined into a multiple sequence alignment (MSA). From this MSA, a profile (a position-specific scoring matrix) that captures the pattern conserved between the query and its homologs is derived, and this profile is then used to search the database again. This cycle of deriving a profile from the MSA, searching the database with it, building a new MSA, and deriving a refined profile is PSI-BLAST.
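The idea of turning an MSA into a position-specific pattern can be sketched as a simple log-odds scoring matrix. This is a deliberately simplified illustration: it assumes a uniform background frequency, a fixed pseudocount, and a made-up toy alignment, whereas PSI-BLAST itself uses sequence weighting and more careful pseudocount estimation. The function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Build one log-odds column per alignment position (uniform background)."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        # Count residues in this column, ignoring gaps and unknown symbols.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def score(pssm, seq):
    """Score a sequence of the same length as the profile."""
    return sum(col[aa] for col, aa in zip(pssm, seq))

# Toy alignment (invented for illustration only).
msa = ["MKVLA", "MKILA", "MRVLG", "MKVIA"]
pssm = build_pssm(msa)
print(round(score(pssm, "MKVLA"), 2), round(score(pssm, "WWWWW"), 2))
```

A sequence that follows the conserved columns scores well above one that does not, which is what lets each iteration pull in more distant homologs than a plain blastp search.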
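Pattern-Hit Initiated (PHI) BLAST is related to PSI-BLAST, but the search is seeded by a user-supplied pattern and does not iterate on its own: it reports database sequences that contain the pattern and are similar to the query in the region around it. It can be used for DNA as well as protein queries.
Reverse Position-Specific (RPS) BLAST works in the opposite direction to PSI-BLAST: instead of searching a database of sequences with a profile, it searches a query sequence against a database of profiles. In this kind of blast, the query sequence (DNA or protein) is matched against an existing collection of conserved domains, each represented by a profile precomputed from an MSA of related genes.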
How does Blast work?
BLAST is a heuristic algorithm that was developed by Altschul et al. It is similar to FASTA but more efficient. Just as FASTA uses a ktup parameter, BLAST uses a word size, with different values for proteins and nucleotides. Both assume that good alignments contain short stretches of exact or high-scoring matches. BLAST improves on FASTA in that it is faster, attaches a statistical significance to every hit, and is easy to use. BLAST also has a threshold known as the minimal score, denoted 'S': any match reported between the query and the database must have a score equal to or greater than S.
BLAST performs the alignment in 3 basic steps:
- First, BLAST masks the low-complexity regions of the query and then breaks the query into short words (stretches of a fixed length).
- Secondly, BLAST scans the database for matches to these words. Only word matches that score equal to or greater than a word threshold (commonly denoted T) are kept as starting points for alignment. These matches are called "hits".
- Lastly, BLAST extends each hit in both directions as an ungapped alignment, stopping when the score falls too far below the maximum reached so far. The best-scoring extensions can then be re-aligned with gaps allowed.
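The three steps above can be sketched as a small seed-and-extend routine for nucleotide sequences. This is a simplified illustration under stated assumptions: it seeds on exact word matches only, uses made-up match/mismatch scores and an X-drop cutoff, and skips low-complexity masking and gapped extension. The function names and default parameter values are hypothetical and are not BLAST's own.

```python
from collections import defaultdict

def index_words(seq, w):
    """Step 1: map every word of length w in the query to its positions."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index

def extend_ungapped(query, subject, q, s, w, match=1, mismatch=-3, x_drop=5):
    """Step 3: extend a seed both ways, stopping after an X-drop in score."""
    score = best = w * match
    best_right = best_left = 0
    i = 0
    # Extend to the right of the seed.
    while q + w + i < len(query) and s + w + i < len(subject):
        score += match if query[q + w + i] == subject[s + w + i] else mismatch
        i += 1
        if score > best:
            best, best_right = score, i
        elif best - score > x_drop:
            break
    # Extend to the left, starting from the best right-extended score.
    score, i = best, 0
    while q - i - 1 >= 0 and s - i - 1 >= 0:
        score += match if query[q - i - 1] == subject[s - i - 1] else mismatch
        i += 1
        if score > best:
            best, best_left = score, i
        elif best - score > x_drop:
            break
    length = best_left + w + best_right
    return best, q - best_left, s - best_left, length

def find_hsps(query, subject, w=4, min_score=6):
    """Step 2 plus filtering: seed on exact word hits, keep extensions >= min_score."""
    index = index_words(query, w)
    hsps = set()
    for s in range(len(subject) - w + 1):
        for q in index.get(subject[s:s + w], ()):
            hsps.add(extend_ungapped(query, subject, q, s, w))
    return sorted((h for h in hsps if h[0] >= min_score), reverse=True)

# Each result is (score, query_start, subject_start, length).
print(find_hsps("GATTACA", "CCGATTACATT"))
```

Several seeds on the same diagonal extend to the same segment, so the results are collected in a set to report each high-scoring segment once.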
- Altschul, S. F. (2001). BLAST algorithm. eLS.