How BLAST works – Concepts, Types, & Methods Explained

Dr. Muniba Faiza

BLAST stands for Basic Local Alignment Search Tool. It is a local alignment tool that compares a query sequence against a database of sequences and reports the regions of similarity, which can then be used to study how sequences from various species are related. In this article, we will explain the different kinds of BLAST tools and how the BLAST algorithm works.

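BLAST is a heuristic method, which means that instead of running a full dynamic programming alignment (such as Smith-Waterman) against every sequence in the database, it uses shortcuts to find the most likely matches. This makes it much faster and more efficient, but relatively less sensitive.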

For BLAST(ing) any sequence, there is a query sequence and a target sequence/database. The query sequence is the sequence for which we want to find similar sequences, and the target is the sequence/database against which the query is aligned. BLAST returns the output as a hit table in which the hits are listed from the most to the least significant, with the accession number, title, query coverage, sequence identity, score, and E-value of each hit in separate columns. The reliability of each match is assessed by its E-value, which is the number of hits with an equal or better score that would be expected by chance in a database of that size; the lower the E-value, the more significant the hit.
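To make the hit table concrete, here is a minimal Python sketch that reads hits in the 12-column tabular layout written by the BLAST+ option "-outfmt 6" and ranks them by E-value. The sample rows, the subject names, and the 1e-5 cutoff are made up for illustration; they are not real search results or recommended settings.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Made-up example rows (tab-separated); not real search results.
SAMPLE = (
    "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-100\t360\n"
    "query1\tsubjB\t75.0\t180\t40\t5\t5\t184\t1\t180\t2e-20\t95.1\n"
    "query1\tsubjC\t88.0\t150\t18\t0\t20\t169\t30\t179\t3e-50\t190\n"
)


def parse_hits(text):
    """Parse tabular BLAST output into a list of dicts with numeric fields."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        for field in ("pident", "evalue", "bitscore"):
            hit[field] = float(hit[field])
        hits.append(hit)
    return hits


def rank_hits(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, most significant first."""
    kept = [h for h in hits if h["evalue"] <= max_evalue]
    return sorted(kept, key=lambda h: (h["evalue"], -h["bitscore"]))


for hit in rank_hits(parse_hits(SAMPLE)):
    print(hit["sseqid"], hit["pident"], hit["evalue"], hit["bitscore"])
```

Sorting on the E-value first and the bit score second keeps the ordering stable when several hits share the same (often 0.0) E-value.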

BLAST has different programs to align nucleotide sequences, protein sequences, and translated sequences. There are many BLAST programs, but the basic kinds of BLAST are as follows:

  • blastn

In blastn, the query is a nucleotide sequence and the target is also a nucleotide sequence/database, i.e., it is a nucleotide-against-nucleotide search.

  • blastp

Blastp is a protein-to-protein blast where the query sequence is a protein and the target sequence is also a protein.

  • blastx

In this type of blast, the query is a nucleotide sequence and the target is a protein sequence/database. First, the nucleotide query is translated into protein sequences in all six reading frames (three on each strand), and then each translation is searched against the protein database.

  • tblastn

In tblastn, the query is a protein and the target is a nucleotide sequence/database. Here, the protein sequence is searched against a nucleotide database that is translated into its corresponding proteins on the fly. Each database sequence is translated in all six reading frames, three on the forward strand and three on the reverse complement, and the query is compared with every translation.

  • tblastx

It is a type of blast in which a nucleotide query is searched against a nucleotide database, but the comparison is made at the protein level. In other words, the nucleotide query and target sequences are translated into their corresponding protein sequences and then aligned. Both the query and the target are translated in all six reading frames, which makes tblastx the most computationally expensive of the basic programs.
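The translated searches (blastx, tblastn, and tblastx) all rely on six-frame translation. The Python sketch below shows the idea using the standard genetic code; the function names are chosen here for illustration and are not part of BLAST itself. Codons containing characters other than A, C, G, or T are reported as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate frames +1, +2, +3 (forward) and -1, -2, -3 (reverse strand)."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frame_translation("ATGGCCATTGTAATGGGCCGCTGA").items():
    print(name, protein)
```

Each of the six protein strings is then treated like an ordinary protein sequence in the search, which is why translated searches cost several times more than blastn or blastp.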


Special kinds of BLASTs:

  • Megablast

It is very similar to blastn, but it is optimized for aligning long, highly similar nucleotide sequences, for example, sequences from the same or closely related species. Megablast uses a much longer word size than blastn and a greedy algorithm for the gapped extension, and it can concatenate many query sequences into one large query so that the database is scanned only once. Due to these features, megablast is faster than blastn but less sensitive, and it is very useful when a large number of similar sequences are to be aligned in one go.

  • Discontiguous Megablast

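Despite its name, discontiguous megablast is meant for more dissimilar sequences than megablast. Instead of requiring an exact match over a run of consecutive bases, its initial word follows a discontiguous template in which some positions (for example, the third base of a codon) are ignored, so a seed can still be found when the sequences differ at many positions. It is therefore used to find related but diverged sequences, such as homologs of a gene present in distant species. The output is still ranked by similarity to the query; it does not return the least similar sequences.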

  • PSI Blast

Position-Specific Iterated (PSI) BLAST is very sensitive and is used for protein similarity searches, especially to detect distant homologs. The query sequence is first subjected to blastp, and the significant hits are combined into a multiple sequence alignment (MSA). From this MSA, a position-specific scoring matrix (PSSM, also called a profile) is built, which captures the residues that are conserved at each position of the query and its homologs. The PSSM then replaces the query in a new search of the database, the new hits are added to the MSA, and a refined PSSM is derived. This cycle of building a profile from the MSA and searching the database with it is repeated until no new hits are found or a set number of iterations is reached.

  • PHI Blast

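Pattern-Hit Initiated (PHI) BLAST is related to PSI-BLAST, but the search is seeded by a pattern. The user supplies a query together with a pattern (motif) that occurs in it, and PHI-BLAST reports the database sequences that contain the pattern and are also similar to the query in the region around it. A single PHI-BLAST search does not involve any iteration, although its results can be used to start PSI-BLAST iterations. It is mainly used for protein queries.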

  • RPS Blast

Reverse Position-Specific (RPS) BLAST works in the reverse direction to PSI-BLAST: instead of searching a profile against a database of sequences, it searches the query sequence against a database of pre-computed profiles (PSSMs) built from curated MSAs of conserved domains. It is used to identify the conserved domains present in a protein query, or in a nucleotide query after translation.
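The profile that PSI-BLAST refines at each iteration, and that RPS-BLAST searches against, can be pictured as a table with one column of residue scores per alignment position. The Python sketch below builds such a table from a tiny ungapped MSA as log-odds scores. It is a deliberately simplified illustration: it assumes a uniform background frequency and a fixed pseudocount, whereas the real programs use sequence weighting and substitution-matrix-based pseudocounts. The three-sequence alignment is made up.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(alignment, pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length, ungapped aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(ALPHABET)  # simplifying assumption: uniform
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def score_sequence(pssm, seq):
    """Sum the position-specific scores of a sequence of standard residues."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


toy_alignment = ["MKV", "MKI", "MRV"]  # hypothetical aligned hits
profile = build_pssm(toy_alignment)
print(round(score_sequence(profile, "MKV"), 2))
print(round(score_sequence(profile, "AAA"), 2))
```

A sequence that matches the conserved residues scores higher than an unrelated one, and unlike a fixed substitution matrix the score for a given residue depends on the position at which it occurs.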


How does Blast work?

BLAST is a heuristic algorithm that was developed by Altschul et al. [1]. It is similar to FASTA but more efficient. Just as FASTA uses a ktup parameter, BLAST uses a word size (W), which differs for proteins and nucleotides. Both assume that good alignments contain short stretches of exact or high-scoring matches. BLAST is an improvement over FASTA in the sense that it is faster, reports the statistical significance of its hits, and is easy to use. BLAST works with two score thresholds. The word threshold, denoted as 'T', is the minimum score that a word match must reach to seed an alignment. The minimal score, denoted as 'S', is the cutoff for reporting: an alignment between the query and a database sequence must have a score equal to or greater than S.
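A score cutoff and the E-value shown in the hit table are two views of the same statistic. As a rough sketch, the Karlin-Altschul relation E = K * m * n * exp(-lambda * S) converts a raw alignment score S into an E-value, where m is the query length and n is the total database length. The lambda and K values used below are placeholders for illustration only; the real values depend on the scoring system and are printed in the footer of a BLAST report. The sketch also uses the full lengths, whereas BLAST applies a length correction first.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw score so it is comparable across scoring systems."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Placeholder statistical parameters (assumptions, not taken from a real run).
LAM, K = 0.318, 0.134

for score in (40, 60, 80):
    e = expect_value(score, query_len=250, db_len=1e8, lam=LAM, k=K)
    print(score, round(bit_score(score, LAM, K), 1), f"{e:.2e}")
```

The E-value falls exponentially as the score rises and grows in proportion to the database size, which is why the same alignment looks less significant when it is found in a larger database.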

BLAST performs the alignment in 3 basic steps:

  • First, BLAST builds the word list: it masks the low-complexity regions of the query and then breaks the query into short words of a fixed length (W).
  • Secondly, BLAST scans the database for matches to these words. For nucleotides, the word matches are exact; for proteins, similar words are also accepted if they score equal to or greater than a word threshold (T). These word matches are called “Hits” and serve as the seeds for alignment.
  • Lastly, BLAST extends each hit in both directions as an ungapped alignment, stopping when the score falls too far below the maximum reached so far. Extensions that score well enough are then aligned again with gaps allowed, and the alignments with a score equal to or greater than the minimal score (S) are reported.
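The three steps above can be sketched as a toy seed-and-extend search for nucleotide sequences. This is a minimal illustration, not the actual BLAST implementation: it uses exact word matches only, a simple drop-off rule to stop the extension, and no gapped alignment. The word size, the match and mismatch scores, the drop-off value, and the minimum score are assumptions picked to suit the short example sequences.

```python
from collections import defaultdict


def find_seeds(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where a word matches exactly."""
    index = defaultdict(list)
    for j in range(len(subject) - word_size + 1):
        index[subject[j:j + word_size]].append(j)
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_one_way(query, subject, i, j, step, match, mismatch, x_drop):
    """Extend from (i, j) in one direction; return (best_gain, best_length)."""
    score = best = 0
    length = best_len = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # the score fell too far below the maximum: stop extending
        i += step
        j += step
    return best, best_len


def extend_seed(query, subject, i, j, word_size,
                match=1, mismatch=-2, x_drop=3):
    """Ungapped extension; returns (score, q_start, q_end, s_start)."""
    right, right_len = extend_one_way(query, subject, i + word_size,
                                      j + word_size, 1, match, mismatch, x_drop)
    left, left_len = extend_one_way(query, subject, i - 1, j - 1, -1,
                                    match, mismatch, x_drop)
    score = word_size * match + left + right
    return score, i - left_len, i + word_size + right_len, j - left_len


def best_hsp(query, subject, word_size=4, min_score=6):
    """Highest-scoring ungapped segment with score >= min_score, or None."""
    hsps = [extend_seed(query, subject, i, j, word_size)
            for i, j in find_seeds(query, subject, word_size)]
    return max((h for h in hsps if h[0] >= min_score), default=None)


print(best_hsp("ACGTACGTTT", "GGACGTACGTAA"))
```

Here the query positions 0 to 8 (end exclusive) align with the subject starting at position 2. Several seeds on the same diagonal extend to the same segment, which is one reason the real program keeps track of regions that have already been extended.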

References

  1. Altschul, S. F. (2001). BLAST algorithm. eLS.

Dr. Muniba is a Bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.
