
How BLAST works – Concepts, Types, & Methods Explained

Dr. Muniba Faiza
Last updated: August 25, 2023 10:10 am

BLAST stands for Basic Local Alignment Search Tool. It is a local alignment-based tool that compares a query sequence against a sequence or a database of sequences and reports regions of local similarity, for example between genes or proteins of different species. In this article, we explain the different kinds of BLAST tools and how the BLAST algorithm works.


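BLAST is a heuristic method: instead of running a full dynamic programming alignment (such as Smith-Waterman) against every database sequence, it uses shortcuts to find likely matches quickly. This makes it much faster and more efficient, but somewhat less sensitive, than exhaustive dynamic programming.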

For BLAST(ing) any sequence, there is a query sequence and a target sequence/database. The query is the sequence for which we want to find similar sequences, and the target is the sequence/database against which the query is aligned. BLAST returns its output as a hit table, sorted from the best-scoring hit downwards, that lists the accession number and title of each matched sequence along with its query coverage, sequence identity, score, and e-value in separate columns. The e-value (expect value) is the number of hits with a score at least as high that would be expected by chance in a database of that size, so the lower the e-value, the more significant the hit.
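To make the e-value more concrete, here is a minimal sketch of the standard relation between a bit score and the expected number of chance hits, E = m × n × 2^(−S′), where m is the query length, n is the total database length, and S′ is the bit score. The function name `expected_hits` is hypothetical, and real BLAST uses edge-corrected "effective" lengths, so its reported values will differ somewhat from this simplified calculation.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate e-value from a bit score: E = m * n * 2**(-S').

    Simplifying assumption: raw sequence lengths are used instead of the
    effective (edge-corrected) lengths that BLAST computes internally.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # A higher bit score gives a smaller e-value for the same search space.
    for bits in (30.0, 50.0, 70.0):
        print(bits, expected_hits(bits, query_len=300, db_len=1_000_000_000))
```

Note that the same bit score becomes less significant as the database grows, which is why e-values from searches against different databases are not directly comparable.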

BLAST provides different programs for aligning nucleotide sequences, protein sequences, and translated sequences. There are many BLAST programs, but the basic kinds are as follows:

  • blastn

In blastn, the query is a nucleotide sequence and the target is also a nucleotide sequence/database, i.e., it is a nucleotide-against-nucleotide search.

  • blastp

Blastp is a protein-to-protein blast where the query sequence is a protein and the target sequence is also a protein.

  • blastx

In this type of blast, the query is a nucleotide sequence and the target is a protein sequence/database. First, the nucleotide query is translated into protein in all six reading frames (three on each strand), then each translation is searched against the protein database.

  • tblastn

In tblastn, the query is a protein and the target is a nucleotide sequence/database. Here, the protein sequence is searched against a nucleotide database that is translated on the fly into its corresponding proteins. The translation covers all six reading frames (three on each strand), so hits can be found on either strand of the database sequences.

  • tblastx

In tblastx, a nucleotide query is searched against a nucleotide database, but the comparison is made at the protein level. In other words, both the query and the target sequences are translated into their corresponding protein sequences in all 6 reading frames, and the translations are then aligned.
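The translated searches (blastx, tblastn, tblastx) rely on conceptual translation of nucleotide sequences. As an illustration only, the sketch below translates a DNA string in six reading frames (three offsets on the forward strand and three on the reverse complement) using the standard genetic code. The function names are hypothetical and this is not how BLAST is implemented internally; it only shows what "six reading frames" means.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Map frame labels (+1..+3, -1..-3) to the translated protein strings."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six protein strings is then treated as a separate sequence in the protein-level comparison.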


Special kinds of BLASTs:

  • Megablast

Megablast is very similar to blastn but is optimized for aligning sequences that are highly similar to the query (roughly 95% identity or more), such as sequences from the same or closely related species. It uses a longer default word size than blastn (28 versus 11) and a greedy algorithm for the gapped alignment step. It handles long sequences and large batches of queries efficiently, and all the query sequences are concatenated into one large query sequence so that the database is scanned only once. Because of these features, megablast is faster than blastn but less sensitive, which makes it very useful when a large number of similar sequences are to be aligned in one go.

  • Discontiguous Megablast

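Discontiguous megablast is a variant of megablast designed for more dissimilar sequences (on the NCBI web page it is the “More dissimilar sequences” option). Instead of requiring a run of consecutive matching bases as a seed, it uses a discontiguous word template in which some positions, for example the third base of a codon, are allowed to mismatch. This makes it better suited to cross-species comparisons, such as finding homologs of a gene in more distant species. The output is still ranked by similarity to the query; the method simply tolerates more divergence than megablast does.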

  • PSI Blast

Position-Specific Iterated (PSI) BLAST is very sensitive and is usually used for protein similarity searches, especially to detect distant homologs. The query sequence is first subjected to blastp, and the significant hits are combined into a multiple sequence alignment (MSA). From this MSA, a position-specific scoring matrix (PSSM, or profile) is built that captures which residues are conserved at each position in the query and its homologs. The PSSM is then used in place of the query to search the database again. This cycle of building an MSA from the hits, deriving a refined PSSM, and searching the database with it is repeated until no new sequences are found or a set number of iterations is reached.

  • PHI Blast

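Pattern-Hit Initiated (PHI) BLAST is related to PSI-BLAST, but it starts from a pattern rather than from the query alone. The user supplies a protein query together with a pattern (a short motif written as a regular expression) that occurs in the query, and PHI-BLAST reports database sequences that contain the pattern and are also similar to the query in the region around it. A PHI-BLAST search is a single pass, although its results can be used to start PSI-BLAST iterations. It is mainly used with protein queries.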

  • RPS Blast

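Reverse Position-Specific (RPS) BLAST is also related to PSI-BLAST, but it works in the reverse direction: instead of searching a database of sequences with a profile, it searches a query sequence against a database of precomputed profiles (PSSMs), each built from an MSA of a conserved domain. The query can be a protein, or a nucleotide sequence that is translated before the search. RPS-BLAST is the search engine behind the conserved domain search of the NCBI Conserved Domain Database (CDD).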
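The profile that PSI-BLAST derives from a multiple sequence alignment, and that RPS-BLAST searches against, is commonly represented as a position-specific scoring matrix (PSSM). The sketch below builds a very simplified log-odds PSSM from a small ungapped alignment and scores a segment against it. It assumes a uniform background frequency and a flat pseudocount, and the function names are hypothetical; the real programs use sequence weighting and substitution-matrix-based pseudocounts, so treat this only as an illustration of the idea.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from equal-length sequences.

    Assumptions: ungapped alignment, uniform background (1/20), and a flat
    pseudocount added to every residue in every column.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm, segment):
    """Sum the column scores of a segment that is as long as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


if __name__ == "__main__":
    toy_alignment = ["ACDE", "ACDE", "ACDF"]  # toy data, not real proteins
    pssm = build_pssm(toy_alignment)
    print(score_segment(pssm, "ACDE"), score_segment(pssm, "WWWW"))
```

Unlike a fixed substitution matrix, the score for a given residue here depends on the column, so conserved positions count for more than variable ones.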


How does Blast work?

BLAST is a heuristic algorithm that was developed by Altschul et al. [1]. It is similar to FASTA but more efficient. Just as FASTA uses a ktup parameter, BLAST uses a word size (W), which differs for proteins and nucleotides (classically 3 residues for proteins and 11 bases for nucleotides). Both assume that good alignments contain short stretches of identical or high-scoring matches. BLAST is an improvement over FASTA in the sense that it is faster, reports a statistical significance (the e-value) for every hit, and is easy to use. BLAST uses two score thresholds: a word threshold (T), which a word match must reach to be used as a seed, and a minimal score (S), which an extended alignment between the query and the database must equal or exceed to be reported.

BLAST performs the alignment in 3 basic steps:

  • First, BLAST builds the word list: it masks the low-complexity regions of the query and then breaks the query into short overlapping stretches (words) of fixed length W.
  • Secondly, BLAST scans the database for matches to these words. For nucleotides the words must match exactly; for proteins, similar words whose score against a query word is equal to or greater than the threshold T also count. These word matches are called “hits”.
  • Lastly, BLAST extends each hit in both directions as an ungapped alignment, stopping when the score falls too far below the maximum reached so far. Extended segments that score at least S are kept as high-scoring segment pairs (HSPs); current versions of BLAST then perform a gapped extension around the best of them.
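The three steps above can be sketched for nucleotide sequences as a toy "seed and extend" search. This is a simplified illustration, not the BLAST implementation: it uses exact word matches only, no low-complexity masking, no gapped extension, and no statistics. The function names, the scoring values (match +1, mismatch −3), the drop-off value, and the `min_score` cut-off are all assumptions chosen for the example.

```python
from collections import defaultdict


def index_words(query, w):
    """Step 1: map every overlapping word of length w to its query positions."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return index


def extend(query, subject, q, s, w, match=1, mismatch=-3, x_drop=5):
    """Step 3: ungapped extension of a seed at query[q], subject[s].

    Extension in each direction stops when the running score falls more
    than x_drop below the best score seen. Returns
    (score, query_start, subject_start, length) of the best segment.
    """
    score = best = w * match
    best_right = best_left = 0
    i = 0
    while q + w + i < len(query) and s + w + i < len(subject):
        score += match if query[q + w + i] == subject[s + w + i] else mismatch
        i += 1
        if score > best:
            best, best_right = score, i
        elif best - score > x_drop:
            break
    score = best  # restart from the best right-hand end before going left
    i = 0
    while q - 1 - i >= 0 and s - 1 - i >= 0:
        score += match if query[q - 1 - i] == subject[s - 1 - i] else mismatch
        i += 1
        if score > best:
            best, best_left = score, i
        elif best - score > x_drop:
            break
    return best, q - best_left, s - best_left, best_left + w + best_right


def find_hsps(query, subject, w=4, min_score=6):
    """Step 2 plus step 3: find exact word hits, extend them, keep good ones."""
    index = index_words(query, w)
    hsps = set()  # seeds on the same diagonal often extend to the same HSP
    for s in range(len(subject) - w + 1):
        for q in index.get(subject[s:s + w], ()):
            hsp = extend(query, subject, q, s, w)
            if hsp[0] >= min_score:
                hsps.add(hsp)
    return sorted(hsps, reverse=True)


if __name__ == "__main__":
    for score, q_start, s_start, length in find_hsps("TTACGTACGG", "CCCACGTACGAAA"):
        print(score, q_start, s_start, query_segment := "TTACGTACGG"[q_start:q_start + length])
```

The word index is what makes the approach fast: only database positions that share a word with the query are ever examined, at the cost of missing alignments that contain no such word.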

References

  1. Altschul, S. F. (2001). BLAST algorithm. eLS.
About the author: Dr. Muniba Faiza is a bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.