Bioinformatics Review

Basic Concept of Multiple Sequence Alignment

Dr. Muniba Faiza
Last updated: December 30, 2015 11:04 pm

Multiple Sequence Alignment (MSA) is a basic step in the phylogenetic analysis of organisms. In MSA, all the sequences under study are aligned together on the basis of the similar regions within them. The major goal of MSA is to identify the alignment that maximizes protein sequence similarity. This is done by seeking an alignment that “maximizes the sum of similarities for all pairs of sequences”, which is called the ‘sum-of-pairs’ or SP score. The SP score is the basis of many alignment algorithms.
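To make the SP score concrete, here is a minimal Python sketch. The scoring values (match +1, mismatch -1, gap -2) and the function names are illustrative assumptions, not taken from any particular program; real tools typically score protein pairs with a substitution matrix such as BLOSUM62 and use more elaborate gap penalties.

```python
from itertools import combinations

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one column position for two sequences (toy scheme, assumed values)."""
    if a == "-" and b == "-":
        return 0  # a gap facing a gap contributes nothing in this sketch
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def sp_score(alignment):
    """Sum-of-pairs score: add up the pairwise scores over every pair of rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        total += sum(pair_score(a, b) for a, b in zip(row_a, row_b))
    return total

msa = [
    "ACG-T",
    "ACGGT",
    "A-GGT",
]
print(sp_score(msa))  # prints 3
```

An alignment program built on this objective would search for the arrangement of gaps that gives the highest such total.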

The most widely used approach for constructing an MSA is “progressive alignment”, in which a set of n proteins is aligned by performing n-1 pairwise alignments of pairs of proteins or pairs of intermediate alignments, guided by a phylogenetic tree connecting the sequences. A methodology that has been used successfully as an improvement over progressive alignment based on the SP score is “consistency-based scoring”, which seeks an MSA that agrees as far as possible with a set of pairwise alignments computed for the input sequences. For example, given three sequences A, B, and C, the pairwise alignments A-B and B-C imply an alignment of A and C, which may differ from the directly computed A-C alignment.
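The A, B, C example can be sketched in a few lines of Python. The sequences and the three pairwise alignments below are made up for illustration, and the helper names are hypothetical; the sketch only shows how an A-C alignment is implied through B and how it can disagree with a direct A-C alignment.

```python
def aligned_pairs(row_x, row_y):
    """Return the set of (i, j) residue-index pairs aligned in two gapped rows."""
    pairs = set()
    i = j = 0
    for x, y in zip(row_x, row_y):
        if x != "-" and y != "-":
            pairs.add((i, j))
        if x != "-":
            i += 1
        if y != "-":
            j += 1
    return pairs

def implied_pairs(ab, bc):
    """Compose A-B and B-C pairs through the shared B residues to get A-C pairs."""
    b_to_c = dict(bc)  # each B residue aligns to at most one C residue
    return {(a, b_to_c[b]) for a, b in ab if b in b_to_c}

# Hypothetical pairwise alignments of A = ACGT, B = AGT, C = AG
ab = aligned_pairs("ACGT", "A-GT")
bc = aligned_pairs("AGT", "AG-")
direct_ac = aligned_pairs("ACGT", "AG--")

implied_ac = implied_pairs(ab, bc)
print(sorted(implied_ac))              # [(0, 0), (2, 1)]
print(sorted(direct_ac))               # [(0, 0), (1, 1)]
print(sorted(implied_ac & direct_ac))  # [(0, 0)]
```

Here only the first residue pair is supported by both routes. A consistency-based method would give more weight to pairs that are supported in this way.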

Now the question arises: how much can we rely on the obtained MSA, and how is an MSA validated?

The validation of an MSA program typically uses a benchmark data set of reference alignments. An MSA produced by the program is compared with the corresponding reference alignment, which gives an accuracy score.
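One simple way to turn such a comparison into a number is a column score: the fraction of reference columns that the test alignment reproduces exactly. The sketch below is an illustration under that assumption, with hypothetical function names and toy data. Real benchmarks may use other measures and often restrict scoring to reliably aligned regions of the reference.

```python
def columns(alignment):
    """Represent each column as a tuple of residue indices (None marks a gap)."""
    counters = [0] * len(alignment)
    cols = []
    for column in zip(*alignment):
        entry = []
        for k, ch in enumerate(column):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[k])
                counters[k] += 1
        cols.append(tuple(entry))
    return cols

def column_score(test, reference):
    """Fraction of reference columns reproduced exactly in the test alignment."""
    test_cols = set(columns(test))
    ref_cols = columns(reference)
    if not ref_cols:
        return 0.0
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)

reference = ["ACG-T", "ACGGT", "A-GGT"]
test = ["ACG-T", "ACGGT", "AG-GT"]  # third row has its gap in a different place
print(column_score(test, reference))  # prints 0.6
```

Three of the five reference columns survive in the test alignment, so the score is 0.6. A test alignment identical to the reference scores 1.0.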

Before 2004, the standard benchmark was BAliBASE (Benchmark Alignment dataBASE), a database of manually refined MSAs consisting of high-quality, documented alignments, designed to identify the strong and weak points of the numerous alignment programs now available.

“Recently, several new benchmarks have been made available, namely OXBENCH, PREFAB, SABmark, IRMBASE and a new, extended version of BAliBASE.”

Another measure commonly used when assessing alignment programs is the fM score. It is used to assess the specificity of an alignment tool and is the proportion of predicted matched residues that also appear in the reference alignment. It is often the case that some regions of the sequences are alignable and some are not; however, there are usually also intermediate cases, where sequence and structure have diverged to a point at which homology is not reliably detectable. In such cases, the fM score provides, at best, a noisy assessment of alignment tool specificity, one that becomes increasingly less reliable as one considers sequences of increasing structural divergence.
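As a rough illustration of the fM score, the following Python sketch counts the residue pairs aligned by a test MSA and reports the fraction that are also aligned in the reference. The function names and the toy alignments are assumptions made for this example; actual benchmark scoring tools may differ in details such as which regions are scored.

```python
from itertools import combinations

def residue_pairs(alignment):
    """Collect every aligned residue pair as ((seq_a, idx_a), (seq_b, idx_b))."""
    counters = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        present = []
        for k, ch in enumerate(column):
            if ch != "-":
                present.append((k, counters[k]))
                counters[k] += 1
        # every two residues sharing a column count as one predicted match
        pairs.update(combinations(present, 2))
    return pairs

def fm_score(test, reference):
    """Proportion of pairs predicted by the test MSA that the reference also aligns."""
    predicted = residue_pairs(test)
    if not predicted:
        return 0.0
    return len(predicted & residue_pairs(reference)) / len(predicted)

reference = ["ACG-T", "ACGGT", "A-GGT"]
test = ["ACG-T", "ACGGT", "AG-GT"]
print(round(fm_score(test, reference), 3))  # prints 0.818
```

Of the 11 residue pairs predicted by the test alignment, 9 also occur in the reference. Where the reference itself is uncertain, as in the intermediate cases described above, the same calculation penalizes predictions that cannot really be judged right or wrong.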

However, even when reference alignments are used, the accuracy of the results remains questionable, because the reference alignments themselves are of varying quality.

 

REFERENCES:

  • Multiple sequence alignment. Robert C. Edgar and Serafim Batzoglou.

  • BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Julie D. Thompson, Frédéric Plewniak and Olivier Poch.

Dr. Muniba is a bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.
5 Comments
  • Fozail Ahmad says:
    October 19, 2015 at 5:29 pm

    The wealth of existing methods and their similarly improved accuracy has made selecting one tool over the others difficult.
    While discussing the methods, it is worth mentioning that tools like M-Coffee and T-Coffee should be described objectively, so that readers can understand the basic algorithm at their core.

  • Muniba Faiza says:
    October 19, 2015 at 6:02 pm

    Thanks for your concern, Sir. I didn’t mention the tools because I wanted to present the basic idea of MSA as simply as possible; otherwise I could have included benchmark-tested tools like MUSCLE, MAFFT, T-COFFEE, etc.
    The algorithms behind these tools will be explained in the next article on MSA.

  • prashant says:
    October 19, 2015 at 7:48 pm

    Is sequence alignment done only to find regions of similarity, or also to find out how much the sequences differ, i.e. the regions of dissimilarity? If yes, why? If no, why?

    • Sanjay_infection_biologist says:
      October 20, 2015 at 10:09 am

      Dear Mr Prashant,

      It depends entirely on the case you are studying. For example, if you are analysing closely related species, you already know that the sequences will be mostly the same, so you would be interested in the regions of dissimilarity in order to trace the pattern of their evolution, e.g. the hemoglobin of man, monkey and chimpanzee. In the other case, where the species are distantly related or have similar functions but different structures, you would be interested in finding the regions of similarity between them or their active regions (catalytic domains), e.g. plant hemoglobin and bacterial hemoglobin.
      Besides this, if we can relate the sequences, we can also predict the structure of some proteins because of the principle “Sequence decides structure and structure decides function”.
      For reference you may read:

      for hemoglobin: http://www.bioquest.org/summer2006/The_Evolution_of_Hemoglobin.pdf

      for MSA:

      http://statweb.stanford.edu/~nzhang/345_web/sequence_slides3.pdf

      http://onlinelibrary.wiley.com/doi/10.1002/1097-0134(20000815)40:3%3C502::AID-PROT170%3E3.0.CO;2-Q/pdf

      http://epubs.siam.org/doi/pdf/10.1137/0148063
      http://epubs.siam.org/doi/abs/10.1137/0148063

      http://www.pnas.org/content/86/12/4412.full.pdf

      For further queries, you may contact me on the links provided.

      Thanks,
      Best regards,
      Sanjay

    • Muniba Faiza says:
      October 20, 2015 at 1:15 pm

      Generally, sequence alignment is done to find the similarity between organisms, but yes, we can also find the dissimilarity when we want to study the differences among species, or to calculate how much the species differ in order to study variation during evolution or for other phylogenetic analysis. We can find dissimilar sequences with the help of Discontiguous Megablast (a variant of Megablast) and then align all of them using MUSCLE, CLUSTAL W, etc.


Copyright 2024 IQL Technologies