Bioinformatics Review

The basic concepts of genome assembly

Dr. Muniba Faiza
Last updated: December 10, 2015 5:57 pm

A genome is the complete set of DNA in an organism, including all of its genes. It carries all of the heritable information, along with regions that are never expressed. The Human Genome Project sequenced most of the human genome, yet only about 1 to 2 % of it codes for proteins, and the function of much of the rest is still poorly understood. There is still a great deal to discover about the human genome, in terms of both genes and proteins. Many sequencing strategies and algorithms have been proposed for genome assembly. Here I want to discuss the basic strategy behind genome assembly, which sounds difficult but is not really complex once it is understood.

The basic strategy for recovering the sequence of a genome can be explained in the following steps:

  1. First, the whole genome of an organism is sequenced. This does not give one long sequence but many thousands of short fragments (reads), each starting and ending at an unknown position in the genome.
  2. Since we do not know the genome sequence or which fragment lies next to which, the concept of ‘contigs’ is used. A contig is a contiguous stretch of sequence built from overlapping reads: fragments are joined wherever the end of one matches the beginning of another. In other words, many consecutive fragments are merged into one contig, and many such contigs are formed during the joining process.
  3. The question that now arises is how we know that a fragment, which may come from a repeat, has been put in the right place, since a genome may contain many repeated regions. To overcome this, paired ends are used. Paired ends are reads from the two ends of the same DNA fragment, so they are linked to each other and lie roughly a known distance apart. If one end aligns in, say, contig 1, the other end is expected to align nearby, in the same contig or in a neighboring one. Various software tools let us define the length of the fragments used for paired ends.
  4. Finally, the contigs are ordered and joined into scaffolds, sometimes called metacontigs or supercontigs, which are then processed further to give the assembled genome.
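The steps above can be sketched in a few lines of Python. This is only a toy illustration, not how any production assembler works: it assumes error-free reads from a single strand, the names `overlap_length`, `greedy_assemble` and `pair_is_consistent` are made up for this example, and the paired-end check assumes both ends are given in the same orientation.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap: int = 3):
    """Step 2: repeatedly merge the two sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing overlaps any more: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


def pair_is_consistent(contig: str, end1: str, end2: str,
                       expected_span: int, tolerance: int) -> bool:
    """Step 3: do both ends of a fragment land about `expected_span` apart?"""
    start = contig.find(end1)
    stop = contig.find(end2, start + 1) if start != -1 else -1
    if start == -1 or stop == -1:
        return False
    span = stop + len(end2) - start  # distance from first base to last base
    return abs(span - expected_span) <= tolerance


# Three overlapping reads taken from the made-up sequence ATGGCGTACGTTAGC.
reads = ["TACGTTAGC", "ATGGCGTA", "GCGTACGT"]
contigs = greedy_assemble(reads)
print(contigs)  # prints ['ATGGCGTACGTTAGC']
print(pair_is_consistent(contigs[0], "ATGG", "TAGC", expected_span=15, tolerance=2))
```

A greedy merge like this is easily fooled by repeats, which is exactly why the paired-end check in step 3 is needed: a pair whose ends land much closer together or further apart than expected suggests that something was joined in the wrong place.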

All of this is done by assembly algorithms. Velvet is among the most widely used, and SPAdes is one of the more recent.
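Velvet and SPAdes are both built around de Bruijn graphs: rather than comparing whole reads with each other, the reads are cut into short words of length k (k-mers) and the assembler follows the links between them. The sketch below is a minimal illustration of that idea only. It is not the code of either tool, the function names `build_de_bruijn` and `walk_unambiguous` are invented for this example, and it assumes error-free reads from one strand.

```python
from collections import defaultdict


def build_de_bruijn(reads, k: int) -> dict:
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix node -> suffix node
    return dict(graph)


def walk_unambiguous(graph: dict, start: str) -> str:
    """Extend a sequence from `start` while there is exactly one way forward."""
    sequence, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in seen:  # stop instead of looping forever on a cycle
            break
        seen.add(node)
        sequence += node[-1]
    return sequence


# Three overlapping reads taken from the made-up sequence ATGGCGTACGTTAGC.
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]

# With k = 5 every node has a single successor, so the walk rebuilds the sequence.
print(walk_unambiguous(build_de_bruijn(reads, k=5), "ATGG"))

# With k = 3 the node "GT" is followed by both "TA" and "TT", so the walk stops early.
print(walk_unambiguous(build_de_bruijn(reads, k=3), "AT"))
```

The second call shows why the choice of k matters: when k is shorter than a repeated stretch, the graph branches and the walk can no longer tell which way to go.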

In my experience, the more efficient algorithms are those that give us a large amount of information in one go. Just imagine that we had a thread of sequence made of unknown base pairs: what would we do with it, and how would we identify and extract the useful information from it?

Thank you for reading. Don’t forget to share this article if you like it.

Dr. Muniba is a bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.
3 Comments
  • Fozail says:
    October 17, 2015 at 5:31 am

    I am delighted to read your article on the basics of genome assembly, but I would like to add a few facts. You have outlined the steps involved in genome assembly and the methods used.
    Apart from what you mention in your article, a number of other algorithms/methods are used for genome assembly. These are:

    1. SSAKE
    2. SHARCGS
    3. VCAKE
    4. Newbler
    5. Celera Assembler
    6. Euler
    7. Velvet
    8. ABySS
    9. AllPaths
    10. SOAPdenovo

    Many of these have been reported to perform well, and several of them (such as Velvet, ABySS, AllPaths and SOAPdenovo) are based on k-mers and de Bruijn graphs.

    You can read a full text article whose link is provided here, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2874646.

    Basically, with the above algorithms the overlapping ends of reads are joined in a hierarchical manner: reads are grouped into contigs, and contigs into scaffolds, sometimes called metacontigs or supercontigs. These are then processed further into different formats.

  • Muniba Faiza says:
    October 17, 2015 at 4:34 pm

    Yes, you are right, sir. I left out the scaffold step because I wanted to give only an overview of genome assembly, which is also why I didn’t say much about the algorithms except the latest one, i.e., SPAdes. I will add the scaffold step to the article.
    Thank you.

  • Fozail says:
    October 17, 2015 at 5:08 pm

    You are most welcome.

Copyright 2024 IQL Technologies