Genomics
Methods to detect the effects of alternative splicing and transcription on proteins
Alternative splicing and transcription are among the most familiar biological processes. Alternative splicing is the process by which different forms of mRNA are generated from the same gene. A gene consists of exons and introns, and the exons can be joined together in different ways [1]. The resulting mRNA forms, known as “transcript variants”, “splice variants”, or “isoforms”, lead to the production of different proteins from the same gene (Fig. 1) [1].
Fig.1 Alternative Splicing [1].
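The exon-joining idea can be illustrated with a minimal Python sketch. The toy gene sequence, the exon coordinates, and the function names `splice` and `translate` are all hypothetical and chosen only for illustration; the sequence uses the DNA alphabet (T instead of U) and the standard genetic code.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical toy gene: three exons separated by two introns.
GENE = "ATGGCT" + "GTAAGT" + "GAAAAA" + "GTCCAG" + "TGGTAA"
EXON1, EXON2, EXON3 = (0, 6), (12, 18), (24, 30)  # half-open coordinates


def splice(gene, exons):
    """Join the chosen exons, in order, into one transcript sequence."""
    return "".join(gene[start:end] for start, end in exons)


def translate(transcript):
    """Translate from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(transcript) - 2, 3):
        amino_acid = CODON_TABLE[transcript[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Two isoforms of the same gene: one keeps exon 2, the other skips it.
full_isoform = translate(splice(GENE, [EXON1, EXON2, EXON3]))
skipped_isoform = translate(splice(GENE, [EXON1, EXON3]))
print(full_isoform, skipped_isoform)
```

Skipping the middle exon removes two amino acids from the encoded protein, which is the simplest case of one gene producing more than one protein.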
The proteins produced after alternative splicing are affected in different ways: transcript variants encode proteins with different amino acid sequences and hence different functions [2]. BLOCKS [3], TMHMM [4], and InterPro [5] are among the most commonly used resources for detecting protein annotations in human and mouse proteins [6,7]. Web tools make it easy to identify conserved regions in the proteins encoded by different splice variants [8], but mapping these regions back onto the gene is tedious and error-prone [2].
To address these problems, Mall et al. (2016) developed “ProtAnnot”, a plug-in for IGB (Integrated Genome Browser) [2]. IGB is a user-friendly genome browser that helps users analyze genomic and RNA-seq data [9]. ProtAnnot provides insight into how transcription and alternative splicing affect a protein and its function [2].
ProtAnnot provides a fast and efficient way to visualize the impact of alternative transcription and splicing on proteins. It displays transcript structures as linked blocks, in which the thicker part of a block represents the translated region [2].
Advantages of ProtAnnot:
- It uses a color scheme to show the frame of translation. Comparing exon colors between transcripts helps the user determine whether they encode the same protein [2].
- It provides an exon summary that helps the user easily identify regions that differ between transcripts, such as sequences that are included due to alternative splicing, promoters, or 3′-end processing [2,10].
- It displays protein annotations next to their corresponding transcripts, which helps the user identify how different regions of a gene may encode different functions (Fig. 2) [2], thereby linking the function of alternatively transcribed proteins to the respective gene.
- It allows saving the search results for later use.
Fig. 2 ProtAnnot visualization of Arabidopsis thaliana gene AT4G36690 encoding splicing regulator U2AF65 [2].
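The frame-of-translation comparison in the first advantage can be sketched as follows. This is a simplified illustration, not ProtAnnot's actual implementation: the function name `exon_frames`, the coordinates, and the assumptions (plus strand only, half-open genomic intervals, coding region running to the end of the last exon) are all introduced here for illustration.

```python
def exon_frames(exons, cds_start):
    """Return the genomic reading frame (0, 1 or 2) of each exon.

    exons: list of (start, end) half-open genomic intervals on the plus
           strand, sorted by start.
    cds_start: genomic coordinate of the first base of the start codon.
    Exons lying entirely upstream of cds_start are non-coding and get None.
    Simplification: the 3' untranslated region is ignored.
    """
    frames = []
    coding_bases_so_far = 0
    for start, end in exons:
        if end <= cds_start:
            frames.append(None)
            continue
        first_coding = max(start, cds_start)
        # Genomic coordinate (mod 3) at which codons begin in this exon.
        frames.append((first_coding - coding_bases_so_far) % 3)
        coding_bases_so_far += end - first_coding
    return frames


# Two hypothetical transcripts that differ only in where exon 2 begins.
transcript_a = [(0, 10), (20, 30)]
transcript_b = [(0, 10), (22, 30)]

frames_a = exon_frames(transcript_a, cds_start=1)
frames_b = exon_frames(transcript_b, cds_start=1)
print(frames_a, frames_b)
```

In this toy case the first exons share a frame, while the second exons overlap on the genome but are read in different frames, so the shared region would encode different amino acids in the two transcripts. A frame-based color scheme makes that kind of difference visible at a glance.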
References:
- https://www.ncbi.nlm.nih.gov/Class/MLACourse/Modules/MolBioReview/alternative_splicing.html
- Tarun Mall, John Eckstein, David Norris, Hiral Vora, Nowlan H. Freese and Ann E. Loraine. ProtAnnot: an app for Integrated Genome Browser to display how alternative splicing and transcription affect proteins. Bioinformatics, 32(16), 2016, 2499–2501. doi: 10.1093/bioinformatics/btw068
- Shmuel Pietrokovski, Jorja G. Henikoff and Steven Henikoff. The Blocks Database—A System for Protein Classification. Nucl. Acids Res. (1996) 24 (1):197-200.doi: 10.1093/nar/24.1.197
- http://www.cbs.dtu.dk/services/TMHMM/
- Alex Mitchell, Hsin-Yu Chang, Louise Daugherty, Matthew Fraser, Sarah Hunter, Rodrigo Lopez, Craig McAnulla, Conor McMenamin, Gift Nuka, Sebastien Pesseat, Amaia Sangrador-Vegas, Maxim Scheremetjew, Claudia Rato, Siew-Yit Yong, Alex Bateman, Marco Punta, Teresa K. Attwood, Christian J.A. Sigrist, Nicole Redaschi, Catherine Rivoire, Ioannis Xenarios, Daniel Kahn, Dominique Guyot, Peer Bork, Ivica Letunic, Julian Gough, Matt Oates, Daniel Haft, Hongzhan Huang, Darren A. Natale, Cathy H. Wu, Christine Orengo, Ian Sillitoe, Huaiyu Mi, Paul D. Thomas and Robert D. Finn (2015). The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Research, Jan 2015; doi: 10.1093/nar/gku1243
- Cline,M.S. et al. (2004) The effects of alternative splicing on transmembrane proteins in the mouse genome. Pac. Symp. Biocomput., 17–28.
- Loraine,A.E. et al. (2013) RNA-seq of Arabidopsis pollen uncovers novel transcription and alternative splicing. Plant Physiol., 162, 1092–1109.
- Rodriguez,J.M. et al. (2015) APPRIS WebServer and WebServices. Nucleic Acids Res., 43, W455–W459.
- Nicol,J.W. et al. (2009) The Integrated Genome Browser: free software for distribution and exploration of genome-scale datasets. Bioinformatics, 25,2730–2731.
- English,A.C. et al. (2010) Prevalence of alternative splicing choices in Arabidopsis thaliana. BMC Plant Biol., 10, 102.
Genomics
CoolBox- An open-source toolkit for genomic data visualization
A new toolkit called CoolBox has been developed for the visual analysis of genomic data [1]. It makes it easy to visualize patterns in large-scale genomic datasets.
Genomics
VISPR- A new tool to visualize CRISPR screening experiments
As CRISPR/Cas9 is a well-known genome editing technology, it is important to explore and analyze CRISPR screening experiments. In this article, we discuss a new tool developed for better visualization of CRISPR screening experiments.
Genomics
How to install Cortex on Ubuntu?
Cortex is a user-friendly framework for genome analysis [1]. It requires little memory and performs efficiently. Its installation involves several steps. In this article, we will install Cortex on Ubuntu.
Genomics
How to Compress and Decompress FASTQ, SAM/BAM & VCF Files using genozip?
genozip is a tool for lossless compression of large files including VCF, FASTQ, and SAM/BAM files [1]. In this article, we explain the usage of the genozip tool for the compression and decompression of these files.
Genomics
Installing BCFtools on Ubuntu
BCFtools is a set of utilities for manipulating variant call files (VCF) and binary call files (BCF). It works with both compressed and uncompressed files. In this article, we will install BCFtools on Ubuntu.
Genomics
Installing CRISPRCasFinder on Ubuntu
CRISPR/Cas9 is a rapidly growing genome editing technology. Several tools are available to identify CRISPR arrays and CRISPR-associated (Cas) genes within prokaryotic genomes. Among them, CRISPRCasFinder searches for CRISPRs and Cas genes in sequence data [1]. In this article, we will install CRISPRCasFinder on Ubuntu.
Genomics
Genozip- a new compression tool for VCF files
Variant Call Format (VCF) is a text file format used to store genomic sequence variants, often for thousands of samples. Since these files hold a large number of variant records, they remain quite large even after compression. Recently, a new compression tool known as genozip has been introduced [1].
Genomics
Conventionally unconventional: Anecdote of small RNAs discoveries
The past decade has witnessed an incredible increase in the number of known small RNAs. As the name indicates, small RNAs are short RNA transcripts (approximately 21-24 nucleotides long) [1-8]. These small RNA transcripts regulate various biological processes, ranging from responses to biotic/abiotic stress to the determination of tissue specificity [1-8]. Non-coding RNAs are classified based on their biogenesis pathway and mode of function.
Genomics
GenVisR : A tool for genomic visualization
The ever-increasing progress of sequencing techniques has generated a massive amount of genomic data [1]. This exponential growth of genomic datasets provides a wealth of information to scientists. To identify patterns and investigate biological information, it is necessary to visualize genomes, but developing such tools is quite difficult.
Genomics
What is PRSice?
Etiology is the study of the origin or causation of an event or phenomenon. Genetic etiology is the study of the genes responsible for particular traits in an organism. Identifying genetic etiology has become routine when studying the genotypes and/or phenotypes of individuals. For this, a Polygenic Risk Score (PRS) is calculated.
Bioinformatics Programming
HTSeq : A Python framework to analyze high throughput sequencing data
High-throughput sequencing is widely used because it saves time and provides good results, but it produces a huge amount of data that is difficult to manage and process. To ease these tasks, Simon Anders and colleagues introduced a Python framework known as “HTSeq”.
Bioinformatics News
Mycobacteriophages and their potential as a source of biomolecules active against Mycobacteria
Today is the great festival of Christmas, the birthday of the Son of God. On this auspicious day, we want to present the power of nature: how nature itself provides solutions to the problems that arise within it. We are all aware of the epidemic threat posed by Mycobacterium tuberculosis and related species. In this article, we show how nature provides a solution against it.
A bacteriophage (bacterio = bacteria, phage = eater) is a virus that infects bacteria. A mycobacteriophage is a member of the group of bacteriophages that infect mycobacterial species as their hosts, e.g., Mycobacterium smegmatis and Mycobacterium tuberculosis, the causative agent of tuberculosis.
The rising incidence of tuberculosis, the emergence of multidrug resistance in Mycobacterium tuberculosis, and slow progress in finding new drugs make mycobacteriophages potential candidates for use as diagnostic and therapeutic tools against TB.
All the characterized mycobacteriophages are double-stranded DNA (dsDNA) tailed phages belonging to the order Caudovirales. Most are of the family Siphoviridae, characterized by long, flexible, non-contractile tails, whereas phages of the family Myoviridae have contractile tails. There is a notable absence of mycobacteriophages from the family Podoviridae (which have short, stubby tails), raising the question of whether long tails are needed to traverse the relatively thick mycobacterial cell envelope. dsDNA tailed phages are either temperate, forming stable lysogens and turbid plaques, or lytic, forming clear plaques in which the host cells are killed. Mycobacteriophages can also be studied by the morphology of their plaques, which vary in size and shape. Plaque morphology also depends on the burst size, which is the number of phage particles released on lysis of an infected bacterium.
Since the mycobacterial cell wall consists of a mycolic acid-rich mycobacterial outer membrane attached to an arabinogalactan layer, which is in turn linked to the peptidoglycan, it poses a significant challenge to phages. This challenge is met by a set of proteins: Lysin B proteins that cleave the linkage of mycolic acids to the arabinogalactan layer, holins that regulate lysis timing, and endolysins (Lysin A proteins) that hydrolyze peptidoglycan.
Phages affect hosts with a holin-endolysin system essential for programmed lysis. Endolysin is found to be associated with a protein component of the phage tail involved in facilitating the penetration of the murein during injection of the genome into the host. Holins are small membrane proteins that form holes in the membrane through which the endolysin can pass. Holins control the length of the infective cycle for lytic phages so as to achieve lysis at an optimal time.
Endolysins are a potential source of antibacterials because of their specificity (they target only a few strains of bacteria, whereas antibiotics have a more wide-ranging effect), the low probability of Mycobacterium developing resistance to them, and their novel mode of action.
Bioinformatics can assist this field of research by finding other such proteins, or by helping to design alternatives with similar pharmacophore (physical and chemical) properties. By using the natural options available to us, we can counter various disease threats and remain healthy on this planet. The only point to remember is:
NATURE CAN SATISFY OUR NEEDS, BUT IT CANNOT SUSTAIN OUR GREED….. AS A HEALTHY BODY CONSISTS OF A HEALTHY MIND, THE SAME WAY.. A CONSERVED PLANET CONSERVES ITS SPECIES TOO…..
(A major part of this article consists of text copied from
Hatfull, Graham F. “Mycobacteriophages: genes and genomes.” Annual Review of Microbiology 64 (2010): 331-356.)
For any other information, related references, or queries, please let us know at [email protected]
Genomics
Roary: Analysis of Prokaryote Pan Genome on a large-scale
The Microbial Pan Genome is the union of genes shared by genomes of interest. This term was first used by Medini in 2005.
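The set-union idea behind a pan genome can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the strain names and gene names are hypothetical, and genes are assumed to be already assigned to shared names (real tools must first group genes into clusters). The core genome, the set of genes present in every genome, is added here for contrast and is not taken from the text above.

```python
def pan_and_core(genomes):
    """Return (pan genome, core genome) for a dict of strain -> gene names.

    Pan genome: every gene found in at least one genome (set union).
    Core genome: genes found in all genomes (set intersection).
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core


# Hypothetical gene content of three strains.
genomes = {
    "strain_A": {"dnaA", "gyrB", "recA", "toxX"},
    "strain_B": {"dnaA", "gyrB", "recA"},
    "strain_C": {"dnaA", "gyrB", "phoP"},
}

pan, core = pan_and_core(genomes)
print(sorted(pan))
print(sorted(core))
```

Genes that appear in the pan genome but not in the core genome are the ones that vary between strains.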
Genomics
GenomeD3 plot : Easy visualization of genomes
Sequencing genomes is important, and visualizing them is equally important. Some tools exist to visualize genomes, but they are static and standalone, …