GenVisR : A tool for genomic visualization

Dr. Muniba Faiza

Rapid progress in sequencing techniques has generated a massive amount of genomic data [1]. The resulting exponential growth of genomic datasets gives scientists a wealth of information. Identifying patterns and investigating biological questions requires visualizing genomes, but developing tools for this is quite difficult.

GenVisR is a Bioconductor R package that provides a flexible, user-friendly suite of tools for visualizing genomic data. It allows users to visualize and interpret genomic data for multiple species in three categories: variants, copy number alterations, and data quality [2].

1. Visualization of Variants

GenVisR provides several functions for analyzing small variants within a genome, which must be studied when investigating the genetic basis of a disease. The functions available in GenVisR for visualizing small variants are:

a. Lolliplot

It gives the user precise control over the visualization options; for example, protein domains can be drawn from Ensembl annotation databases. It also enables the user to plot mutations (Fig.1).

Fig.1 Output from lolliplot for selected TCGA breast cancer samples (Cancer Genome Atlas Network, 2012) shows two mutational hotspots in PIK3CA within the accessory and catalytic kinase domains [2].

b. Waterfall

It tracks variant recurrence across multiple genes, shows the mutations found in each sample, and distinguishes between the variant types. The results are displayed by arranging the samples hierarchically so that the most recurrently mutated genes are ranked first, and so on (Fig.2).

Fig.2 Output from waterfall showing mutations for five genes across 50 selected TCGA breast cancer samples with mutation type indicated by color in the grid and per sample/gene mutation rates indicated in the top and left sidebars [2].

c. TvTi

It is used to find the rates of transition and transversion mutations that occur in a set of genes.
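The distinction between the two mutation classes can be made concrete with a small base R sketch. It does not call GenVisR; the helper `classify_snv` and the example alleles are hypothetical, and the sketch assumes every input is a single-base substitution whose reference and alternate alleles differ.

```r
# A transition swaps a purine for a purine (A <-> G) or a pyrimidine for a
# pyrimidine (C <-> T); every other single-base change is a transversion.
classify_snv <- function(ref, alt) {
  purines <- c("A", "G")
  pyrimidines <- c("C", "T")
  same_class <- (ref %in% purines & alt %in% purines) |
    (ref %in% pyrimidines & alt %in% pyrimidines)
  ifelse(same_class, "transition", "transversion")
}

# Made-up reference and alternate alleles for five variants.
ref <- c("A", "C", "G", "T", "A")
alt <- c("G", "T", "T", "A", "C")

kind <- classify_snv(ref, alt)

# Count each class and convert the counts to proportions.
ti_tv_counts <- table(factor(kind, levels = c("transition", "transversion")))
ti_tv_prop <- prop.table(ti_tv_counts)
print(ti_tv_prop)
```

With these five made-up variants, two are transitions and three are transversions, so the proportions come out to 0.4 and 0.6.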

2. Visualization of Copy Number Alterations

Copy number alterations within the genome have been identified in various diseases [3]. GenVisR provides several functions for visualizing them.

a. genCov

It displays the amplifications and deletions within the genomic region of interest (Fig.3).

Fig.3 Output from genCov displaying coverage (bottom plots) showing focal deletions in sample A (last exon) and B (second intron) within a gene of interest. GC content (top plot) is encoded via a range of colors for each exon [2].

b. cnView

It allows the user to plot copy numbers in a broader view and shows an ideogram for an individual sample at the chromosome level.

c. cnSpec

It displays copy numbers on a larger scale than cnView. It shows a heat map arranged in a grid indexed by chromosomes and samples.

d. cnFreq

It displays the proportion of samples in the genomic dataset that have gained or lost copies at specific gene loci.

e. lohSpec

Loss of heterozygosity (LOH) is important for studying genomic diseases. The function lohSpec displays all the LOH regions within the genomic dataset (Fig.4).

Fig.4 Output from lohSpec for HCC1395 (Griffith et al., 2015), HCC38 and HCC1143 (Daemen et al., 2013) breast cancer cell lines shows LOH events, across all chromosomes, shaded as dark blue.

3. Visualization of Data Quality

Quality assessment of sequencing data is of utmost importance for the proper interpretation of variants within the genome. GenVisR provides a few functions for assessing data quality.

a. covBars

It displays the sequencing coverage achieved for the targeted bases (Fig.5).

Fig.5 Output from covBars shows cumulative coverage for 10 samples indicating that for each sample, at least 75% of targeted regions were covered at 35x depth [2].
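The quantity summarized in such a plot can be sketched in base R. The example below does not call GenVisR; it assumes a matrix whose rows are sequencing depths, whose columns are samples, and whose cells count the targeted bases observed at exactly that depth. The helper `frac_at_least` and all the numbers are hypothetical.

```r
# Rows: sequencing depth; columns: samples; values: number of targeted
# bases observed at exactly that depth (made-up numbers for illustration).
depths <- c(0, 10, 20, 35, 50)
cov_counts <- matrix(
  c(5, 10, 10, 50, 25,     # sample_A
    20, 20, 20, 30, 10),   # sample_B
  nrow = length(depths),
  dimnames = list(depths, c("sample_A", "sample_B"))
)

# Fraction of targeted bases covered at or above a minimum depth, per sample.
frac_at_least <- function(counts, min_depth) {
  depth <- as.numeric(rownames(counts))
  colSums(counts[depth >= min_depth, , drop = FALSE]) / colSums(counts)
}

frac_35x <- frac_at_least(cov_counts, 35)
print(frac_35x)
```

In this toy input, sample_A has 75% of its targeted bases at 35x or deeper, while sample_B has only 40%, which is the kind of difference a cumulative coverage plot makes visible at a glance.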

b. compIdent

It helps to identify sample mix-ups by comparing samples that are thought to originate from the same individual (Fig.6).

Fig.6 Output from compIdent for the HCC1395 breast cancer cell line (tumor and normal) shows variant coverage (bottom plot) and SNP allele fraction (main plot) indicating highly related samples [2].

WORKING OF GenVisR:

Since GenVisR is an R package, running a particular function requires only a simple R script. By default it accepts the MAF (Mutation Annotation Format) file format, which was first developed for The Cancer Genome Atlas project (Cancer Genome Atlas Research Network, 2008). For example, as illustrated by Z.L. Skidmore et al. (2016), Fig. 2 was created by running the following script on a standard MAF file containing variant data and choosing which genes to plot [2]:

    genes = c("PIK3CA", "TP53", "USH2", "MLL3", "BRCA1")
    GenVisR::waterfall(x = maf_file, plotGenes = genes)
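For readers who want to try the call without a real MAF file, here is a minimal sketch. It assumes that waterfall, when given MAF-style input, needs at least the columns Tumor_Sample_Barcode, Hugo_Symbol and Variant_Classification. The data frame contents are made up, and the plot is drawn only if GenVisR is installed.

```r
# A tiny MAF-like data frame with made-up samples and mutations.
maf_file <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3", "S3"),
  Hugo_Symbol = c("PIK3CA", "TP53", "PIK3CA", "TP53", "BRCA1"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Missense_Mutation", "Frame_Shift_Del",
                             "Silent"),
  stringsAsFactors = FALSE
)

# Genes to show in the plot.
genes <- c("PIK3CA", "TP53", "BRCA1")

# Number of distinct samples carrying a mutation in each gene; this is the
# recurrence that a waterfall plot is ordered by.
sample_gene <- unique(maf_file[, c("Tumor_Sample_Barcode", "Hugo_Symbol")])
samples_per_gene <- table(sample_gene$Hugo_Symbol)
print(sort(samples_per_gene, decreasing = TRUE))

# Draw the plot only when the package is available.
if (requireNamespace("GenVisR", quietly = TRUE)) {
  GenVisR::waterfall(x = maf_file, fileType = "MAF", plotGenes = genes)
}
```

The base R part runs on its own, so the recurrence counts can be checked before any plotting is attempted.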

References:

  1. Kodama,Y. et al. (2012) The Sequence Read Archive: explosive growth of sequencing data. Nucleic Acids Res., 40, D54–D56.
  2. Skidmore,Z.L. et al. (2016) GenVisR: Genomic Visualizations in R. Bioinformatics, 1–3. doi: 10.1093/bioinformatics/btw325.
  3. Beroukhim,R. et al. (2010) The landscape of somatic copy-number alteration across human cancers. Nature, 463, 899–905.
How to cite this article: Faiza, M., 2016. GenVisR : A tool for genomic visualization. Bioinformatics Review, 2(7):page 9-13. The article is available at http://bioinformaticsreview.com/20160712/genvisr-a-tool-for-genomic-visualization/

Dr. Muniba is a bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.



Bioinformatics News

Mycobacteriophages and their potential as a source of biomolecules active against mycobacteria

So, today is the great festival of Christmas! Birthday of the Son of God. On this auspicious day, we want to present the power of nature: how nature itself provides solutions to the problems raised within it. We are all aware of the epidemic threat posed by Mycobacterium tuberculosis and other related species. In this article, we show how nature provides a solution against it.

As we know, a bacteriophage (bacterio = bacteria, phage = eater) infects bacteria. A mycobacteriophage is a member of a group of bacteriophages that infect mycobacterial species as their hosts, e.g., Mycobacterium smegmatis and Mycobacterium tuberculosis, the causative agent of tuberculosis.

The rising incidence of tuberculosis, the emergence of multidrug resistance in Mycobacterium tuberculosis, and slow progress in finding new drugs make mycobacteriophages potential candidates for use as diagnostic and therapeutic tools against TB.

All the characterized mycobacteriophages are double-stranded DNA (dsDNA) tailed phages belonging to the order Caudovirales. Most are of the family Siphoviridae, characterized by long, flexible, non-contractile tails, whereas phages of the family Myoviridae have contractile tails. There is a notable absence of mycobacteriophages from the family Podoviridae (which have short, stubby tails), raising the question of whether long tails are needed to traverse the relatively thick mycobacterial cell envelope. dsDNA tailed phages are either temperate, forming stable lysogens with turbid plaques, or lytic, forming clear plaques in which the host cells are killed. Mycobacteriophages can also be studied by the morphology of their plaques, which vary in size and shape. Plaque morphology also depends on the burst size, which is the number of phage particles released on lysis of the infected bacterium.

Genometrics of 70 sequenced Mycobacteriophages

Since the mycobacterial cell wall consists of a mycolic acid-rich outer membrane attached to an arabinogalactan layer that is in turn linked to the peptidoglycan, it poses a significant challenge to the phages. This challenge is met by a set of proteins: Lysin B proteins that cleave the linkage of mycolic acids to the arabinogalactan layer, holins that regulate lysis timing, and endolysins (Lysin A proteins) that hydrolyze peptidoglycan.

Phages lyse their hosts with a holin-endolysin system that is essential for programmed lysis. Endolysin is found to be associated with a protein component of the phage tail involved in facilitating the penetration of the murein during injection of the genome into the host. Holins are small membrane proteins that form holes in the membrane through which the endolysin can pass. Holins control the length of the infective cycle for lytic phages so as to achieve lysis at an optimal time.

Endolysins are a potential source of antibacterials that could replace antibiotics because of their specificity (they target only a few strains of bacteria, whereas antibiotics have a more wide-ranging effect), their low probability of inducing resistance in Mycobacterium, and their novel mode of action.

Bioinformatics can assist this field of research by finding other such proteins existing on this planet or by designing alternatives with similar pharmacophore (physical and chemical) properties. We can overcome various disease threats by using the natural options provided to us and remain healthy on this planet. The only point to be remembered is:

NATURE CAN SATISFY OUR NEEDS, BUT IT CANNOT SUSTAIN OUR GREED. AS A HEALTHY BODY CONSISTS OF A HEALTHY MIND, THE SAME WAY, A CONSERVED PLANET CONSERVES ITS SPECIES TOO.

(A major part of this article consists of text copied from Hatfull, Graham F. “Mycobacteriophages: genes and genomes.” Annual review of microbiology 64 (2010): 331-356.)

For any other information, related references, and queries, please let us know at [email protected]

