GenVisR : A tool for genomic visualization

Continuing advances in sequencing technology have generated massive amounts of genomic data [1], and the resulting exponential growth of genomic datasets offers scientists a wealth of information. Identifying patterns and extracting biological insight from these data requires visualizing them, but developing such visualization tools is quite difficult.

GenVisR is a Bioconductor R package that provides a flexible, user-friendly suite of tools for visualizing genomic data. It supports the visualization and interpretation of genomic data from multiple species in three categories: variants, copy number alterations, and data quality [2].

1. Visualization of Variants

GenVisR provides several functions for visualizing small variants within a genome, which must be examined when investigating the genetic basis of a disease. The available functions are:

a. Lolliplot

lolliplot plots mutations along the length of a protein (Fig.1) while giving the user precise control over visualization options; for example, protein domains can be drawn from Ensembl annotation databases.

Fig.1 Output from lolliplot for selected TCGA breast cancer samples (Cancer Genome Atlas Network, 2012) shows two mutational hotspots in PIK3CA within the accessory and catalytic kinase domains [2].
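The idea a lolliplot visualizes, mutations stacked at their protein positions so that recurrent hotspots stand out, can be sketched in base R. This is a hypothetical illustration, not GenVisR code: the amino-acid changes are made up, the helper `count_positions` is invented for the example, and it assumes changes are written as `p.<reference residue><position><alternate residue>` (e.g. `p.E545K`).

```r
# Made-up amino-acid changes for one gene, in p.<ref><position><alt> notation
aa_changes <- c("p.E545K", "p.H1047R", "p.E542K", "p.H1047R",
                "p.E545K", "p.H1047L", "p.N345K", "p.E545K")

# Hypothetical helper: extract the protein position of each change and
# count how many mutations land on each position (the "lollipop" heights)
count_positions <- function(changes) {
  pos <- as.integer(sub("^p\\.[A-Z*]+([0-9]+).*$", "\\1", changes))
  sort(table(pos), decreasing = TRUE)
}

heights <- count_positions(aa_changes)
print(heights)
```

The positions with the largest counts are the candidate hotspots that a lolliplot draws as the tallest stacks.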

b. Waterfall

waterfall tracks variant recurrence across multiple genes, displaying every mutation in each sample and distinguishing between variant types. Genes are ranked by recurrence so that the most recurrently mutated genes appear first, and samples are arranged hierarchically according to their mutation status in those genes (Fig.2).

Fig.2 Output from waterfall showing mutations for five genes across 50 selected TCGA breast cancer samples with mutation type indicated by colour in the grid and per sample/gene mutation rates indicated in the top and left sidebars [2].
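The ordering described for waterfall can be approximated in base R: rank genes by the number of samples mutated in each, then sort samples by their mutation status in the top-ranked gene, breaking ties with the next gene, and so on. This is a simplified sketch with made-up sample and gene assignments, not GenVisR's actual implementation; only the column names `Tumor_Sample_Barcode` and `Hugo_Symbol` come from the MAF format.

```r
# Toy MAF-like table: one row per mutation (sample/gene pairs are made up)
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3", "S3", "S4", "S2"),
  Hugo_Symbol          = c("TP53", "PIK3CA", "PIK3CA", "PIK3CA", "TP53", "BRCA1", "GATA3"),
  stringsAsFactors = FALSE
)

# Sample x gene matrix: TRUE if the sample carries at least one mutation in the gene
mut <- table(maf$Tumor_Sample_Barcode, maf$Hugo_Symbol) > 0

# Rank genes by recurrence (number of mutated samples), most recurrent first
gene_order <- names(sort(colSums(mut), decreasing = TRUE))

# Hierarchical sample sort: mutated samples first for the top gene,
# ties broken by the second gene, then the third, and so on
keys <- lapply(gene_order, function(g) -mut[, g])
sample_order <- rownames(mut)[do.call(order, keys)]

print(gene_order)
print(sample_order)
```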

c. TvTi

TvTi shows the rates of transition and transversion mutations observed across the samples in a cohort.
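A transition replaces a purine with a purine (A↔G) or a pyrimidine with a pyrimidine (C↔T); every other single-base substitution is a transversion. The classification that TvTi summarizes can be sketched in base R as below. The alleles are made up and the helper `classify_snv` is hypothetical; it assumes uppercase bases and that each reference allele differs from its alternate allele.

```r
# Hypothetical helper: label a single-nucleotide substitution as a
# transition (same chemical class) or a transversion (class swap)
classify_snv <- function(ref, alt) {
  purines     <- c("A", "G")
  pyrimidines <- c("C", "T")
  same_class <- (ref %in% purines & alt %in% purines) |
                (ref %in% pyrimidines & alt %in% pyrimidines)
  ifelse(same_class, "transition", "transversion")
}

# Made-up reference and alternate alleles for six SNVs
ref <- c("A", "C", "G", "T", "A", "C")
alt <- c("G", "T", "T", "G", "C", "A")

types <- classify_snv(ref, alt)
print(table(types))             # counts per class
print(prop.table(table(types))) # proportions per class
```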

2. Visualization of Copy Number Alterations

Copy number alterations have been identified in many diseases [3]. GenVisR provides several functions for visualizing them.

a. genCov

genCov displays sequencing coverage across a genomic region of interest, making amplifications and deletions within that region visible (Fig.3).

Fig.3 Output from genCov displaying coverage (bottom plots) showing focal deletions in sample A (last exon) and B (second intron) within a gene of interest. GC content (top plot) is encoded via a range of colours for each exon [2].
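A focal deletion like those in Fig.3 shows up as a run of positions whose coverage falls well below the sample's typical depth. The sketch below finds such runs with base R's `rle()`. The coverage values and the half-of-median cutoff are hypothetical choices for illustration, not genCov's own logic.

```r
# Made-up per-base coverage across a region of interest
coverage <- c(30, 32, 31, 29, 6, 4, 5, 3, 28, 33, 30, 31)

# Mark positions below half the median coverage (hypothetical cutoff)
low <- coverage < 0.5 * median(coverage)

# Collapse consecutive low positions into runs and report start/end indices
runs    <- rle(low)
ends    <- cumsum(runs$lengths)
starts  <- ends - runs$lengths + 1
deleted <- data.frame(start = starts[runs$values], end = ends[runs$values])
print(deleted)
```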

b. cnView

cnView plots copy number calls for an individual sample across a chromosome, alongside an ideogram of that chromosome.

c. cnSpec

cnSpec displays copy number on a larger scale than cnView, as a heatmap arranged in a grid indexed by chromosome and sample.

d. cnFreq

cnFreq displays the frequency (proportion) of samples in the dataset that show a copy number gain or loss at each genomic locus.

e. lohSpec

Loss of heterozygosity (LOH) is important in the study of genomic diseases. lohSpec displays all LOH regions across the samples in a dataset (Fig.4).

Fig.4 Output from lohSpec for HCC1395 (Griffith et al., 2015), HCC38 and HCC1143 (Daemen et al., 2013) breast cancer cell lines shows LOH events, across all chromosomes, shaded as dark blue.
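One simple way to detect LOH in paired tumor/normal data: at germline heterozygous SNPs the normal variant allele fraction (VAF) sits near 0.5, while inside an LOH region the tumor VAF drifts toward 0 or 1. The sketch below averages the absolute tumor-normal VAF difference in fixed, non-overlapping windows and flags windows above a cutoff. It is a simplified illustration under assumed inputs; the simulated data, the 1 Mb window, and the 0.2 cutoff are hypothetical and are not lohSpec's defaults.

```r
# Simulated heterozygous germline SNPs on one chromosome (hypothetical data)
pos        <- seq(1e5, 1e7, by = 1e5)              # 100 SNP positions
normal_vaf <- rep(0.5, length(pos))
tumor_vaf  <- rep(0.5, length(pos))
tumor_vaf[pos > 4e6 & pos <= 6e6] <- 0.95           # simulated LOH segment

# Mean absolute tumor-normal VAF difference in non-overlapping 1 Mb windows
window_size <- 1e6
window_id   <- floor((pos - 1) / window_size)
loh_score   <- tapply(abs(tumor_vaf - normal_vaf), window_id, mean)

# Flag windows whose mean difference exceeds the (hypothetical) cutoff
loh_windows <- as.integer(names(loh_score)[loh_score > 0.2])
print(loh_windows * window_size)  # left boundary of each flagged window
```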

3. Visualization of Data Quality

Assessing the quality of sequencing data is essential for correctly interpreting the variants it reveals. GenVisR provides a few functions for data quality assessment.

a. covBars

covBars displays the sequencing coverage achieved over the targeted bases for each sample in a cohort (Fig.5).

Fig.5 Output from covBars shows cumulative coverage for 10 samples, indicating that for each sample at least 75% of targeted regions were covered at a depth of 35x [2].
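Cumulative coverage, the quantity covBars plots, is the fraction of targeted bases whose read depth is at or above each threshold. A minimal base-R sketch with made-up per-base depths; the helper `cumulative_coverage` and the chosen thresholds are hypothetical.

```r
# Made-up read depths over the targeted bases of one sample
depth <- c(0, 5, 12, 20, 35, 35, 40, 48, 60, 80)

# Hypothetical helper: fraction of bases covered at or above each threshold
cumulative_coverage <- function(depth, thresholds) {
  sapply(thresholds, function(t) mean(depth >= t))
}

thresholds <- c(1, 10, 20, 35, 50)
cov_frac <- cumulative_coverage(depth, thresholds)
names(cov_frac) <- paste0(thresholds, "x")
print(cov_frac)
```

Read the same way as Fig.5: in this toy sample, 60% of the targeted bases reach at least 35x.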

b. compIdent

compIdent helps detect sample mix-ups by checking whether samples thought to originate from the same individual actually share the same genome (Fig.6).

Fig.6 Output from compIdent for the HCC1395 breast cancer cell line (tumor and normal) shows variant coverage (bottom plot) and SNP allele fraction (main plot) indicating highly related samples [2].
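The principle behind compIdent, that samples from the same individual share genotypes at common SNPs, can be illustrated by correlating allele fractions across a shared SNP panel. The sketch below is a simplification under stated assumptions: the VAFs are made up, and Pearson correlation is used here as the similarity measure, which is not necessarily what GenVisR computes.

```r
# Made-up variant allele fractions at the same six SNPs in three samples
vaf <- cbind(
  tumor   = c(0.02, 0.48, 0.97, 0.51, 0.00, 0.99),
  normal  = c(0.00, 0.52, 1.00, 0.47, 0.03, 0.98),
  unknown = c(0.50, 0.01, 0.49, 0.99, 0.52, 0.02)
)

# Pairwise Pearson correlations: samples from one individual should correlate strongly
similarity <- cor(vaf)
print(round(similarity, 2))
```

Here the tumor and normal columns correlate almost perfectly, while the "unknown" sample does not, which is the pattern that would suggest a mix-up.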

USING GenVisR:

Because GenVisR is an R package, each function is run from a short R script. Its default input is the Mutation Annotation Format (MAF), a file format first developed for The Cancer Genome Atlas project (Cancer Genome Atlas Research Network, 2008). For example, as illustrated by Skidmore et al. (2016), Fig. 2 was created from a standard MAF file containing variant mutation data by choosing which genes to plot [2]:

# MAF_file: a data frame of mutation records read from a MAF file
genes <- c("PIK3CA", "TP53", "USH2", "MLL3", "BRCA1")

GenVisR::waterfall(x = MAF_file, plotGenes = genes)
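The MAF data passed to waterfall as `x` in the script above is a data frame. MAF is a tab-delimited text format with one mutation per row and columns such as `Hugo_Symbol`, `Variant_Classification`, and `Tumor_Sample_Barcode`. A minimal base-R sketch of loading one follows; the inline records are made up, and a real file would be read with the same call by passing its path (for example a hypothetical `"cohort.maf"`) instead of `text =`.

```r
# A tiny, made-up MAF excerpt (real MAF files carry many more columns)
maf_text <- paste(
  "#version 2.4",
  "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode",
  "PIK3CA\tMissense_Mutation\tSAMPLE-01",
  "TP53\tNonsense_Mutation\tSAMPLE-01",
  "PIK3CA\tMissense_Mutation\tSAMPLE-02",
  sep = "\n"
)

# MAF is tab-delimited; lines starting with '#' are header comments
MAF_file <- read.delim(text = maf_text, comment.char = "#",
                       stringsAsFactors = FALSE)

# Number of distinct mutated samples per gene, a rough analogue of gene recurrence
recurrence <- tapply(MAF_file$Tumor_Sample_Barcode, MAF_file$Hugo_Symbol,
                     function(s) length(unique(s)))
print(recurrence)
```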

References:

  1. Kodama,Y. et al. (2012) The Sequence Read Archive: explosive growth of sequencing data. Nucleic Acids Res., 40, D54–D56.
  2. Skidmore,Z.L., Wagner,A.H., Lesurf,R., Campbell,K.M., Kunisaki,J., Griffith,O.L. and Griffith,M. (2016) GenVisR: Genomic Visualizations in R. Bioinformatics, 1–3, doi:10.1093/bioinformatics/btw325.
  3. Beroukhim,R. et al. (2010) The landscape of somatic copy-number alteration across human cancers. Nature, 463, 899–905.
How to cite this article: Faiza, M., 2016. GenVisR : A tool for genomic visualization. Bioinformatics Review, 2(7):page 9-13. The article is available at http://bioinformaticsreview.com/20160712/genvisr-a-tool-for-genomic-visualization/
Muniba is a bioinformatician based at the South China University of Technology. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.
