NGS
How to extract methylation calls using Bismark?

Bismark is a bioinformatics tool that maps bisulfite-treated sequencing reads to a genome and performs methylation calls [1]. In this article, we are going to extract methylation information from Bismark alignment outputs.
1. Preparing genome (Indexing)
Let’s first prepare our genome, as it must be bisulfite converted and indexed before reads can be aligned to it. This step needs to be done only once per genome.
$ bismark_genome_preparation [options] /path/to/genome/folder
Let’s assume your genome is present in /Documents/genomes/homo_sapiens/GRCh38/.
If Bowtie2 is not in your path, then use the following command for genome preparation:
$ bismark_genome_preparation --path_to_aligner /usr/bin/bowtie2/ --verbose /Documents/genomes/homo_sapiens/GRCh38/
2. Alignment
In this step, we perform the actual bisulfite alignment. Here, you have to specify the directory containing the genome of interest in FASTA format (the folder prepared in step 1) and one or more sequence files to be analyzed. The sequence files must be in FastA or FastQ format.
$ bismark [options] --genome <genome_folder> {-1 <mates1> -2 <mates2> | <singles>}
For example, let’s say your sequence file is test_data.fastq and is present in the same directory, then the command will be as shown below:
$ bismark --genome /Documents/genomes/homo_sapiens/GRCh38/ test_data.fastq
This will produce two files as output:
- test_data_bismark_bt2.bam – This file will contain all alignments and methylation call strings.
- test_data_bismark_SE_report.txt – This file will contain alignment and methylation summary.
3. Deduplication
This step deduplicates the Bismark alignment BAM file. When several reads align to the very same position, all but one of them are removed.
$ deduplicate_bismark --bam [options] <filenames>
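To make "aligned to the very same position" more concrete, here is a toy sketch that applies the same idea to a small tab-separated table instead of a real BAM file. The column layout (read ID, chromosome, start, strand) and the function name dedup_by_position are assumptions made up for this illustration; it is not how deduplicate_bismark works internally.

```shell
#!/usr/bin/env bash
# Toy illustration only: keep the first read seen at each
# (chromosome, start, strand) combination and drop the rest.
# Assumed input columns (tab-separated): read_id, chromosome, start, strand.
dedup_by_position() {
    awk -F'\t' '!seen[$2 FS $3 FS $4]++'
}

# r1 and r2 share chromosome, start and strand, so only r1 is kept.
# r3 (other strand) and r4 (other chromosome) are kept as well.
printf 'r1\tchr1\t100\t+\nr2\tchr1\t100\t+\nr3\tchr1\t100\t-\nr4\tchr2\t100\t+\n' \
    | dedup_by_position
```

On real data, the deduplicate_bismark command shown above should be used rather than a hand-rolled filter like this one.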
4. Methylation Extraction
This command will extract the context-dependent (CpG/CHG/CHH) methylation. Here, we will use the output file generated in step 2.
$ bismark_methylation_extractor [options] <filenames>
For example,
$ bismark_methylation_extractor --gzip --bedGraph test_data_bismark_bt2.bam
This command will generate three main output files and two other files:
- CpG_context_test_data_bismark_bt2.txt.gz
- CHG_context_test_data_bismark_bt2.txt.gz
- CHH_context_test_data_bismark_bt2.txt.gz
- A bedgraph, and
- Bismark coverage file
Each context-specific output file contains the following tab-separated columns:
- seq-ID
- methylation state
- chromosome
- start position (= end position)
- methylation call
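As a minimal sketch of how these columns can be used, the snippet below tallies methylated and unmethylated calls per position and prints chromosome, position, percent methylation, and the two counts. It assumes that the methylation state column holds "+" for a methylated call and "-" for an unmethylated one, and that the file starts with a one-line version header; the function name summarize_calls and the toy input are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: tally methylated vs. unmethylated calls per position from
# methylation extractor output read on standard input.
# Assumed columns (tab-separated): seq-ID, methylation state (+/-),
# chromosome, position, methylation call.
summarize_calls() {
    awk -F'\t' '
        NF < 5 { next }                  # skip the header (fewer than 5 fields)
        {
            key = $3 "\t" $4             # chromosome and position
            if ($2 == "+") meth[key]++; else unmeth[key]++
            seen[key] = 1
        }
        END {
            for (key in seen) {
                m = meth[key] + 0; u = unmeth[key] + 0
                printf "%s\t%.1f\t%d\t%d\n", key, 100 * m / (m + u), m, u
            }
        }' | sort -k1,1 -k2,2n
}

# Toy input standing in for the decompressed CpG context file.
printf 'header line (toy)\nr1\t+\tchr1\t100\tZ\nr2\t-\tchr1\t100\tz\nr3\t+\tchr1\t250\tZ\n' \
    | summarize_calls
```

With the gzipped output from this article, the input could be piped in with: gzip -dc CpG_context_test_data_bismark_bt2.txt.gz | summarize_calls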
For more information on the output, see the Bismark documentation.
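The four steps can also be chained in a small wrapper script. The sketch below only uses the commands and options shown in this article; the helper names run and bismark_pipeline and the DRY_RUN switch are assumptions introduced for illustration. With DRY_RUN=1 (the default here) each command is printed rather than executed, so the script can be checked before anything is actually run.

```shell
#!/usr/bin/env bash
# Sketch of the four steps chained together. With DRY_RUN=1 (the default
# here) each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Arguments: $1 = genome folder, $2 = single-end FastQ file,
#            $3 = alignment BAM expected from step 2.
bismark_pipeline() {
    run bismark_genome_preparation --verbose "$1" &&
    run bismark --genome "$1" "$2" &&
    run deduplicate_bismark --bam "$3" &&
    # As in the article, the extractor is run on the step 2 BAM; pass the
    # deduplicated file instead if that is the one you want to analyze.
    run bismark_methylation_extractor --gzip --bedGraph "$3"
}

bismark_pipeline /Documents/genomes/homo_sapiens/GRCh38/ test_data.fastq test_data_bismark_bt2.bam
```

Setting DRY_RUN=0 would execute the commands for real, which requires Bismark and Bowtie2 to be installed and in your path.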
Reference
- Krueger, F., & Andrews, S. R. (2011). Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics, 27(11), 1571-1572.
NGS
[Tutorial] Trailing of paired end reads using Trimmomatic tool in GALAXY.

Trimmomatic is a read trimming tool for Illumina NGS data [1]. It is a flexible tool providing several functions to be operated on reads. These functions include trailing, leading, and several other quality control operations. In this article, we are going to perform trailing on NGS paired-end reads data using the GALAXY platform [2].
NGS
Installing PANDAseq on Ubuntu

PANDAseq is a bioinformatics tool that aligns paired-ends of Illumina sequences [1]. In this article, we are going to install PANDAseq on Ubuntu.
NGS
Installing Bismark on Ubuntu

Bismark is a bioinformatics tool to map bisulfite treated sequencing reads to a genome [1]. It also determines cytosine methylation sites. In this article, we will install Bismark on Ubuntu.
NGS
FiNGS- A New Software providing Filters for Next Generation Sequencing

We use somatic variant callers to detect mutations in cancer samples by comparing sequencing data from tumor and normal sample pairs. This is followed by some ad-hoc filtering that may produce low precision data resulting in a large number of false positives.
NGS
IonCRAM: New Tool for Ion Torrent Sequence Files Compression

One of the major next-generation sequencing (NGS) technologies that are most frequently used in medical research is Ion Torrent. Software for Ion Torrent machines provides output in BAM files that are huge in size. Additionally, their compression is also space expensive.
HTS
Assembly of high-throughput mRNA-Seq data: A review

The transcriptome represents the complete set of all expressed transcripts (RNA molecules) present in a cell or tissue at a given point of time. The transcriptome is always dynamic in nature and keeps changing with time, driven by the external and internal environment.
Meta Analysis
Predictive metagenomics profiling: why, what and how ?

What is predictive metagenomics profiling?
Recently, predictive metagenomics profiling (PMP) has been added to the microbial ecologist’s arsenal of strategies for probing microbial communities.
NGS
ALFALFA explained

High-throughput sequencing has revolutionized bioinformatics research. Since the Human Genome Project, in which the human genome was sequenced, the genomes of many more species have been sequenced. Sequencing is a very important aspect of bioinformatics, so new, faster, and better sequencing techniques are needed. New sequencing platforms produce biological sequence fragments faster and more cheaply.