Bismark is bioinformatics to map bisulfite treated sequencing reads and to perform methylation calls [1]. In this article, we are going to extract methylation information from Bismark alignment outputs.
1. Preparing genome (Indexing)
Let’s first prepare our genome as it should be bisulfite converted to proceed further. This step needs to be done only once.
$ bismark_genome_preparation [options] /path/to/genome/folder
Let’s assume your genome is present in Documents/genomes/homo_sapiens/.
If Bowtie2 is not in your path, then use the following command for genome preparation:
$ bismark_genome_preparation --path_to_aligner /usr/bin/bowtie2/ --verbose /Documents/genomes/homo_sapiens/GRCh38/
2. Alignment
In this step, we perform the actual bisulfite alignment. Here, you have to specify the directory containing the genome of interest in FASTA format and a single or multiple sequence files to be analyzed. These files must be in FastA or FastQ format.
$ bismark [options] --genome <genome_folder> {-1 <mates1> -2 <mates2> | <singles>}
For example, let’s say your sequence file is test_data.fastq and is present in the same directory, then the command will be as shown below:
$ bismark --genome /Documents/genomes/homo_sapiens/GRCh37/ test_data.fastq
This will produce two files as output:
- test_data_bismark_bt2.bam – This file will contain all alignments and methylation call strings.
- test_data_bismark_SE_report.txt – This file will contain alignment and methylation summary.
3. Deduplication
This step is performed to deduplicate the Bismark alignment BAM file. It will remove all reads except the one aligned to the very same position.
$ deduplicate_bismark --bam [options] <filenames>
4. Methylation Extraction
This command will extract the context-dependent (CpG/CHG/CHH) methylation. Here, we will use the output file generated in step 2.
$ bismark_methylation_extractor [options] <filenames>
For example,
$ bismark_methylation_extractor --gzip --bedGraph test_data_bismark_bt2.bam
This command will generate three main output files and two other files:
- CpG_context_test_dataset_bismark_bt2.txt.gz
- CHG_context_test_dataset_bismark_bt2.txt.gz
- CHH_context_test_dataset_bismark_bt2.txt.gz
- A bedgraph, and
- Bismark coverage file
The output files will show:
- seq-ID
- methylation state
- chromosome
- start position (end position)
- methylation call
For more information on output, click here.
Reference
- Krueger, F., & Andrews, S. R. (2011). Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. bioinformatics, 27(11), 1571-1572.