How to Compress and Decompress FASTQ, SAM/BAM & VCF Files using genozip?

Dr. Muniba Faiza

genozip is a tool for lossless compression of large files including VCF, FASTQ, and SAM/BAM files [1]. In this article, we explain the usage of the genozip tool for the compression and decompression of these files.

To create a reference file

genozip can compress files with or without a reference file, but using a reference generally gives a much better compression ratio.

$ genozip --make-reference input.fa

It will output input.ref.genozip.
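
Building a reference can take a while for large genomes, so it may be worth skipping the step when the file already exists. The following is a minimal sketch, not part of genozip itself: ref_name_for and make_reference are hypothetical helper names, and the sketch assumes the output follows the input.fa to input.ref.genozip naming shown above and is written next to the input file.

```shell
# Sketch: build the genozip reference only if it does not exist yet.
# ref_name_for and make_reference are hypothetical helpers, not genozip commands.

# Derive the expected reference name, e.g. input.fa -> input.ref.genozip
# (assumes the naming pattern shown in this article).
ref_name_for() {
    local fasta
    fasta="$1"
    # Strip the last extension (.fa, .fasta, ...) and append .ref.genozip
    printf '%s.ref.genozip\n' "${fasta%.*}"
}

# Create the reference unless a file with the expected name is already present.
make_reference() {
    local fasta ref
    fasta="$1"
    ref="$(ref_name_for "$fasta")"
    if [ -e "$ref" ]; then
        echo "reference exists: $ref"
    else
        genozip --make-reference "$fasta"
    fi
}
```

After sourcing the functions, they could be used like this:

$ make_reference input.fa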

To compress FASTQ files using a reference file

For example, if you have three FASTQ files (file1.fq, file2.fq, and file3.fq), compress them against the reference file as shown below:

$ genozip --reference input.ref.genozip file1.fq file2.fq file3.fq
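
When a directory holds many FASTQ files, listing them by hand becomes tedious. The sketch below collects every .fq file in a directory and passes them to a single genozip call, as in the command above. The function name compress_fastq_dir and the DRY_RUN switch are assumptions made for illustration; with DRY_RUN=1 the command is only printed, not executed.

```shell
# Sketch: compress every .fq file in a directory against one reference.
# compress_fastq_dir and DRY_RUN are hypothetical names, not part of genozip.
compress_fastq_dir() {
    local ref dir f
    ref="$1"
    dir="$2"
    set --                          # reuse the positional parameters as the file list
    for f in "$dir"/*.fq; do
        if [ -e "$f" ]; then        # skip the literal pattern when nothing matches
            set -- "$@" "$f"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "no .fq files found in $dir" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo genozip --reference "$ref" "$@"   # print the command only
    else
        genozip --reference "$ref" "$@"
    fi
}
```

To preview the command for the current directory without running it:

$ DRY_RUN=1 compress_fastq_dir input.ref.genozip .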

To compress a VCF file using a reference file

$ genozip --reference input.ref.genozip files.vcf.gz

To compress a SAM/BAM file using a reference file

$ genozip --reference input.ref.genozip file.bam

To compress paired-end FASTQ files

$ genozip --reference input.ref.genozip --pair sample1.fastq.gz sample2.fastq.gz
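
With many samples, each pair of mates needs its own --pair call. The following sketch loops over a directory and issues one call per sample. It assumes a hypothetical naming scheme of <sample>_R1.fastq.gz and <sample>_R2.fastq.gz, which differs from the sample1/sample2 names used above; compress_pairs and DRY_RUN are likewise illustrative names, and DRY_RUN=1 only prints the commands.

```shell
# Sketch: compress R1/R2 mates found in a directory, one --pair call per sample.
# Assumes the hypothetical naming scheme <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
# compress_pairs and DRY_RUN are illustrative names, not part of genozip.
compress_pairs() {
    local ref dir r1 r2
    ref="$1"
    dir="$2"
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue                 # nothing matched the pattern
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"      # derive the mate's file name
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: mate $r2 not found" >&2
            continue
        fi
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo genozip --reference "$ref" --pair "$r1" "$r2"
        else
            genozip --reference "$ref" --pair "$r1" "$r2"
        fi
    done
}
```

A dry run over the current directory would look like this:

$ DRY_RUN=1 compress_pairs input.ref.genozip .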

To decompress paired-end FASTQ files

$ genounzip --reference input.ref.genozip --unbind sample1+2.fastq.genozip

To compress and test the compression

The --test option of genozip verifies the newly compressed file:

$ genozip inputfile.vcf --test
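
When several files are compressed in a batch, it helps to know which ones failed verification. Below is a minimal sketch that runs genozip with --test on each file and reports failures based on the exit status. The function name test_compress_all and the GENOZIP override variable are hypothetical; the override exists only so that a different binary path (or a stub) can be substituted.

```shell
# Sketch: run "genozip --test" on each file and report the ones that fail.
# test_compress_all and the GENOZIP override are hypothetical, for illustration.
test_compress_all() {
    local f failed
    failed=0
    for f in "$@"; do
        # Rely on the exit status of the command to detect a failure.
        if "${GENOZIP:-genozip}" --test "$f"; then
            echo "ok: $f"
        else
            echo "FAILED: $f" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]     # non-zero return if any file failed
}
```

For example:

$ test_compress_all sample1.vcf sample2.vcf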

To convert SAM/BAM files to FASTQ

You can also convert SAM/BAM files to FASTQ format using the following command:

$ genounzip inputfile.bam.genozip --fastq

For more options, type the following in your terminal:

$ genounzip --help


References

  1. Lan, D., Tobler, R., Souilmi, Y., & Llamas, B. (2020). genozip: a fast and efficient compression tool for VCF files. Bioinformatics (Oxford, England).