How to Compress and Decompress FASTQ, SAM/BAM & VCF Files using genozip?


genozip is a tool for lossless compression of large files including VCF, FASTQ, and SAM/BAM files [1]. In this article, we explain the usage of the genozip tool for the compression and decompression of these files.

To create a reference file

genozip can compress with or without a reference file, but using one generally gives a better compression ratio. Create the reference from a FASTA file as follows:

$ genozip --make-reference input.fa

This writes the reference file input.ref.genozip.
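Because the reference only needs to be built once, the step above can be wrapped in a small guard. The following is a minimal sketch, not part of genozip itself: the helper names ref_name_for and make_reference_once are hypothetical, and it assumes the reference is written next to the FASTA with the FASTA extension replaced by .ref.genozip, as in the input.fa example.

```shell
# Hypothetical helper: guess the reference name for a given FASTA file.
# Assumption: .fa / .fasta (optionally followed by .gz) is replaced by
# .ref.genozip, as in input.fa -> input.ref.genozip.
ref_name_for() {
    base=${1%.gz}
    base=${base%.fa}
    base=${base%.fasta}
    printf '%s.ref.genozip\n' "$base"
}

# Hypothetical helper: build the reference only if it is not already there.
make_reference_once() {
    ref=$(ref_name_for "$1")
    if [ -f "$ref" ]; then
        echo "reusing $ref"
    else
        genozip --make-reference "$1"
    fi
}
```

Calling make_reference_once input.fa a second time should then skip the rebuild, provided the naming assumption holds for your genozip version.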

To compress FASTQ files using a reference file

For example, to compress three FASTQ files (file1.fq, file2.fq, and file3.fq) against the reference file, run:

$ genozip --reference input.ref.genozip file1.fq file2.fq file3.fq

To compress a VCF file using a reference file

$ genozip --reference input.ref.genozip files.vcf.gz

To compress a SAM/BAM file using a reference file

$ genozip --reference input.ref.genozip file.bam
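The FASTQ, VCF, and SAM/BAM commands above all have the same shape, so a mixed set of files can be handled by one loop. This is only a sketch: compress_all is a hypothetical function name, and the GENOZIP variable is an assumption introduced here so that the loop can be rehearsed with echo before any real compression is run.

```shell
# Hypothetical helper: compress each existing file against one reference.
# Assumption: GENOZIP may be set to "echo genozip" to preview the commands;
# when unset, the real genozip binary is called.
compress_all() {
    ref=$1
    shift
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "skipping $f: not found" >&2
            continue
        fi
        # Left unquoted on purpose so "echo genozip" splits into two words.
        ${GENOZIP:-genozip} --reference "$ref" "$f"
    done
}
```

For a dry run, set GENOZIP="echo genozip" and call compress_all input.ref.genozip file1.fq files.vcf.gz file.bam; unset the variable to run the real commands.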

To compress paired-end FASTQ files

$ genozip --reference input.ref.genozip --pair sample1.fastq.gz sample2.fastq.gz
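When a directory holds many samples, the --pair command above can be applied to each pair of mates in turn. The sketch below assumes a naming convention that the article does not specify (mates named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz); compress_pairs and the GENOZIP dry-run variable are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: run genozip --pair on every R1/R2 pair in a directory.
# Assumptions: mates are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz;
# GENOZIP may be set to "echo genozip" to preview the commands.
compress_pairs() {
    ref=$1
    dir=$2
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -f "$r1" ] || continue              # the glob matched nothing
        r2=${r1%_R1.fastq.gz}_R2.fastq.gz     # expected name of the mate
        if [ -f "$r2" ]; then
            ${GENOZIP:-genozip} --reference "$ref" --pair "$r1" "$r2"
        else
            echo "no mate for $r1, skipping" >&2
        fi
    done
}
```

Samples whose second mate is missing are reported on standard error and left uncompressed rather than being compressed as single-end data.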

To decompress paired-end files

$ genounzip --reference input.ref.genozip --unbind sample1+2.fastq.genozip

To compress & test the compression

$ genozip --test inputfile.vcf

To convert SAM/BAM files to FASTQ

You can also convert SAM/BAM files to FASTQ format using the following command:

$ genounzip inputfile.bam.genozip --fastq

For more options, type the following in your terminal:

$ genounzip --help


