genozip is a tool for lossless compression of large files including VCF, FASTQ, and SAM/BAM files [1]. In this article, we explain the usage of the genozip tool for the compression and decompression of these files.
To create a reference file
genozip can compress files with or without a reference file, but using one gives a much better compression ratio.
$ genozip --make-reference input.fa
It will output input.ref.genozip.
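Because the reference only needs to be built once per genome, the step above can be wrapped so it is skipped when the output already exists. The following is a minimal sketch, not part of genozip itself: the function name `make_reference_once` and the `GENOZIP` override variable are hypothetical, and the output name is derived on the assumption that the FASTA file has a single extension (as in `input.fa` giving `input.ref.genozip`).

```shell
# Build the reference only if it is not already present.
# GENOZIP is a hypothetical override (e.g. GENOZIP=echo to preview the command).
make_reference_once() {
    fasta="$1"
    # Assumption: single extension, so input.fa -> input.ref.genozip
    ref="${fasta%.*}.ref.genozip"
    if [ -f "$ref" ]; then
        echo "reference already exists: $ref" >&2
        return 0
    fi
    "${GENOZIP:-genozip}" --make-reference "$fasta"
}
```

Calling `make_reference_once input.fa` a second time should then return immediately instead of rebuilding the reference.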
To compress FASTQ files using a reference file
Suppose you have three FASTQ files: file1.fq, file2.fq, and file3.fq. Compress them against the reference file as shown below:
$ genozip --reference input.ref.genozip file1.fq file2.fq file3.fq
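When there are many FASTQ files, listing them by hand becomes tedious. The sketch below passes every `.fq` file in a directory to a single genozip call, using only the `--reference` option shown above. The function name `compress_fastq_dir`, the directory layout, and the `GENOZIP` override variable are assumptions made for illustration.

```shell
# Compress every .fq file in a directory against one reference file.
# GENOZIP is a hypothetical override (e.g. GENOZIP=echo to preview the command).
compress_fastq_dir() {
    ref="$1"   # e.g. input.ref.genozip
    dir="$2"   # directory holding the .fq files (hypothetical layout)
    # Replace the function's arguments with the list of matching files.
    set -- "$dir"/*.fq
    # If the glob matched nothing, "$1" is still the literal pattern.
    if [ ! -e "$1" ]; then
        echo "no .fq files found in $dir" >&2
        return 1
    fi
    "${GENOZIP:-genozip}" --reference "$ref" "$@"
}
```

For example, `compress_fastq_dir input.ref.genozip ./reads` would run the same command as above on every `.fq` file under `./reads`.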
To compress a VCF file using a reference file
$ genozip --reference input.ref.genozip files.vcf.gz
To compress a SAM/BAM file using a reference file
$ genozip --reference input.ref.genozip file.bam
To compress paired-end FASTQ files
$ genozip --reference input.ref.genozip --pair sample1.fastq.gz sample2.fastq.gz
To decompress paired-end FASTQ files
$ genounzip --reference input.ref.genozip --unbind sample1+2.fastq.genozip
To compress and test the compression
$ genozip inputfile.vcf --test
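In a script, it is useful to act on the outcome of the test rather than just read the messages. The sketch below runs genozip with `--test` and reports whether it succeeded. It assumes genozip returns a non-zero exit status when compression or the test fails. The function name `compress_and_verify` and the `GENOZIP` override variable are hypothetical.

```shell
# Compress a file with --test and report whether the round trip was verified.
# Assumption: genozip exits with a non-zero status if compression or the test fails.
# GENOZIP is a hypothetical override used only to make the sketch easy to try out.
compress_and_verify() {
    src="$1"
    if "${GENOZIP:-genozip}" "$src" --test; then
        echo "verified: $src"
    else
        echo "compression or test failed for $src; keep the original" >&2
        return 1
    fi
}
```

A caller could then decide to archive or remove the original file only when `compress_and_verify inputfile.vcf` succeeds.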
To convert SAM/BAM files to FASTQ
You can also convert a genozip-compressed SAM/BAM file to FASTQ format using the following command:
$ genounzip inputfile.bam.genozip --fastq
For more options, type the following in your terminal:
$ genounzip --help
References
[1] Lan, D., Tobler, R., Souilmi, Y., & Llamas, B. (2020). genozip: a fast and efficient compression tool for VCF files. Bioinformatics (Oxford, England).