Genozip: a new compression tool for VCF files

Variant Call Format (VCF) is a text file format used to store genetic variants, often for thousands of samples in a single dataset. Because these files record a very large number of variants and genotypes, they remain quite large even after general-purpose compression. Recently, a new compression tool called genozip has been introduced [1].

genozip compresses VCF files losslessly, so the original file can be recovered exactly. It uses a compression algorithm specific to genotypes, which are only one of the data types represented in a VCF file.
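
To make the idea of lossless, data-type-aware compression concrete, here is a minimal Python sketch. It is not genozip's algorithm: it simply compresses each tab-separated VCF column as its own zlib stream, so that similar values (for example, genotypes) sit next to each other, and then checks that the records are restored exactly. The records are synthetic, and the helper names (`make_records`, `compress_by_field`, `decompress_by_field`) are hypothetical.

```python
import zlib
from typing import List


def make_records(n: int = 200) -> List[str]:
    """Build synthetic VCF data lines (hypothetical data, header omitted)."""
    records = []
    for i in range(n):
        # Eight samples with phased genotypes in a simple repeating pattern.
        genotypes = ["0|1" if (i + s) % 3 == 0 else "0|0" for s in range(8)]
        fields = ["chr1", str(10000 + 37 * i), ".", "A", "G",
                  "50", "PASS", ".", "GT"] + genotypes
        records.append("\t".join(fields))
    return records


def compress_by_field(lines: List[str]) -> List[bytes]:
    """Compress every column as a separate zlib stream.

    Assumes all lines have the same number of tab-separated fields.
    """
    columns = zip(*(line.split("\t") for line in lines))
    return [zlib.compress("\n".join(col).encode()) for col in columns]


def decompress_by_field(streams: List[bytes]) -> List[str]:
    """Rebuild the original lines from the per-column streams."""
    columns = [zlib.decompress(s).decode().split("\n") for s in streams]
    return ["\t".join(fields) for fields in zip(*columns)]


if __name__ == "__main__":
    records = make_records()
    original = "\n".join(records).encode()
    whole_file = zlib.compress(original)
    streams = compress_by_field(records)

    # Lossless: the round trip must give back identical records.
    assert decompress_by_field(streams) == records

    print("original bytes:      ", len(original))
    print("zlib, whole file:    ", len(whole_file))
    print("zlib, per-field sum: ", sum(len(s) for s in streams))
```

The compression ratio is the original size divided by the compressed size. On real data the result depends heavily on the content of the file, so the printed numbers here should not be read as a benchmark.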

Features of genozip:

  • can store data of any phasing structure, ploidy, and variant type, with up to 99 alternate alleles per variant.
  • allows pipeline analyses along with lossless compression.
  • allows secure storage and distribution.
  • runs on the major operating systems (Linux, Windows, and macOS).
  • allows seamless integration into analytical pipelines.
  • supports encrypting data with a password.
  • lets users optimize compression according to their needs.
  • offers several other options.
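
As a rough illustration of the pipeline and password features listed above, the Python sketch below builds command lines for the genozip tools and runs them only if the tools are installed. It assumes the `genozip` and `genocat` commands and the `--password` option as described in genozip's documentation; option names can differ between versions, so check `genozip --help` before relying on it. The file names, the password, and the helper functions are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def genozip_command(vcf_path: str, password: Optional[str] = None) -> List[str]:
    """Build a command line that compresses a VCF file with genozip.

    Assumption: `--password` enables password-based encryption.
    """
    cmd = ["genozip"]
    if password is not None:
        cmd += ["--password", password]
    cmd.append(vcf_path)
    return cmd


def genocat_command(archive_path: str, password: Optional[str] = None) -> List[str]:
    """Build a command line that streams a compressed file to stdout.

    Assumption: `genocat` writes the decompressed VCF to standard output,
    which is what makes it convenient inside a pipeline.
    """
    cmd = ["genocat"]
    if password is not None:
        cmd += ["--password", password]
    cmd.append(archive_path)
    return cmd


def run_if_available(cmd: List[str]) -> Optional[subprocess.CompletedProcess]:
    """Run the command, or return None when the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=True, capture_output=True)


if __name__ == "__main__":
    # Hypothetical file name and password, for illustration only.
    print(genozip_command("sample.vcf", password="s3cret"))
    print(genocat_command("sample.vcf.genozip", password="s3cret"))
```

Passing a password on the command line can expose it to other users on a shared machine, so a real pipeline may want a safer way to supply it.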

In tests on a benchmark dataset, genozip compressed faster and achieved higher compression ratios than the other tools tested [1]. For more details about the tool, see the paper cited below.


  1. Lan, D., Tobler, R., Souilmi, Y., & Llamas, B. (2020). genozip: a fast and efficient compression tool for VCF files. Bioinformatics (Oxford, England).

Tariq is the founder of Bioinformatics Review and a professional software developer at IQL Technologies. His areas of expertise include algorithm design, phylogenetics, microarrays, plant systematics, and genome data analysis. If you have questions, reach out to him via his homepage.
