Genozip: a new compression tool for VCF files

Tariq Abdullah

Variant Call Format (VCF) is a text file format used to store genetic variants, often for thousands of samples. Because these files can hold a very large number of variants and genotypes, they remain large even after compression. A new compression tool, genozip, was recently introduced to address this [1].

genozip compresses VCF files losslessly. It uses a compression algorithm tailored to genotypes, which are only one of the data types represented in VCF files.
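To make "lossless" concrete: a compressor is lossless if decompressing its output reproduces the original file byte for byte. The sketch below checks that property using Python's standard-library gzip module as a stand-in. It does not call genozip, and the tiny VCF snippet and the function name is_lossless_round_trip are made up for illustration.

```python
import gzip
import hashlib

# A tiny, made-up VCF snippet used only for illustration.
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "1\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\n"
    "1\t2000\t.\tC\tT,G\t.\tPASS\t.\tGT\t0/2\t1/2\n"
)


def is_lossless_round_trip(text: str) -> bool:
    """Return True if compress-then-decompress reproduces the input exactly."""
    original = text.encode("utf-8")
    compressed = gzip.compress(original)    # stand-in compressor, not genozip
    restored = gzip.decompress(compressed)
    # Comparing digests mirrors how one would compare large files on disk.
    original_digest = hashlib.sha256(original).hexdigest()
    restored_digest = hashlib.sha256(restored).hexdigest()
    return original_digest == restored_digest


if __name__ == "__main__":
    print(is_lossless_round_trip(SAMPLE_VCF))
```

The same comparison, applied to a VCF file before compression and after decompression, is a simple way to confirm that any lossless tool has preserved the data.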

Features of genozip:

  • capable of storing data of any phasing structure, ploidy, and variant types with up to 99 alternate alleles per variant.
  • allows pipeline analyses along with lossless compression.
  • allows secure storage and distribution.
  • runs on the major operating systems (Linux, Windows, and macOS).
  • allows seamless integration into analytical pipelines.
  • data can be encrypted with a password.
  • compression can be optimized according to the users’ needs.
  • offers several other options.
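For readers unfamiliar with the terms phasing and ploidy in the list above, both can be read off the genotype (GT) values in a VCF file: alleles are given as indices separated by "|" (phased) or "/" (unphased), and the number of alleles in a call is its ploidy. The sketch below parses a GT value in Python. It illustrates the VCF notation only and is not part of genozip; the names Genotype and parse_gt are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class Genotype(NamedTuple):
    alleles: List[Optional[int]]  # allele indices; None for a missing call (".")
    phased: bool                  # True if 2+ alleles and every separator is "|"
    ploidy: int                   # number of alleles in the call


def parse_gt(gt: str) -> Genotype:
    """Parse a VCF GT value such as "0|1", "0/2", "1", or "./."."""
    tokens = re.split(r"[|/]", gt)
    alleles = [None if t == "." else int(t) for t in tokens]
    # Assumption of this sketch: a call counts as phased only if it has at
    # least two alleles and no "/" separator; haploid calls are reported
    # as unphased.
    phased = len(tokens) > 1 and "/" not in gt
    return Genotype(alleles, phased, len(alleles))


if __name__ == "__main__":
    for value in ["0|1", "0/2", "0/1/2", "1", "./."]:
        print(value, parse_gt(value))
```

Here "0|1" is a phased diploid call, "0/1/2" is an unphased triploid call, and an allele index of 2 or higher refers to the second or later alternate allele listed for that variant.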

genozip has been tested on a benchmark dataset, where it compressed faster and achieved higher compression ratios than the other tested tools [1]. For more details about this tool, see the original publication [1].


References

  1. Lan, D., Tobler, R., Souilmi, Y., & Llamas, B. (2020). genozip: a fast and efficient compression tool for VCF files. Bioinformatics (Oxford, England).
Tariq is the founder of Bioinformatics Review and CEO at IQL Technologies. His areas of expertise include algorithm design, phylogenetics, microarrays, plant systematics, and genome data analysis. If you have questions, reach out to him via his homepage.