Variant Call Format (VCF) is a text file format for storing genomic variant data, and a single file may cover thousands of genomes. Because these files hold a very large number of variants and genotype calls, they remain large even after compression. A new compression tool called genozip was recently introduced to address this [1].
genozip compresses VCF files losslessly. It uses a compression algorithm designed specifically for genotype data, which is only one of the data types stored in a VCF file.
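To make the terms "lossless" and "genotype data" concrete, here is a minimal Python sketch. It is not genozip's algorithm: it uses `zlib` from the standard library as a stand-in compressor, and the VCF lines and the helper names `split_genotypes` and `roundtrip` are made up for illustration. It shows only that a lossless round trip restores the text exactly, and that the per-sample genotype columns can be separated from the rest of each line.

```python
import zlib

# Hypothetical three-sample VCF data lines (tab-separated), invented for this sketch.
VCF_LINES = [
    "1\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t0|0\t0|1\t1|1",
    "1\t2000\t.\tC\tT\t.\tPASS\t.\tGT\t0|0\t0|0\t0|1",
    "1\t3000\t.\tG\tA,T\t.\tPASS\t.\tGT\t1|2\t0|0\t0|1",
]


def split_genotypes(line):
    """Split a VCF data line into its 9 fixed columns and its per-sample columns."""
    fields = line.split("\t")
    return fields[:9], fields[9:]


def roundtrip(text):
    """Compress with zlib (a stand-in, not genozip) and decompress again."""
    compressed = zlib.compress(text.encode("utf-8"))
    restored = zlib.decompress(compressed).decode("utf-8")
    return compressed, restored


original = "\n".join(VCF_LINES)
compressed, restored = roundtrip(original)
assert restored == original  # lossless: the original text is recovered exactly

# The genotype columns on their own, one line per variant. A genotype-aware
# compressor could handle this stream separately from the fixed columns.
genotype_stream = "\n".join("\t".join(split_genotypes(l)[1]) for l in VCF_LINES)
print(genotype_stream)
```

The sample data is far too small to say anything about compression ratios; the sketch is only meant to show the round-trip property and the column split.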
Features of genozip:
- stores data with any phasing structure, ploidy, and variant type, with up to 99 alternate alleles per variant.
- integrates seamlessly into analytical pipelines while keeping compression lossless.
- supports secure storage and distribution, with optional password-based encryption.
- runs on the major operating systems (Linux, Windows, and macOS).
- compression can be tuned to the user's needs.
- offers several other options.
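The first feature above refers to how genotypes are written in the GT field of a VCF file. The Python sketch below illustrates those terms; it is not part of genozip, and `parse_gt` is a hypothetical helper. It makes one simplifying assumption: a genotype counts as phased if it contains any `|` separator.

```python
def parse_gt(gt):
    """Parse a VCF GT string such as '0|1' or '1/2' into (alleles, phased).

    Alleles are ints (0 = reference, n = n-th alternate allele) or None for
    a missing call ('.'). Ploidy is the number of alleles in the call.
    Simplification: any '|' in the string marks the genotype as phased.
    """
    phased = "|" in gt
    parts = gt.replace("|", "/").split("/")
    alleles = [None if p == "." else int(p) for p in parts]
    return alleles, phased


# Diploid phased, diploid unphased, haploid, missing, and triploid calls.
for gt in ["0|1", "1/2", "1", "./.", "0/1/1"]:
    alleles, phased = parse_gt(gt)
    print(gt, alleles, "ploidy:", len(alleles), "phased:", phased)
```

Here an allele index such as `2` points at the second entry in the variant's list of alternate alleles, which is where the limit of 99 alternate alleles per variant applies.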
genozip was tested on a benchmark dataset, where it compressed faster and achieved higher compression ratios than the other tools tested [1]. For more details about the tool, see the publication [1].
References
[1] Lan, D., Tobler, R., Souilmi, Y., & Llamas, B. (2020). genozip: a fast and efficient compression tool for VCF files. Bioinformatics.