Variant Call Format (VCF) is a text file format used to store genetic variant calls, often for thousands of samples in a single dataset. Because these files hold a very large number of variant records and genotypes, they remain large even after compression. A new compression tool called genozip has recently been introduced to address this.
genozip compresses VCF files losslessly. It uses a compression algorithm designed specifically for genotypes, which are only one of the data types represented in VCF files.
Features of genozip:
- Stores data with any phasing structure, ploidy, and variant type, with up to 99 alternate alleles per variant.
- Compresses losslessly and integrates seamlessly into analytical pipelines.
- Supports secure storage and distribution: data can be encrypted with a password.
- Runs on the major operating systems (Linux, Windows, and macOS).
- Lets users optimize compression according to their needs.
- Offers several additional options.
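To make the pipeline and password features more concrete, here is a minimal Python sketch of how genozip might be called from an analysis script. It only builds the command lines and runs them if the tools are installed. The command names `genozip` and `genounzip` and the `--password` flag are assumptions based on the tool's command-line help, so check `genozip --help` for your installed version. The file names and the helper functions (`genozip_command`, `genounzip_command`, `run`) are hypothetical and exist only for this illustration.

```python
import shutil
import subprocess
from typing import List, Optional


def genozip_command(vcf_path: str, password: Optional[str] = None) -> List[str]:
    """Build the argument list for compressing one VCF file (hypothetical helper)."""
    cmd = ["genozip"]
    if password is not None:
        # Assumed flag: --password encrypts the archive with the given password.
        cmd += ["--password", password]
    cmd.append(vcf_path)
    return cmd


def genounzip_command(archive_path: str, password: Optional[str] = None) -> List[str]:
    """Build the argument list for decompressing one archive (hypothetical helper)."""
    cmd = ["genounzip"]
    if password is not None:
        # The same assumed flag is needed to read an encrypted archive.
        cmd += ["--password", password]
    cmd.append(archive_path)
    return cmd


def run(cmd: List[str]) -> None:
    """Run a command, failing early if the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on PATH")
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace with your own data.
    print(genozip_command("sample.vcf", password="secret"))
    print(genounzip_command("sample.vcf.genozip", password="secret"))
```

Keeping the command construction separate from the execution step makes it easy to log or review the exact command before it runs, which is useful when the call is one stage of a larger pipeline. Note that passing a password as a command-line argument can expose it to other users on a shared machine.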
On a benchmark dataset, genozip was reported to run faster and achieve higher compression ratios than the other tools tested. For more details about this tool, see the reference below.
- Lan, D., Tobler, R., Souilmi, Y., & Llamas, B. (2020). genozip: a fast and efficient compression tool for VCF files. Bioinformatics (Oxford, England).