A new suite of tools called SequelTools has been developed for analyzing PacBio raw sequence data [1]. PacBio is a third-generation DNA sequencing technology that produces very long reads, sequences in real time, and can detect methylated bases.
SequelTools is a command-line program that provides tools for quality control, read subsampling, and read filtering [1]. According to the authors, no comparable tool was previously available for quality control, subsampling, and filtering of PacBio raw sequence data.
SequelTools consists of three tools that can be used one at a time.
1. Quality Control (QC) tool
This tool processes SMRTcells and generates multiple statistics and plots. These plots describe the quality of input data.
2. Read subsampling tool
This tool subsamples reads according to criteria selected by the user, such as random selection of continuous long reads (CLRs) or the longest subread per CLR.
3. Read filtering tool
This tool allows users to filter data by minimum length or by removing certain low-quality scrap reads.
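The subsampling and filtering criteria described above can be illustrated with a minimal Python sketch. This is not SequelTools' own code: the function names are hypothetical, the reads are held in memory as (name, sequence) pairs, and the length filter is applied to each subread as a simplification. The only format assumption is the standard PacBio subread naming scheme `movie/zmw/start_end`, in which all subreads sharing a movie and ZMW number come from the same CLR.

```python
import random
from collections import defaultdict


def clr_key(read_name):
    """Return (movie, zmw) from a subread name of the form 'movie/zmw/start_end'."""
    movie, zmw, _coords = read_name.split("/")
    return movie, zmw


def group_by_clr(subreads):
    """Group (name, sequence) pairs by the CLR they were cut from."""
    groups = defaultdict(list)
    for name, seq in subreads:
        groups[clr_key(name)].append((name, seq))
    return groups


def longest_subread_per_clr(subreads):
    """Keep only the longest subread of each CLR."""
    groups = group_by_clr(subreads)
    return [max(reads, key=lambda r: len(r[1])) for reads in groups.values()]


def random_clr_subsample(subreads, fraction, seed=0):
    """Keep all subreads belonging to a random fraction of the CLRs."""
    groups = group_by_clr(subreads)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_keep = min(len(groups), max(1, round(len(groups) * fraction)))
    kept = set(rng.sample(sorted(groups), n_keep))
    return [r for r in subreads if clr_key(r[0]) in kept]


def filter_by_min_length(subreads, min_len):
    """Drop subreads shorter than min_len bases."""
    return [r for r in subreads if len(r[1]) >= min_len]


# Hypothetical toy data: two subreads from ZMW 10 and one from ZMW 22.
example = [
    ("m1/10/0_8", "ACGTACGT"),
    ("m1/10/8_12", "ACGT"),
    ("m1/22/0_6", "ACGTAC"),
]
for name, _seq in longest_subread_per_clr(example):
    print(name)
```

Grouping by CLR first keeps the two subsampling modes consistent: random selection picks whole CLRs rather than individual subreads, and the longest-subread mode returns exactly one read per CLR.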
These tools are run through command-line arguments. The main script, written in bash, uses Samtools to convert between the BAM and SAM file formats.
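As a rough illustration of this kind of conversion, the sketch below wraps the standard `samtools view -h` command, which writes a BAM file as SAM text with its header. It is not the exact invocation SequelTools uses. The helper names and file paths are hypothetical, and running it assumes Samtools is installed and on the PATH.

```python
import subprocess


def bam_to_sam_command(bam_path, sam_path):
    """Build the samtools command that converts a BAM file to SAM."""
    # -h keeps the header in the output; -o names the output file.
    return ["samtools", "view", "-h", "-o", sam_path, bam_path]


def bam_to_sam(bam_path, sam_path):
    """Run the conversion; raises CalledProcessError if samtools fails."""
    subprocess.run(bam_to_sam_command(bam_path, sam_path), check=True)


# Hypothetical file names, shown only to illustrate the resulting command.
print(" ".join(bam_to_sam_command("subreads.bam", "subreads.sam")))
```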
SequelTools is written in R, bash, and Python and requires BAM files as input. The suite is fast, efficient, and freely available at https://github.com/ISUgenomics/SequelTools. It can be run on any operating system. Its performance was also tested on benchmark data: SequelTools took around half an hour when processing scraps only and a little more than an hour when processing scraps and subreads [1].
For further details, see the original publication [1].
References
- Hufnagel, D. E., Hufford, M. B., & Seetharam, A. S. (2020). SequelTools: a suite of tools for working with PacBio Sequel raw sequence data. BMC Bioinformatics, 21(1), 1-11.