Discovering somatic coding indels in RNA-seq data is challenging, in part because of artifacts generated during PCR-based RNA-seq library preparation. A tool called RNAIndel [1] has been developed for this purpose.
RNAIndel predicts indels in RNA-seq data and classifies each one as somatic, germline, or artifact. It combines features describing an indel's sequence context and biological effect in a machine learning framework, and is reported to identify roughly 88-100% of somatic indels in its test datasets [1]. The model uses 31 features in total, including repeat count and relative indel location.
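To make the two named features more concrete, here is a minimal Python sketch of how a repeat count and a relative indel location might be computed. The function names, the definitions (consecutive copies of the indel sequence in the downstream flank, and coding position divided by coding length), and the example sequences are all assumptions made for illustration; they are not taken from RNAIndel's source code, which may define these features differently.

```python
def repeat_count(indel_seq: str, downstream_flank: str) -> int:
    """Count consecutive copies of indel_seq at the start of downstream_flank.

    Hypothetical definition for illustration only.
    """
    if not indel_seq:
        raise ValueError("indel_seq must not be empty")
    step = len(indel_seq)
    count = 0
    # Walk along the flank one indel-length window at a time.
    while downstream_flank[count * step:(count + 1) * step] == indel_seq:
        count += 1
    return count


def relative_location(coding_pos: int, coding_len: int) -> float:
    """Return the indel position scaled by coding sequence length (0 to 1).

    coding_pos is 1-based. Hypothetical definition for illustration only.
    """
    if coding_len <= 0 or not 1 <= coding_pos <= coding_len:
        raise ValueError("coding_pos must lie within the coding sequence")
    return coding_pos / coding_len


# A "CA" indel next to a CACACA tract sits in a repetitive context.
ca_repeats = repeat_count("CA", "CACACAGT")
# An indel at coding position 50 of a 200-nt coding sequence.
early_indel = relative_location(50, 200)
```

Features of this kind give a classifier a numeric handle on context: indels in repetitive tracts are more prone to artifacts, and an indel's position in the coding sequence relates to its likely biological effect.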
RNAIndel takes as input an RNA-seq BAM file mapped with STAR [2]. After reading the input, it annotates all indels using RefSeq [3] isoforms and then queries a custom germline database for exact and equivalent matches. Indels supported by two or more reads are passed to prediction and grouped as single-nucleotide (s-indel) or multi-nucleotide (m-indel). Finally, RNAIndel writes a VCF file containing the indel entries, their supporting reads, the predicted class, and its probability.
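The read-support filter and the s-indel/m-indel grouping described above can be sketched in a few lines of Python. This is an illustrative sketch, not RNAIndel's implementation: the `Indel` record, its field names, and the helper functions are hypothetical, and alleles are assumed to follow the VCF convention in which REF and ALT share a leading anchor base.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

MIN_SUPPORTING_READS = 2  # threshold stated in the text


@dataclass(frozen=True)
class Indel:
    """Hypothetical indel record with VCF-style REF/ALT alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str
    supporting_reads: int

    @property
    def length(self) -> int:
        # Number of inserted or deleted nucleotides.
        return abs(len(self.alt) - len(self.ref))


def size_class(indel: Indel) -> str:
    """Label an indel as single-nucleotide or multi-nucleotide."""
    if indel.length == 0:
        raise ValueError("REF and ALT have equal length: not an indel")
    return "s-indel" if indel.length == 1 else "m-indel"


def select_and_group(indels: Iterable[Indel]) -> Dict[str, List[Indel]]:
    """Keep indels with enough read support and group them by size class."""
    groups: Dict[str, List[Indel]] = {"s-indel": [], "m-indel": []}
    for indel in indels:
        if indel.supporting_reads >= MIN_SUPPORTING_READS:
            groups[size_class(indel)].append(indel)
    return groups


# Made-up example candidates.
candidates = [
    Indel("chr1", 1000, "A", "AT", supporting_reads=5),    # 1-nt insertion
    Indel("chr2", 2000, "ACGT", "A", supporting_reads=3),  # 3-nt deletion
    Indel("chr3", 3000, "G", "GA", supporting_reads=1),    # too few reads
]
grouped = select_and_group(candidates)
```

In this toy input, the third candidate is dropped because only one read supports it, while the other two fall into the s-indel and m-indel groups respectively.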
For further details, read here.
References
1. Hagiwara, K., Ding, L., Edmonson, M. N., Rice, S. V., Newman, S., Easton, J., … & Zhang, J. (2020). RNAIndel: discovering somatic coding indels from tumor RNA-Seq data. Bioinformatics, 36(5), 1382-1390.
2. Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., … & Gingeras, T. R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), 15-21.
3. O’Leary, N. A., Wright, M. W., Brister, J. R., Ciufo, S., Haddad, D., McVeigh, R., … & Pruitt, K. D. (2016). Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Research, 44(D1), D733-D745.