Here are the latest research developments in bioinformatics published in March 2022.
1. A new method to compress long reads.
A new algorithm named ‘CoLoRd’ has been developed to compress long reads efficiently [1]. It reduces the size of third-generation sequencing data without affecting the accuracy of downstream analysis, and it works with both Oxford Nanopore and PacBio reads. Compared with gzip, CoLoRd provides up to a ten-fold reduction in space [1]. The code is freely accessible on GitHub.
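To put a figure such as "ten-fold compared to gzip" in context, it helps to know what gzip alone achieves on read data. The following is a minimal sketch of measuring that baseline with Python's standard `gzip` module. It does not call CoLoRd; the helper names `gzip_ratio` and `toy_fastq` and the synthetic reads are assumptions made purely for illustration, and real sequencing data will compress differently.

```python
import gzip
import random


def gzip_ratio(data: bytes, level: int = 9) -> float:
    """Return original size divided by gzip-compressed size."""
    compressed = gzip.compress(data, compresslevel=level)
    return len(data) / len(compressed)


def toy_fastq(n_reads: int = 200, read_len: int = 500, seed: int = 0) -> bytes:
    """Build a synthetic FASTQ-like byte string (illustrative data only)."""
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        # Random bases and random Phred+33 quality characters.
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        qual = "".join(chr(33 + rng.randint(5, 30)) for _ in range(read_len))
        records.append(f"@read{i}\n{seq}\n+\n{qual}\n")
    return "".join(records).encode("ascii")


data = toy_fastq()
ratio = gzip_ratio(data)
print(f"gzip ratio on toy data: {ratio:.2f}x")
```

Running the same measurement on a real FASTQ file and on the output of a specialised compressor would give the two numbers needed for a like-for-like comparison.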
For further information, read here.
2. A new drug-drug interaction predictor based on convolutional neural networks.
A new learning-based method for predicting drug-drug interactions, named CNN-DDI, has been developed [2]. It is based on convolutional neural networks: the model consists of 5 convolutional layers, 2 fully connected layers, and 1 softmax layer. CNN-DDI was found to be robust across three similarity measures: Jaccard similarity, cosine similarity, and Gaussian similarity.
The DDI multimodal deep learning framework is available on the GitHub repository.
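The three similarity measures named above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: it assumes each drug is described by a binary feature vector, takes the Gaussian similarity to be the common form exp(-gamma * squared distance), and uses a hypothetical `gamma` value and made-up vectors `drug_a` and `drug_b`.

```python
import math


def jaccard(a, b):
    """Shared features divided by features present in either vector."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def cosine(a, b):
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gaussian(a, b, gamma=1.0):
    """Gaussian kernel on squared Euclidean distance (gamma is assumed)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


# Hypothetical binary feature vectors for two drugs.
drug_a = [1, 1, 0, 1, 0]
drug_b = [1, 0, 0, 1, 1]

print(jaccard(drug_a, drug_b))
print(cosine(drug_a, drug_b))
print(gaussian(drug_a, drug_b, gamma=0.5))
```

In a pipeline of this kind, each drug's row of pairwise similarities would serve as its feature vector for the network; the sketch covers only the similarity step, not the CNN itself.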
For further information, read here.
3. A new protein 3D structure modeling method.
A-Prot is a new method to predict protein 3D structures [3]. It uses a protein language model known as the MSA Transformer, in a version pre-trained on 26 million multiple sequence alignments (MSAs). Compared with existing methods, A-Prot predicts long-range contacts more accurately [3]. The code is freely accessible on GitHub.
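Claims about long-range contacts are usually checked with a precision score over the highest-ranked predictions. The sketch below shows one common way to compute it. The sequence-separation threshold of 24 residues and the default of scoring the top L predictions (L being the sequence length) are widely used conventions assumed here, not details taken from the summary above; the function name and the toy data are hypothetical.

```python
def long_range_precision(scores, true_contacts, seq_len, min_sep=24, top_k=None):
    """Precision of the top_k highest-scoring long-range predictions.

    scores: dict mapping (i, j) with i < j to a predicted contact probability.
    true_contacts: set of (i, j) pairs with i < j that are real contacts.
    """
    if top_k is None:
        top_k = seq_len  # assumed convention: evaluate the top L pairs
    # Keep only pairs that are far apart in sequence.
    candidates = [(p, pair) for pair, p in scores.items()
                  if pair[1] - pair[0] >= min_sep]
    candidates.sort(key=lambda item: item[0], reverse=True)
    selected = [pair for _, pair in candidates[:top_k]]
    if not selected:
        return 0.0
    hits = sum(1 for pair in selected if pair in true_contacts)
    return hits / len(selected)


# Toy predictions: (5, 10) is a true contact but too short-range to count.
toy_scores = {(0, 30): 0.9, (2, 40): 0.8, (5, 10): 0.95, (1, 50): 0.2}
toy_true = {(0, 30), (5, 10)}
precision = long_range_precision(toy_scores, toy_true, seq_len=60, top_k=2)
print(precision)
```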
For further information, read here.
4. New abstraction for protein structures.
A new abstraction called hierarchical representation has been introduced for predicting protein-protein interaction (PPI) sites [4]. This representation quantifies the amino acids that are spatial and sequential neighbors. The method uses Graph Convolutional Networks to classify amino acids as interface or non-interface. It outperforms some state-of-the-art protein interface predictors, provided that the molecules are structurally similar [4]. The code is freely available on GitLab.
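The idea of combining sequential and spatial neighbors in a graph can be sketched as follows. This is not the hierarchical representation from the paper: it is a simplified residue graph in which consecutive residues are linked, residues within an assumed 8 Å cutoff are linked, and one round of unweighted neighbor averaging stands in for a graph convolution (a real GCN layer would add learned weights and a nonlinearity). All names, coordinates, and features are hypothetical.

```python
import math


def residue_graph(coords, cutoff=8.0):
    """Link residues that are sequence neighbours or within `cutoff` of each other."""
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            sequential = (j - i == 1)
            spatial = math.dist(coords[i], coords[j]) <= cutoff
            if sequential or spatial:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def mean_aggregate(features, adj):
    """Average each residue's features with those of its graph neighbours."""
    out = []
    for i, feat in enumerate(features):
        group = [i] + sorted(adj[i])  # include the residue itself
        out.append([sum(features[k][d] for k in group) / len(group)
                    for d in range(len(feat))])
    return out


# Toy coordinates: residues 0 and 3 are far apart in sequence but close in space.
coords = [(0, 0, 0), (10, 0, 0), (20, 0, 0), (0, 5, 0)]
features = [[1.0], [0.0], [0.0], [1.0]]

adj = residue_graph(coords)
smoothed = mean_aggregate(features, adj)
print(adj)
print(smoothed)
```

In a classifier, the aggregated features of each residue would then be passed to a layer that outputs an interface or non-interface label.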
For further information, read here.
5. A new method for protein secondary structure prediction.
ProteinUnet2 is a new lightweight deep network for protein secondary structure prediction [5]. It is based on the U-Net convolutional architecture and uses evolutionary input features (from PSSM and HHblits) as well as SPOT-Contact features. It requires shorter training and inference times than other predictors [5]. The code is available at https://codeocean.com/capsule/0425426/tree/v3.
For further information, read here.
6. A new tool for peptide identification based on machine learning.
A new tool named TIDD has been developed for confident peptide identification [6]. It yields confident peptide identifications irrespective of the database search engine used, and it can work with any search engine, including newly developed ones. TIDD showed similar or better performance than Percolator [6]. The code is freely accessible on GitHub.
For further information, read here.
References
[1] Kokot, M., Gudyś, A., Li, H. et al. (2022). CoLoRd: compressing long reads. Nat Methods.
[2] Zhang, C., Lu, Y. & Zang, T. (2022). CNN-DDI: a learning-based method for predicting drug-drug interactions using convolution neural networks. BMC Bioinformatics 23, 88.
[3] Hong, Y., Lee, J. & Ko, J. (2022). A-Prot: protein structure modeling using MSA transformer. BMC Bioinformatics 23, 93.
[4] Quadrini, M., Daberdaku, S. & Ferrari, C. (2022). Hierarchical representation for PPI sites prediction. BMC Bioinformatics 23, 96.
[5] Stapor, K., Kotowski, K., Smolarczyk, T. et al. (2022). Lightweight ProteinUnet2 network for protein secondary structure prediction: a step towards proper evaluation. BMC Bioinformatics 23, 100.
[6] Li, H., Na, S., Hwang, KB. et al. (2022). TIDD: tool-independent and data-dependent machine learning for peptide identification. BMC Bioinformatics 23, 109.