Protein sequence analysis covers tasks such as protein similarity, protein function prediction, and protein interaction prediction. A new feature extraction model has been developed to make these analyses easier.
This model is known as FEGS (Feature Extraction based on Graphical and Statistical features). It represents protein sequences graphically based on their physicochemical properties and combines this representation with statistical features of the sequence. Using both, FEGS transforms a protein sequence into a 578-dimensional numerical vector.
How does FEGS work?
FEGS takes protein sequences as input. In the first step, it builds 158 space curves for each protein sequence. In the second step, it builds L/L matrices from these curves and calculates their normalized maximum eigenvalues. In the third step, it calculates the frequencies of the 20 amino acids and 400 dipeptides present in the sequence, which gives a frequency vector for that sequence. In the fourth step, it combines these values (158 + 20 + 400 = 578) into the feature vector of the sequence. This vector can later be used for phylogenetic analysis.
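The third step is the easiest one to illustrate. The following is a minimal sketch of amino acid and dipeptide frequency counting, not the FEGS implementation itself. The function name `composition_features`, the alphabetical residue ordering, and the normalization (counts divided by the number of residues and by the number of overlapping residue pairs) are assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard amino acids.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def composition_features(sequence):
    """Return 20 amino acid frequencies followed by 400 dipeptide frequencies.

    Hypothetical helper: residues outside the 20 standard letters are
    ignored when counting but still contribute to the sequence length.
    """
    seq = sequence.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")

    # Count single residues.
    aa_counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq:
        if residue in aa_counts:
            aa_counts[residue] += 1

    # Count overlapping dipeptides (positions i and i + 1).
    dp_counts = {dp: 0 for dp in DIPEPTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in dp_counts:
            dp_counts[pair] += 1

    aa_freq = [aa_counts[aa] / len(seq) for aa in AMINO_ACIDS]
    dp_freq = [dp_counts[dp] / (len(seq) - 1) for dp in DIPEPTIDES]
    return aa_freq + dp_freq


if __name__ == "__main__":
    features = composition_features("MKTAYIAKQR")
    print(len(features))  # 420 values: 20 amino acids + 400 dipeptides
```

Appending these 420 values to the 158 eigenvalue-based values from the second step would give a vector of the 578 dimensions mentioned above.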
FEGS is user-friendly software that can be downloaded freely from SourceForge at https://sourceforge.net/projects/transcriptomeassembly/files/Feature%20Extraction/. Its performance has been tested on five different protein sequence datasets, where it performed best among the existing methods it was compared with.
For more information, see the reference below.
- Mu, Z., Yu, T., Liu, X. et al. (2021). FEGS: a novel feature extraction model for protein sequences and its applications. BMC Bioinformatics 22, 297.