Cis-regulatory elements are DNA sequence segments that regulate gene expression. They include regions such as promoters and enhancers, which contain specific sequence motifs.
MOCCA (Motif Occurrence Combinatorics Algorithms) is a newly expanded suite for modelling motif occurrence combinatorics in DNA sequences [1]. It includes models based on support vector machines (SVM) and random forests (RF), named SVM-MOCCA and RF-MOCCA respectively.
Users supply motifs, training sequences, and model specifications. Sequences are provided in FASTA format or generated by either an i.i.d. model or an N-th order Markov chain. Motifs can be specified as IUPAC nucleotide codes or as Position Weight Matrices (PWMs).
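To make these inputs concrete, the following is a minimal Python sketch of two of the ideas above: sampling a background sequence from an N-th order Markov chain (order 0 reduces to an i.i.d. model) and counting occurrences of an IUPAC motif. It is an illustration only and not MOCCA's implementation. The function names (`count_occurrences`, `train_markov`, `generate`) are hypothetical, only the forward strand is scanned, and unseen contexts fall back to a uniform draw, which is an assumption made here for simplicity.

```python
import random
import re
from collections import defaultdict

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def count_occurrences(motif, seq):
    """Count overlapping forward-strand occurrences of an IUPAC motif."""
    body = "".join(f"[{IUPAC[c]}]" for c in motif.upper())
    # A zero-width lookahead lets finditer report overlapping matches.
    return sum(1 for _ in re.finditer(f"(?={body})", seq.upper()))

def train_markov(seqs, order):
    """Count which base follows each context of `order` preceding bases."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1
    return counts

def generate(counts, order, length, rng):
    """Sample a sequence from the trained counts (hypothetical helper)."""
    # Start from a randomly chosen observed context (empty for order 0).
    seq = list(rng.choice(sorted(counts)))
    while len(seq) < length:
        ctx = "".join(seq[len(seq) - order:])
        nxt = counts.get(ctx)
        if not nxt:
            # Assumption: unseen contexts fall back to a uniform draw.
            seq.append(rng.choice("ACGT"))
            continue
        bases = sorted(nxt)
        seq.append(rng.choices(bases, weights=[nxt[b] for b in bases])[0])
    return "".join(seq[:length])

if __name__ == "__main__":
    rng = random.Random(0)
    model = train_markov(["ACGTGAGAGCCATTTGAGAG", "GAGAGCGCCATAACGT"], order=2)
    background = generate(model, order=2, length=200, rng=rng)
    print(count_occurrences("GAGAG", background))
```

Per-window motif counts of this kind are a simple way to turn a sequence into numeric features. How MOCCA itself derives features from motif occurrences is described in [1].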
MOCCA implements several types of models, including a new hierarchical model, RF-MOCCA:
- CPREdictor
- Dummy PREdictor
- SVM-MOCCA
- RF-MOCCA
MOCCA is implemented in C++ with minimal dependencies to ease installation. It can be installed on Unix-based systems.
MOCCA is freely available for download from GitHub.
For more details, see the MOCCA publication [1].
References
[1] Bredesen, B.A., Rehmsmeier, M. (2021). MOCCA: a flexible suite for modelling DNA sequence motif occurrence combinatorics. BMC Bioinformatics 22, 234.