Bioinformatics Review

cheML.io: ML-generated molecules database

Dr. Muniba Faiza
Last updated: January 26, 2023 2:32 pm

Advances in machine learning (ML) methods have led to a growing number of applications in bioinformatics. ML is used in personalized medicine, similarity searches in DNA and protein sequences, phylogenetics (mapping selected species onto phylogenetic trees), gene and protein function annotation, the generation of chemical compounds, and more. This article discusses cheML.io [1], an online database of ML-generated molecules.

Contents
  • How does cheML.io work?
  • How are new molecules added to the database?
  • References

cheML.io is a database of ML-generated molecules together with their calculated chemical properties. The molecules are generated by 10 different ML frameworks, including CDN, ORGANIC, ChemVAE, CVAE, grammarVAE, JT-VAE, SSVAE, ORGAN, and MolCycleGAN. The training molecules are collected from the ZINC and ChEMBL databases.

How does cheML.io work?

Here is the step-wise breakdown of the workflow used to generate the cheML.io database:

  • First, molecules are generated with 10 ML methods, giving a total of 2.9 million molecules.
  • All molecules are checked with RDKit, and the invalid ones (174,000) are discarded.
  • The remaining molecules are converted into canonical SMILES format using RDKit.
  • SMILES and calculated chemical properties are inserted into a relational database (PostgreSQL).
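
The steps above can be sketched roughly as follows. This is a minimal illustration and not the cheML.io code: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, a toy lookup table in place of RDKit so that it runs without extra dependencies, and a hypothetical table layout (the `molecules` table and its `method` column are assumptions). The commented lines show how RDKit's `Chem.MolFromSmiles` and `Chem.MolToSmiles` would typically fill the canonicalization role; calculated properties are left out for brevity.

```python
import sqlite3
from typing import Callable, Iterable, Optional

# With RDKit installed, a real canonicalizer could look like this:
#     from rdkit import Chem
#     def rdkit_canonical(smiles):
#         mol = Chem.MolFromSmiles(smiles)  # returns None if parsing fails
#         return None if mol is None else Chem.MolToSmiles(mol)

# Toy stand-in so the sketch runs without RDKit (hypothetical lookup table).
_TOY_CANONICAL = {"OCC": "CCO", "CCO": "CCO", "c1ccccc1": "c1ccccc1"}


def toy_canonical(smiles: str) -> Optional[str]:
    """Return a canonical SMILES, or None when the input counts as invalid."""
    return _TOY_CANONICAL.get(smiles)


def load_molecules(
    conn: sqlite3.Connection,
    raw_smiles: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
    method: str,
) -> int:
    """Validate, canonicalize, and store molecules; return how many were discarded."""
    # Hypothetical schema; the real database also stores calculated properties.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS molecules ("
        "smiles TEXT PRIMARY KEY, method TEXT NOT NULL)"
    )
    discarded = 0
    for smi in raw_smiles:
        canonical = canonicalize(smi)
        if canonical is None:  # invalid molecule: skip it
            discarded += 1
            continue
        # SQLite syntax; PostgreSQL would use ON CONFLICT DO NOTHING instead.
        conn.execute(
            "INSERT OR IGNORE INTO molecules (smiles, method) VALUES (?, ?)",
            (canonical, method),
        )
    conn.commit()
    return discarded


conn = sqlite3.connect(":memory:")
n_bad = load_molecules(conn, ["OCC", "CCO", "c1ccccc1", "C1CC"], toy_canonical, "ChemVAE")
stored = [row[0] for row in conn.execute("SELECT smiles FROM molecules ORDER BY smiles")]
print(n_bad, stored)
```

Using the canonical SMILES as the primary key means that two different spellings of the same molecule (here `OCC` and `CCO`) end up as a single row.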

How are new molecules added to the database?

  • The ZINC, ChEMBL, and cheML databases are searched for molecules similar to the query molecule.
  • These molecules serve as the training dataset for the first training.
  • The same databases are then searched by substructure.
  • These molecules serve as the training dataset for the second training.
  • The combined results of both searches serve as the training dataset for the third training.
  • New molecules are generated.
  • Molecules that are already present in cheML are filtered out, and the novel molecules are added to the database.
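
The bookkeeping in this workflow (three training sets, then a novelty filter) can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the search results and generated molecules are made-up examples, molecules are compared as canonical SMILES strings, and the model training and generation steps themselves are not shown.

```python
from typing import Iterable, List, Set, Tuple


def build_training_sets(
    similar_hits: Iterable[str], substructure_hits: Iterable[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return the three training sets: similarity hits, substructure hits, and their union."""
    first = set(similar_hits)
    second = set(substructure_hits)
    third = first | second  # combination of both searches
    return first, second, third


def keep_novel(generated: Iterable[str], existing: Set[str]) -> List[str]:
    """Drop molecules already in the database, and repeats, while preserving order."""
    seen = set(existing)
    novel = []
    for smi in generated:
        if smi not in seen:
            seen.add(smi)
            novel.append(smi)
    return novel


# Made-up search results and generator output, as canonical SMILES.
first, second, third = build_training_sets(["CCO", "CCN"], ["CCN", "CCCl"])
existing = {"CCO", "CCN"}
novel = keep_novel(["CCO", "CCC", "CCC", "c1ccccc1"], existing)
print(sorted(third), novel)
```

Comparing plain strings only works if every molecule is stored in the same canonical form, which is why canonicalization comes before the database insert.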

The web interface is user-friendly and lets users retrieve data easily. It supports similarity and substructure searches, with minimum and maximum values that can be set not only for the similarity but also for other parameters. Users can also draw query molecules with the doodle widget. The database is freely downloadable and is available at http://cheml.io/.
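
A similarity search bounded by minimum and maximum values can be illustrated with the short sketch below. It assumes Tanimoto similarity over fingerprint bits, which is a common choice in cheminformatics but is not confirmed here as the measure cheML.io uses. The fingerprints (sets of "on" bit indices), molecule names, and function names are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by all on-bits in either set."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_search(
    query_fp: FrozenSet[int],
    library: Dict[str, FrozenSet[int]],
    min_sim: float = 0.0,
    max_sim: float = 1.0,
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs within [min_sim, max_sim], best match first."""
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in library.items())
    hits = [(name, sim) for name, sim in scored if min_sim <= sim <= max_sim]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up fingerprints for three hypothetical molecules.
library = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 3, 5}),
    "mol_c": frozenset({7, 8}),
}
query = frozenset({1, 2, 3, 4})
hits = similarity_search(query, library, min_sim=0.5, max_sim=0.9)
print(hits)
```

An upper bound is useful when the goal is to find related but not identical molecules: here `mol_a` matches the query exactly and is excluded by `max_sim=0.9`.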

For further reading, see the original publication [1].


References

  1. Zhumagambetov, R., Kazbek, D., Shakipov, M., Maksut, D., Peshkov, V.A., & Fazli, S. (2020). cheML.io: an online database of ML-generated molecules. RSC Advances, 10(73), 45189-45198.