cheML.io: ML-generated molecules database

Dr. Muniba Faiza
Last updated: January 26, 2023 2:32 pm

Advances in machine learning (ML) have led to a growing number of applications in bioinformatics. ML is being used to develop personalized medicines, run similarity searches on DNA and protein sequences, map selected species onto phylogenetic trees, annotate gene and protein function, generate chemical compounds, and more. In this article, we discuss cheML.io [1], an online database of ML-generated molecules.


cheML.io is a database of ML-generated molecules along with their calculated chemical properties. The molecules are generated by 10 different ML frameworks, including CDN, ORGANIC, ChemVAE, CVAE, grammarVAE, JT-VAE, SSVAE, ORGAN, and MolCycleGAN. The training molecules are collected from the ZINC and ChEMBL databases.

How does cheML.io work?

Here is the step-wise breakdown of the workflow used to generate the cheML.io database:

  • First, molecules are generated using the 10 ML methods, for a total of 2.9 million molecules.
  • All molecules are checked with RDKit, and the invalid ones (174,000) are discarded.
  • The remaining molecules are converted into canonical SMILES format using RDKit.
  • SMILES and calculated chemical properties are inserted into a relational database (PostgreSQL).
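
The validation, canonicalization, and storage steps above can be sketched roughly as follows. This is only an illustration, not the cheML.io code: the function names and the single-column table are hypothetical, SQLite stands in for PostgreSQL so the example runs without a server, and the calculated chemical properties are left out for brevity. The RDKit calls (Chem.MolFromSmiles, Chem.MolToSmiles) are imported lazily, so RDKit is needed only if rdkit_canonical is actually used.

```python
import sqlite3
from typing import Callable, Iterable, Optional, Tuple


def rdkit_canonical(smiles: str) -> Optional[str]:
    """Return the RDKit canonical SMILES, or None if the string is invalid."""
    from rdkit import Chem  # lazy import: only needed when this helper is called

    mol = Chem.MolFromSmiles(smiles)  # returns None when parsing fails
    return Chem.MolToSmiles(mol) if mol is not None else None


def build_database(
    generated: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
    conn: sqlite3.Connection,
) -> Tuple[int, int]:
    """Validate, canonicalize, and store molecules.

    Returns (number of distinct molecules stored, number discarded as invalid).
    """
    # Hypothetical schema: one row per canonical SMILES string.
    conn.execute("CREATE TABLE IF NOT EXISTS molecules (smiles TEXT PRIMARY KEY)")
    discarded = 0
    for raw in generated:
        canonical = canonicalize(raw)
        if canonical is None:  # invalid molecule: discard it
            discarded += 1
            continue
        # The primary key makes repeated canonical SMILES a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO molecules (smiles) VALUES (?)", (canonical,)
        )
    conn.commit()
    stored = conn.execute("SELECT COUNT(*) FROM molecules").fetchone()[0]
    return stored, discarded
```

With RDKit installed, a call such as build_database(smiles_list, rdkit_canonical, sqlite3.connect(":memory:")) would run all three steps on a list of generated SMILES strings. Passing the canonicalizer as an argument keeps the storage logic independent of the cheminformatics toolkit.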

How are new molecules added to the database?

  • Search the ZINC, ChEMBL, and cheML.io databases for molecules that are similar to the query molecule.
  • These molecules are used as the training dataset for the first training.
  • Search the same databases for molecules using a substructure search.
  • These molecules are used as the training dataset for the second training.
  • The combination of the molecules from both searches is used as the training dataset for the third training.
  • Generate molecules.
  • Filter out the molecules that are already present in cheML.io and add the novel molecules to the database.
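
As a rough illustration of the bookkeeping in the steps above, the sketch below assembles the three training sets and filters generated molecules against those already stored. The function names are hypothetical, the search and generation steps themselves are not shown, and all SMILES strings are assumed to be canonical already, so a plain string comparison is enough to detect duplicates.

```python
from typing import Dict, Iterable, List, Set


def training_sets(similar: Set[str], substructure: Set[str]) -> Dict[str, Set[str]]:
    """Build the three training datasets from the two search results."""
    return {
        "first": set(similar),            # similarity-search hits
        "second": set(substructure),      # substructure-search hits
        "third": similar | substructure,  # combination of both
    }


def novel_molecules(generated: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Keep generated molecules that are not already in the database.

    Duplicates within the generated batch are also dropped; order is preserved.
    """
    seen = set(existing)
    novel = []
    for smiles in generated:
        if smiles not in seen:
            seen.add(smiles)
            novel.append(smiles)
    return novel
```

For example, novel_molecules(["CCO", "CCN", "CCN"], ["CCO"]) keeps only one copy of "CCN", because "CCO" is already stored and the second "CCN" repeats the first.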

The web interface is user-friendly and makes data retrieval easy. It supports similarity and substructure searches, and lets users set minimum and maximum values for the similarity as well as for other parameters. Users can also draw query molecules using the doodle widget. The database is freely downloadable and is available at http://cheml.io/.
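
A similarity search bounded by minimum and maximum values can be sketched as below. The similarity measure used here is an assumption: the Tanimoto coefficient on fingerprint bit sets is a common choice in cheminformatics, but this article does not state which measure cheML.io uses. The fingerprints are plain Python sets of bit positions, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    union = a | b
    if not union:  # two empty fingerprints: define the similarity as 0
        return 0.0
    return len(a & b) / len(union)


def similarity_search(
    query: FrozenSet[int],
    library: Dict[str, FrozenSet[int]],
    min_sim: float = 0.0,
    max_sim: float = 1.0,
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs within [min_sim, max_sim], best match first."""
    hits = []
    for name, fingerprint in library.items():
        score = tanimoto(query, fingerprint)
        if min_sim <= score <= max_sim:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Setting both bounds is useful when the goal is to find related but not identical molecules, for instance by capping max_sim below 1.0 to exclude the query itself.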

For further reading, see the original publication [1].


References

  1. Zhumagambetov, R., Kazbek, D., Shakipov, M., Maksut, D., Peshkov, V.A., & Fazli, S. (2020). cheML.io: an online database of ML-generated molecules. RSC Advances, 10(73), 45189-45198.