How to obtain SMILES of ligands using PDB ligand IDs?

Dr. Muniba Faiza

Generating SMILES strings from a set of SDF files of chemical compounds is fairly straightforward: RDKit or Open Babel can do it quickly. But what if you don’t have SDF files for your ligands in the first place, only their ligand IDs from the PDB? For a handful of IDs you could download the SDF files manually, but that becomes time-consuming when you are working with many compounds. We therefore provide a Python script that reads the ligand IDs, fetches the corresponding SDF files, and converts them into SMILES strings.

pdb_ligand_id-to-smi.ipynb is a Python notebook that uses RDKit [1] to generate SMILES for each ligand ID listed in a CSV file.

Requirements

This script requires Python 3 and uses RDKit along with pandas. Install them using the following commands.

$ conda create -c conda-forge -n my-rdkit-env rdkit
$ conda activate my-rdkit-env
$ conda install pandas

Usage

List all ligand IDs in the ‘lig-ids.csv’ file and save it, then run the Jupyter notebook. The script reads the ligand IDs, downloads the SDF file for each one, and combines them into a single SDF file. Finally, it generates SMILES with RDKit and writes the results to the ‘smiles.txt’ file.
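
For readers who want to see the general shape of this workflow, here is a minimal sketch. It is not the notebook's actual code. It rests on several assumptions: that ‘lig-ids.csv’ holds one ligand ID per row in its first column with no header, that the "ideal" SDF for a ligand can be downloaded from the RCSB URL pattern shown in the code, and that a tab-separated "SMILES, ligand ID" layout is acceptable for ‘smiles.txt’. The function names are hypothetical. The sketch uses the standard csv module rather than pandas, and it converts each SDF file directly instead of first combining them into a single file.

```python
import csv
import urllib.request
from pathlib import Path

# Assumed RCSB URL pattern for "ideal" ligand coordinates (not taken from the notebook).
SDF_URL = "https://files.rcsb.org/ligands/download/{lig_id}_ideal.sdf"


def read_ligand_ids(csv_path):
    """Return the non-empty IDs found in the first CSV column, upper-cased."""
    with open(csv_path, newline="") as handle:
        return [row[0].strip().upper()
                for row in csv.reader(handle)
                if row and row[0].strip()]


def sdf_url(lig_id):
    """Build the download URL for one ligand ID."""
    return SDF_URL.format(lig_id=lig_id)


def download_sdf(lig_id, out_dir):
    """Download one ligand's SDF file into out_dir and return its path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{lig_id}.sdf"
    urllib.request.urlretrieve(sdf_url(lig_id), str(target))
    return target


def sdf_to_smiles(sdf_path):
    """Return a SMILES string for every molecule RDKit can parse in the file."""
    # Imported here so the helpers above can be used without RDKit installed.
    from rdkit import Chem
    supplier = Chem.SDMolSupplier(str(sdf_path))
    return [Chem.MolToSmiles(mol) for mol in supplier if mol is not None]


def run(csv_path="lig-ids.csv", out_path="smiles.txt", sdf_dir="sdf"):
    """Read IDs, download each SDF, and write one 'SMILES<TAB>ID' line per molecule."""
    with open(out_path, "w") as out:
        for lig_id in read_ligand_ids(csv_path):
            try:
                sdf_path = download_sdf(lig_id, sdf_dir)
            except OSError as err:  # urllib's URLError and HTTPError derive from OSError
                print(f"skipping {lig_id}: {err}")
                continue
            for smiles in sdf_to_smiles(sdf_path):
                out.write(f"{smiles}\t{lig_id}\n")
```

Calling run() from the directory that contains ‘lig-ids.csv’ would carry out all the steps. IDs that cannot be downloaded are reported and skipped rather than stopping the whole run.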

Availability

The script is available on GitHub in the ‘cheminformatics’ repository.


References

  1. Landrum, G. (2013). RDKit documentation. Release, 1(1-79), 4.
Dr. Muniba is a bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.