Getting SMILES strings from SDF files of chemical compounds is fairly easy: RDKit or OpenBabel can convert them quickly. But what if you don’t have SDF files of the ligands in the first place, and all you have is their ligand IDs from PDB? For a few ligands you could download the SDF files manually, but that becomes time-consuming as the number of compounds grows. Therefore, we provide a Python script that reads all the ligand IDs, fetches their SDF files, and finally converts them into SMILES strings.
pdb_ligand_id-to-smi.ipynb is a Python notebook that fetches SMILES for each ligand ID provided in a CSV file, using RDKit [1].
Requirements
This script requires Python 3 and uses RDKit along with some additional packages. Install them using the following commands.
$ conda create -c conda-forge -n my-rdkit-env rdkit
$ conda activate my-rdkit-env
$ conda install pandas
Usage
List all ligand IDs in the ‘lig-ids.csv’ file and save it, then run the Jupyter notebook to get the results. The script reads the ligand IDs, downloads their respective SDF files, and combines them into a single SDF file. Finally, it generates SMILES with RDKit and writes the results to the ‘smiles.txt’ file.
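The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. Several details are assumptions: the RCSB URL pattern in `SDF_URL`, a CSV layout with one ligand ID per row in the first column and no header, the intermediate file name ‘ligands.sdf’, and a tab-separated output format. All function names are hypothetical. The standard `csv` module is used here in place of pandas only to keep the sketch short.

```python
import csv
import urllib.request

# Assumed URL pattern for RCSB "ideal" ligand coordinates; the notebook may use another source.
SDF_URL = "https://files.rcsb.org/ligands/download/{lig_id}_ideal.sdf"


def read_ligand_ids(csv_path):
    """Return upper-cased ligand IDs from the first column, skipping blank rows."""
    with open(csv_path, newline="") as handle:
        return [row[0].strip().upper()
                for row in csv.reader(handle)
                if row and row[0].strip()]


def download_sdf(lig_id):
    """Download one ligand's SDF record as text (needs network access)."""
    with urllib.request.urlopen(SDF_URL.format(lig_id=lig_id)) as response:
        return response.read().decode("utf-8")


def combine_sdf(records):
    """Join SDF records into one multi-molecule SDF, each ending with '$$$$'."""
    parts = []
    for record in records:
        text = record.rstrip()  # trailing whitespace only; header lines stay intact
        if not text.endswith("$$$$"):
            text += "\n$$$$"
        parts.append(text)
    return "\n".join(parts) + "\n"


def sdf_to_smiles(sdf_path):
    """Yield (name, SMILES) for every molecule RDKit can parse from the SDF."""
    # Imported here so the helpers above still work without RDKit installed.
    from rdkit import Chem

    for mol in Chem.SDMolSupplier(sdf_path):
        if mol is None:  # RDKit yields None for records it cannot parse
            continue
        name = mol.GetProp("_Name") if mol.HasProp("_Name") else ""
        yield name, Chem.MolToSmiles(mol)


def main(csv_path="lig-ids.csv", sdf_path="ligands.sdf", out_path="smiles.txt"):
    """Read IDs, download and combine SDF files, then write one SMILES per line."""
    ids = read_ligand_ids(csv_path)
    with open(sdf_path, "w") as handle:
        handle.write(combine_sdf(download_sdf(lig_id) for lig_id in ids))
    with open(out_path, "w") as handle:
        for name, smiles in sdf_to_smiles(sdf_path):
            handle.write(f"{name}\t{smiles}\n")
```

Calling `main()` from a notebook cell would run the whole pipeline with the default file names. Ligands that RDKit cannot parse are skipped silently in this sketch, so the output may contain fewer lines than the input.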
Availability
The script is available on GitHub in the ‘cheminformatics’ repository.
References
[1] Landrum, G. (2013). RDKit documentation. Release, 1(1-79), 4.