Cheminformatics
Converting file formats using Open Babel

Open Babel [1] offers a wide range of operations, and file format conversion is among the most widely used. In this article, we describe the commands for converting between file formats.
Assuming you have already installed Open Babel on your system, you should be able to run it from the terminal as obabel (releases before 3.0 also provide the older babel command). Open Babel also has a graphical interface, although depending on your platform you may have to compile it when installing Open Babel.
The general command syntax for file conversion is:
$ obabel -i<input_format> <input_filename> -o<output_format> -O <output_filename> [other_options]
The input and output formats are optional, because obabel can usually infer them from the file extensions, but it is safer to state them explicitly.
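If you drive Open Babel from a Python script, the same syntax can be assembled as an argument list. The sketch below assumes `obabel` is on your PATH; `build_obabel_args` and `run_obabel` are our own illustrative helpers, not part of Open Babel.

```python
import shutil
import subprocess


def build_obabel_args(input_file, output_file, input_format=None,
                      output_format=None, extra_args=()):
    """Assemble an obabel command line following the syntax above.

    Formats are optional; when omitted, obabel guesses them from the
    file extensions.
    """
    args = ["obabel"]
    if input_format:
        args.append(f"-i{input_format}")
    args.append(input_file)
    if output_format:
        args.append(f"-o{output_format}")
    args += ["-O", output_file]
    args += list(extra_args)  # e.g. ["-e", "--unique"]
    return args


def run_obabel(args):
    """Run obabel (assumed to be on PATH) and return the CompletedProcess."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError("obabel was not found on PATH")
    return subprocess.run(args, capture_output=True, text=True, check=True)
```

For example, `run_obabel(build_obabel_args("input.smi", "output.inchi", "smi", "inchi", ["-e"]))` runs the SMILES-to-InChI conversion with `-e`. Passing a list instead of a single string avoids shell quoting problems with file names.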
Let’s convert SMILES to canonical SMILES. In Open Babel, canonical SMILES is its own output format, can:
$ obabel -ismi input.smi -ocan -O output.smi
Similarly, you can convert SMILES to InChI as shown below:
$ obabel -ismi input.smi -oinchi -O output.inchi
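For a quick check of a single molecule, obabel can also read one SMILES string from the command line with `-:` and, when no `-O` is given, print the result to the terminal. The Python sketch below wraps that; it assumes `obabel` is on your PATH, and `convert_smiles` and `first_token` are illustrative names.

```python
import shutil
import subprocess


def first_token(text):
    """Return the first whitespace-separated field of obabel's output.

    SMILES output lines may carry a tab-separated title after the
    SMILES string, so only the first field is kept.
    """
    fields = text.split()
    if not fields:
        raise ValueError("obabel produced no output")
    return fields[0]


def convert_smiles(smiles, output_format="can"):
    """Convert one SMILES string, e.g. to canonical SMILES ("can") or InChI ("inchi")."""
    if shutil.which("obabel") is None:
        raise FileNotFoundError("obabel was not found on PATH")
    result = subprocess.run(
        ["obabel", f"-:{smiles}", f"-o{output_format}"],
        capture_output=True, text=True, check=True,
    )
    return first_token(result.stdout)
```

Usage: `convert_smiles("CCO")` for canonical SMILES, or `convert_smiles("CCO", "inchi")` for the InChI of ethanol.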
You can also add options that change how the conversion is done, for example ignoring stereoisomers or removing duplicates. By default, Open Babel may stop processing when it reaches an invalid molecule; adding -e tells it to continue with the next molecule instead.
$ obabel -ismi input.smi -oinchi -O output.inchi -e --unique
Here, --unique skips duplicate molecules; by default, two molecules count as duplicates when their InChIs are identical.
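Conceptually, `--unique` keeps the first molecule seen for each identifier and drops later ones. The pure-Python sketch below mimics that idea on precomputed (name, InChI) pairs; it only illustrates the behavior and is not Open Babel's implementation.

```python
def keep_unique(records):
    """Keep the first record for each identifier, preserving input order.

    records: iterable of (name, identifier) pairs, e.g. (title, InChI).
    """
    seen = set()
    unique = []
    for name, identifier in records:
        if identifier in seen:
            continue  # duplicate of a molecule that was already kept
        seen.add(identifier)
        unique.append((name, identifier))
    return unique


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
methanol = "InChI=1S/CH4O/c1-2/h2H,1H3"
molecules = [("ethanol", ethanol), ("methanol", methanol), ("ethanol_copy", ethanol)]
# Keeps "ethanol" and "methanol"; "ethanol_copy" is dropped as a duplicate.
print(keep_unique(molecules))
```

Comparing InChIs rather than input SMILES matters because the same molecule can be written as many different SMILES strings but has a single standard InChI.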
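If you want the InChIs to ignore stereochemistry, charges, or isotopes, pass InChI truncation parameters with the InChI write option -xT:
$ obabel -ismi input.smi -oinchi -O output.inchi -e -xT /nostereo/nochg/noiso
Here, /nostereo ignores both E/Z and sp3 stereochemistry (/noEZ and /nosp3 ignore only one of the two), /nochg ignores charge and protonation, and /noiso ignores isotopes.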
The same truncation parameters can be given to --unique, so that molecules differing only in the ignored features are treated as duplicates and only the first one is kept:
$ obabel -ismi input.smi -oinchi -O output.inchi -e --unique /nostereo/nochg/noiso
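To see why truncation makes stereoisomers collapse into one entry, the Python sketch below drops InChI layers by their one-letter tags (/b for E/Z, /t, /m and /s for sp3 stereo, /q and /p for charge and protonation, /i for isotopes). It assumes standard InChIs without a fixed-hydrogen layer and is our simplified reading of the parameters, not Open Babel's code; `truncate_inchi` is a hypothetical helper.

```python
# InChI layer tags removed by each truncation parameter (simplified sketch).
DROP_TAGS = {
    "nostereo": set("btms"),  # E/Z (/b) and sp3 (/t, /m, /s) stereo layers
    "noEZ": {"b"},
    "nosp3": set("tms"),
    "nochg": set("qp"),       # charge and protonation layers
}
KNOWN = set(DROP_TAGS) | {"noiso"}


def truncate_inchi(inchi, *params):
    """Drop InChI layers according to parameters like "nostereo" or "noiso"."""
    unknown = set(params) - KNOWN
    if unknown:
        raise ValueError(f"unknown truncation parameters: {sorted(unknown)}")
    version, formula, *layers = inchi.split("/")
    drop = set()
    for param in params:
        drop |= DROP_TAGS.get(param, set())
    kept = []
    for layer in layers:
        tag = layer[:1]
        if tag == "i" and "noiso" in params:
            break  # the isotopic layer and its sublayers come last
        if tag not in drop:
            kept.append(layer)
    return "/".join([version, formula, *kept])


# L- and D-alanine differ only in the /m stereo layer.
l_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
d_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
print(truncate_inchi(l_ala, "nostereo") == truncate_inchi(d_ala, "nostereo"))  # True
```

Once the stereo layers are removed, both enantiomers share one identifier, so a uniqueness filter keyed on it keeps only the first.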
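For further information on supported file formats, see the Open Babel documentation. You can also list the formats your installation supports:
$ obabel -L formats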
References
- O’Boyle, N. M., Banck, M., James, C. A., Morley, C., Vandermeersch, T., & Hutchison, G. R. (2011). Open Babel: An open chemical toolbox. Journal of Cheminformatics, 3(1), 1-14.
Bioinformatics Programming
How to obtain ligand structures in PDB format from PDB ligand IDs?

Previously, we provided a similar script to download ligand SMILES from PDB ligand IDs. In this article, we download ligand structures in PDB format from their corresponding IDs.
Cheminformatics
cheML.io: ML-generated molecules database

With the advancement of machine learning (ML) methods, ML is increasingly applied in bioinformatics as well. It is used in personalized medicine, similarity searches in DNA and protein sequences, phylogenetics (mapping selected species onto phylogenetic trees), gene and protein function annotation, generating chemical compounds, and more. In this article, we discuss cheML.io [1], an online database of ML-generated molecules.
Bioinformatics Programming
How to obtain SMILES of ligands using PDB ligand IDs?

Extracting SMILES strings from the SDF files of chemical compounds is straightforward: we can quickly obtain them using RDKit or Open Babel. But what if you don’t have the SDF files of the ligands in the first place, and all you have are their ligand IDs from the PDB? If there are only a few, you could download the SDF files manually, but this is time-consuming, especially when you work with many compounds. Therefore, we provide a Python script that reads all ligand IDs, fetches their SDF files, and finally converts them into SMILES strings.
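The workflow described above can be sketched in Python as follows. This is not the script from that article: it assumes the RCSB file server provides ideal ligand coordinates at `https://files.rcsb.org/ligands/download/<ID>_ideal.sdf`, that `obabel` is on your PATH, and all function names are hypothetical.

```python
import shutil
import subprocess
import urllib.request
from pathlib import Path

# Assumed RCSB download URL pattern for ideal ligand coordinates (SDF).
RCSB_LIGAND_SDF = "https://files.rcsb.org/ligands/download/{ligand_id}_ideal.sdf"


def ligand_sdf_url(ligand_id):
    """Build the (assumed) RCSB URL for a ligand's SDF file."""
    return RCSB_LIGAND_SDF.format(ligand_id=ligand_id.strip().upper())


def download_sdf(ligand_id, out_dir="ligands"):
    """Download one ligand SDF file and return its local path."""
    out_path = Path(out_dir) / f"{ligand_id.strip().upper()}.sdf"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(ligand_sdf_url(ligand_id), str(out_path))
    return out_path


def sdf_to_smiles(sdf_path):
    """Convert an SDF file to canonical SMILES with obabel (assumed on PATH)."""
    if shutil.which("obabel") is None:
        raise FileNotFoundError("obabel was not found on PATH")
    result = subprocess.run(
        ["obabel", "-isdf", str(sdf_path), "-ocan"],
        capture_output=True, text=True, check=True,
    )
    fields = result.stdout.split()
    if not fields:
        raise ValueError(f"no SMILES produced for {sdf_path}")
    return fields[0]  # the SDF title, if any, follows after a tab


def ligand_ids_to_smiles(id_file):
    """Read one ligand ID per line and map each ID to its SMILES string."""
    lines = Path(id_file).read_text().splitlines()
    ids = [line.strip() for line in lines if line.strip()]
    return {lig: sdf_to_smiles(download_sdf(lig)) for lig in ids}
```

Usage: with a hypothetical `ligand_ids.txt` containing one ID per line (for example `ATP`), `ligand_ids_to_smiles("ligand_ids.txt")` returns a dictionary mapping each ID to its canonical SMILES.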