Cheminformatics
Installing CDK (Chemistry Development Kit) on Ubuntu (Linux)

CDK stands for Chemistry Development Kit [1]. It is an open-source toolkit for cheminformatics made up of modular Java libraries. In this article, we will install CDK on Ubuntu.
Preparing system
Open a terminal by pressing Ctrl+Alt+T. Update and upgrade your system using the following commands:
$ sudo apt-get update
$ sudo apt-get upgrade
Installing prerequisites
CDK is built with Maven and needs a Java Development Kit (JDK). Install both by running the following commands in the terminal.
$ sudo apt-get install -y maven
$ sudo apt-get install -y default-jdk
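Before moving on, it can help to confirm that both packages ended up on the PATH. The following is a minimal sketch, not part of the original instructions; the helper name tool_status is made up for illustration. Running java -version and mvn -version afterwards also prints the installed versions.

```shell
# Print whether a command is available on the PATH.
# tool_status is a hypothetical helper name used only in this sketch.
tool_status() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# java and javac are provided by default-jdk, mvn by maven.
for tool in java javac mvn; do
    tool_status "$tool"
done
```

If any tool is reported as missing, repeat the corresponding apt-get install command before continuing.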
Downloading CDK
First, change to the directory where you want to download the software, for example Downloads.
$ cd Downloads/
You can either download CDK manually from its GitHub repository or fetch the 2.5 release with the following command.
$ wget https://github.com/cdk/cdk/archive/refs/tags/cdk-2.5.tar.gz
It will take a moment to finish downloading.
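An interrupted download leaves a truncated archive that fails later during extraction. As an optional check, and only as a sketch, the archive can be listed without unpacking it; the helper name archive_ok is an assumption made for this example, and the file name matches the wget command above.

```shell
# Succeeds only if the file is a gzip-compressed tar archive
# that tar can read through to the end.
# archive_ok is a hypothetical helper name used only in this sketch.
archive_ok() {
    tar -tzf "$1" >/dev/null 2>&1
}

if archive_ok cdk-2.5.tar.gz; then
    echo "archive looks complete"
else
    echo "archive is missing or damaged; download it again"
fi
```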
Installing CDK
Extract the downloaded package.
$ tar xvzf cdk-2.5.tar.gz
This creates a new directory named cdk-cdk-2.5. Change to this directory to build and install CDK.
$ cd cdk-cdk-2.5/
Now install using the following command.
$ mvn install
The build can take several minutes, because Maven downloads the dependencies, compiles every module, and runs the tests.
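Once Maven reports a successful build, the compiled JAR files sit in the target directories of the individual modules, and mvn install also copies them into the local Maven repository (by default ~/.m2/repository). The sketch below lists the JARs found under the current directory; the helper name list_built_jars is made up for this example, and the exact file names depend on the CDK version. If the tests take too long, Maven's standard -DskipTests property (mvn install -DskipTests) skips them.

```shell
# List every JAR file located under a Maven target/ directory
# below the given folder, in sorted order.
# list_built_jars is a hypothetical helper name used only in this sketch.
list_built_jars() {
    find "$1" -type f -path '*/target/*' -name '*.jar' | sort
}

# Run from inside the extracted cdk-cdk-2.5 directory after the build.
list_built_jars .
```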
References
- Steinbeck, C., Han, Y., Kuhn, S., Horlacher, O., Luttmann, E., & Willighagen, E. (2003). The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. Journal of Chemical Information and Computer Sciences, 43(2), 493-500.
Bioinformatics Programming
How to obtain ligand structures in PDB format from PDB ligand IDs?

Previously, we provided a similar script to download ligand SMILES from PDB ligand IDs. In this article, we download ligand structures in PDB format from their corresponding ligand IDs.
Cheminformatics
cheML.io: ML-generated molecules database

With advances in machine learning (ML), its methods are increasingly applied in bioinformatics as well. ML is used in personalized medicine, similarity searches in DNA and protein sequences, phylogenetics (mapping selected species onto phylogenetic trees), gene and protein function annotation, generation of chemical compounds, and so on. In this article, we will discuss cheML.io [1], an online database of ML-generated molecules.
Bioinformatics Programming
How to obtain SMILES of ligands using PDB ligand IDs?

Fetching SMILES strings for a set of SDF files of chemical compounds is not a difficult task; we can quickly obtain them using RDKit or Open Babel. But what if you don’t have SDF files of the ligands in the first place, and all you have is their ligand IDs from PDB? If there are only a few, you could download the SDF files manually, but that quickly becomes time-consuming when you have many compounds to work with. Therefore, we provide a Python script that reads all the ligand IDs, fetches their SDF files, and finally converts them into SMILES strings.
Cheminformatics
Converting file formats using Open Babel.

Open Babel [1] offers a wide range of operations, of which file format conversion is the most widely used. In this article, we will describe the commands that convert file formats.
Bioinformatics Programming
smitostr.py: Python script to convert SMILES to structures.

As mentioned in some of our previous articles, RDKit [1] provides a wide range of functions. In this article, we use RDKit to draw a molecular structure from a SMILES string.
Bioinformatics Programming
tanimoto_similarities.py: A Python script to calculate Tanimoto similarities of multiple compounds using RDKit.

RDKit [1] is a widely used cheminformatics toolkit. It allows us to perform a wide range of operations on chemical compounds and ligands. We have provided a Python script that uses RDKit to calculate Tanimoto similarities between the fingerprints of multiple compounds.
Cheminformatics
How to do molecular orbital analysis to find d-orbitals involved in bonding in an organometallic compound?

Structure modeling of chemical compounds is an essential application in the field of cheminformatics. It is used to study structural stability, metal-ion bonding, the presence of electrons, closed- and open-shell energies, the reactivity of complexes, molecular orbital analyses, molecular mechanics, and so on. Several software packages are available for structural modeling of chemical compounds and complexes; the most widely used are Gaussian [1] and ORCA [2].