tanimoto_similarities.py: A Python script to calculate Tanimoto similarities of multiple compounds using RDKit.

RDKit [1] is an open-source cheminformatics toolkit that supports a wide range of operations on chemical compounds and ligands. This article provides a Python script that uses RDKit to fingerprint multiple compounds and calculate the Tanimoto similarities between them.

Introduction

The tanimoto_similarities.py script calculates the pairwise Tanimoto similarities of molecules supplied as SMILES strings.
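
As a reminder of what is being measured, the Tanimoto similarity of two fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below illustrates that definition in plain Python, with fingerprints represented as sets of "on" bit positions. The function name and the example bit sets are made up for illustration; the script itself relies on RDKit for this step.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    shared = len(a & b)   # bits set in both fingerprints
    total = len(a | b)    # bits set in at least one fingerprint
    # Two empty fingerprints have nothing to compare; returning 0.0 is a convention chosen here
    return shared / total if total else 0.0


# Hypothetical fingerprints, not derived from real molecules
fp_1 = {1, 4, 7, 9}
fp_2 = {1, 4, 8}

print(tanimoto(fp_1, fp_2))  # 2 shared bits / 5 distinct bits = 0.4
print(tanimoto(fp_1, fp_1))  # identical fingerprints give 1.0
```

The value ranges from 0 (no bits in common) to 1 (identical fingerprints).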

Suppose we have the SMILES strings of 15 molecules in a CSV file named ‘smiles.csv’. The file may also contain other information, such as ligand names and serial numbers; in that case, only the SMILES column is needed. The script reads the SMILES strings from the column named “SMILES”, so if your file uses a different column name, edit it in the script.
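
Extracting the SMILES column might look like the following minimal sketch, which uses pandas. The helper name load_smiles, the extra columns, and the three sample molecules are assumptions made for illustration; an in-memory string stands in for ‘smiles.csv’ so that the example runs on its own.

```python
import io

import pandas as pd

# Stand-in for 'smiles.csv'; the extra columns and molecules are made up for illustration
sample_csv = io.StringIO(
    "S.No,Name,SMILES\n"
    "1,ethanol,CCO\n"
    "2,benzene,c1ccccc1\n"
    "3,acetic acid,CC(=O)O\n"
)


def load_smiles(source, column="SMILES"):
    """Return the SMILES strings found in `column` of a CSV file or file-like object."""
    df = pd.read_csv(source)
    if column not in df.columns:
        raise KeyError(f"Column {column!r} not found; available columns: {list(df.columns)}")
    # Drop empty cells and strip stray whitespace around each string
    return df[column].dropna().astype(str).str.strip().tolist()


smiles = load_smiles(sample_csv)
print(smiles)
```

With a real file, pass its path instead, for example load_smiles("smiles.csv"), and change the column argument if your header differs.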

The script calculates the similarities and saves them as text files and heatmaps; the heatmaps help you visualize the similarity matrix. Sample SMILES are provided in the ‘smiles.csv’ file.
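
The core calculation can be sketched with RDKit roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the article does not say which fingerprint type the script uses, so the RDKit topological fingerprint (Chem.RDKFingerprint) is assumed here, and the function name and the three sample molecules are hypothetical.

```python
import numpy as np
from rdkit import Chem, DataStructs


def similarity_matrix(smiles_list):
    """Return an n x n matrix of pairwise Tanimoto similarities."""
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    # MolFromSmiles returns None for strings it cannot parse
    bad = [s for s, m in zip(smiles_list, mols) if m is None]
    if bad:
        raise ValueError(f"Could not parse SMILES: {bad}")

    # Assumption: RDKit topological fingerprints; the script may use another type
    fps = [Chem.RDKFingerprint(m) for m in mols]

    n = len(fps)
    matrix = np.zeros((n, n))
    for i, fp in enumerate(fps):
        # Compare fingerprint i against every fingerprint in one call
        matrix[i, :] = DataStructs.BulkTanimotoSimilarity(fp, fps)
    return matrix


# Hypothetical input: ethanol, ethylamine, benzene
smiles = ["CCO", "CCN", "c1ccccc1"]
sim = similarity_matrix(smiles)
print(np.round(sim, 2))
```

The resulting matrix is symmetric with ones on the diagonal, since every molecule is identical to itself.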

Availability

The script is available on GitHub under the package ‘tanimoto_similarities’.

Requirements

This script requires Python 3 and uses RDKit along with seaborn, matplotlib, pandas, and NumPy. Create a conda environment for RDKit, activate it, and install the remaining packages inside that environment using the following commands.

$ conda create -c conda-forge -n my-rdkit-env rdkit
$ conda activate my-rdkit-env
$ pip3 install seaborn
$ pip3 install matplotlib
$ conda install pandas
$ pip3 install numpy

Usage

The script consists of two functions. The first calculates the full similarity matrix, displays it as a heatmap, and saves it as ‘similarities.txt’. The second calculates the similarity matrix as a lower triangular matrix and saves it as ‘similarities_lower_tri.txt’.
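
The difference between the two outputs can be illustrated with NumPy. In this sketch, the 3 x 3 matrix is made-up data standing in for a real result, the number format and tab delimiter are assumptions, and the files are written to a temporary directory so that nothing in your working directory is overwritten.

```python
import os
import tempfile

import numpy as np

# Made-up similarity matrix standing in for a real result
sim = np.array([
    [1.00, 0.60, 0.10],
    [0.60, 1.00, 0.25],
    [0.10, 0.25, 1.00],
])

# np.tril keeps the diagonal and everything below it, and zeroes the upper triangle
lower = np.tril(sim)

# File names follow the article; the temporary directory is only for this example
out_dir = tempfile.mkdtemp()
full_path = os.path.join(out_dir, "similarities.txt")
lower_path = os.path.join(out_dir, "similarities_lower_tri.txt")

np.savetxt(full_path, sim, fmt="%.3f", delimiter="\t")
np.savetxt(lower_path, lower, fmt="%.3f", delimiter="\t")

print(lower)
```

Because the full matrix is symmetric, the lower triangle carries the same information without the mirrored duplicates. For the matching heatmap, one option is to pass a boolean mask of the upper triangle to the mask argument of seaborn.heatmap; whether the script does it this way is not stated here.
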
Run the script as:

$ python3 tanimoto_similarities.py

Note: If you get an error stating “rdkit not found”, the RDKit environment is probably not active. Run the conda activate my-rdkit-env command again, and then rerun the script.
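
If you are unsure which environment is active, a quick check such as the sketch below can help. It uses only the Python standard library; the function name is made up for illustration.

```python
import importlib.util
import sys


def rdkit_available():
    """Return True if the current Python interpreter can locate the rdkit package."""
    return importlib.util.find_spec("rdkit") is not None


# The interpreter path shows which environment Python is running from
print("Interpreter:", sys.executable)
print("RDKit importable:", rdkit_available())
```

If the interpreter path does not point inside my-rdkit-env, activate the environment and try again.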


References

  1. Landrum, G. (2013). RDKit documentation. Release, 1(1-79), 4.

Dr. Muniba is a bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.
