tanimoto_similarities.py: A Python script to calculate Tanimoto similarities of multiple compounds using RDKit.

Dr. Muniba Faiza

RDKit [1] is an open-source cheminformatics toolkit that supports a wide range of operations on chemical compounds and ligands. We have provided a Python script that uses RDKit to calculate pairwise Tanimoto similarities between the fingerprints of multiple compounds.

Introduction

The tanimoto_similarities.py script calculates pairwise Tanimoto similarities for a set of molecules supplied as SMILES strings.
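For readers new to the measure, the Tanimoto coefficient of two fingerprints is the number of bits set in both, divided by the number of bits set in either. The sketch below illustrates that formula in plain Python, with each fingerprint represented as a set of "on" bit positions. It is not the script's code: the function name `tanimoto`, the example fingerprints, and the choice to return 0.0 for two empty fingerprints are assumptions made for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    shared = len(a & b)                 # bits set in both fingerprints
    total = len(a) + len(b) - shared    # bits set in either fingerprint (size of the union)
    # Assumption for this sketch: two empty fingerprints give 0.0 instead of dividing by zero.
    return shared / total if total else 0.0


# Hypothetical fingerprints: positions of the bits that are set.
fp_a = {1, 2, 3, 4}
fp_b = {3, 4, 5}

print(tanimoto(fp_a, fp_b))  # 2 shared bits / 5 distinct bits -> 0.4
print(tanimoto(fp_a, fp_a))  # identical fingerprints -> 1.0
```

In RDKit, the fingerprints would be bit vectors generated from the molecules, and the same quantity is available through `DataStructs.TanimotoSimilarity`.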

Let’s say we have the SMILES of 15 molecules in a CSV file named ‘smiles.csv’. This file may also contain other information such as ligand name, serial number, and so on. In that case, only the SMILES column needs to be extracted from the CSV file. The SMILES are expected under a column named “SMILES”; if your file uses a different column name, edit the name in the script accordingly.
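As a rough illustration of extracting one column from such a file, here is a minimal sketch using only Python's standard `csv` module. The script itself may read the file differently (pandas is among its requirements); the helper name `read_smiles`, the extra columns, and the three sample rows are hypothetical.

```python
import csv
import io

# Hypothetical file contents; a real run would open 'smiles.csv' instead.
sample_csv = """S.No,Name,SMILES
1,ethanol,CCO
2,benzene,c1ccccc1
3,acetic acid,CC(=O)O
"""


def read_smiles(handle, column="SMILES"):
    """Return the non-empty values of one column from an open CSV file."""
    reader = csv.DictReader(handle)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in CSV header")
    # Skip rows where the SMILES cell is missing or blank.
    return [row[column].strip() for row in reader
            if row[column] and row[column].strip()]


smiles = read_smiles(io.StringIO(sample_csv))
print(smiles)  # ['CCO', 'c1ccccc1', 'CC(=O)O']
```

For a file on disk, the same helper could be called inside `with open("smiles.csv", newline="") as fh:`, passing `column="..."` if the header is named differently.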

The script calculates the similarities and saves them as text files and heatmaps. The heatmaps help you visualize the similarity matrix. Sample SMILES are provided in the ‘smiles.csv’ file.

Availability

The script is available on GitHub under the package ‘tanimoto_similarities’.

Requirements

This script requires Python 3 and uses RDKit along with a few additional packages. Install them using the following commands.

$ conda create -c conda-forge -n my-rdkit-env rdkit
$ conda activate my-rdkit-env
$ pip3 install seaborn
$ conda install -c conda-forge matplotlib
$ conda install pandas
$ pip3 install numpy

Usage

The script consists of two functions. The first calculates the full similarity matrix, displays it as a heatmap, and saves the output as ‘similarities.txt’. The second calculates the similarity matrix in lower triangular form and saves the output as ‘similarities_lower_tri.txt’.

Run the script as:

$ python3 tanimoto_similarities.py
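To make the two outputs described above more concrete, the following sketch builds a full symmetric similarity matrix and its lower triangular form in plain Python. It is an illustration under stated assumptions, not the script's implementation: the fingerprints are hypothetical sets of bit positions, all function names are invented here, and the tab-separated, two-decimal text layout is a guess at a reasonable output format.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two sets of 'on' bit positions."""
    shared = len(a & b)
    total = len(a) + len(b) - shared
    return shared / total if total else 0.0


def similarity_matrix(fingerprints):
    """Full symmetric matrix: entry [i][j] compares molecule i with molecule j."""
    n = len(fingerprints)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            value = tanimoto(fingerprints[i], fingerprints[j])
            matrix[i][j] = matrix[j][i] = value  # similarity is symmetric
    return matrix


def lower_triangle(matrix):
    """Keep entries on or below the diagonal; set the rest to 0.0."""
    return [[v if j <= i else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]


def to_text(matrix):
    """Render a matrix as tab-separated lines, two decimals per value (assumed format)."""
    return "\n".join("\t".join(f"{v:.2f}" for v in row) for row in matrix)


# Hypothetical fingerprints for three molecules.
fps = [{1, 2, 3, 4}, {3, 4, 5}, {1, 2, 3, 4}]

full = similarity_matrix(fps)
lower = lower_triangle(full)
print(to_text(full))
print()
print(to_text(lower))
```

Writing `to_text(full)` and `to_text(lower)` to ‘similarities.txt’ and ‘similarities_lower_tri.txt’ would mirror the two files named above. With real molecules, the pairwise values would come from RDKit fingerprints rather than hand-written sets, and either matrix could be passed to a plotting function such as seaborn's `heatmap` for visualization.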

Note: If you get an error stating “rdkit not found”, you have probably not activated the RDKit environment. Run the conda activate my-rdkit-env command again and then rerun the script.
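If the cause of such an error is unclear, a quick check of which interpreter is running and which packages it can see may help. The snippet below is a small diagnostic sketch using only the standard library; the helper name `module_available` is hypothetical, and the package list simply mirrors the requirements listed earlier.

```python
import importlib.util
import sys


def module_available(name):
    """True if `name` can be imported by the interpreter running this code."""
    return importlib.util.find_spec(name) is not None


# The interpreter path shows whether the conda environment is the one in use.
print("Interpreter:", sys.executable)
for package in ("rdkit", "pandas", "numpy", "seaborn", "matplotlib"):
    status = "found" if module_available(package) else "MISSING"
    print(f"{package:12s}{status}")
```

If the interpreter path does not point inside my-rdkit-env, the environment is most likely not active in the current shell.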


References

  1. Landrum, G. (2013). RDKit documentation. Release, 1(1-79), 4.