tanimoto_similarities.py: A Python script to calculate Tanimoto similarities of multiple compounds using RDKit.

RDKit [1] is a widely used open-source cheminformatics toolkit. It supports a wide range of operations on chemical compounds and ligands. We provide a Python script that uses RDKit to generate fingerprints for multiple compounds and compare them using Tanimoto similarity.

Contents

Introduction
Availability
Requirements
Usage

References

Introduction

The tanimoto_similarities.py script calculates pairwise Tanimoto similarities for a set of molecules given as SMILES strings.
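
Tanimoto similarity compares two binary fingerprints by dividing the number of bits set in both by the number of bits set in either: T(A, B) = |A ∩ B| / (|A| + |B| - |A ∩ B|). The minimal sketch below shows only this arithmetic. It represents each fingerprint as a hypothetical set of on-bit indices rather than an RDKit bit vector, and how it treats two empty fingerprints is an assumption.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    common = len(bits_a & bits_b)
    total = len(bits_a) + len(bits_b) - common  # same as len(bits_a | bits_b)
    if total == 0:
        return 0.0  # assumption: two empty fingerprints count as dissimilar
    return common / total


# Hypothetical toy fingerprints, not taken from smiles.csv
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8}
print(tanimoto(fp1, fp2))  # 2 / (4 + 3 - 2) = 0.4
```

A value of 1.0 means identical fingerprints and 0.0 means no shared bits.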

Let’s say we have the SMILES strings of 15 molecules in a CSV file named ‘smiles.csv’. The file may also contain other information, such as ligand names or serial numbers; the script only needs the SMILES column. By default it reads the column named “SMILES”, so edit the column name in the script if your file uses a different header.
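
A minimal sketch of pulling the SMILES column out of such a file with Python’s standard csv module is shown below. The function name read_smiles is hypothetical, and the script itself may read the file differently (for example with pandas, which is listed in the requirements).

```python
import csv


def read_smiles(path: str, column: str = "SMILES") -> list:
    """Return the non-empty values of one column from a CSV file with a header row."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {path}")
        # Skip rows where the SMILES cell is missing or blank
        return [row[column].strip() for row in reader
                if row[column] and row[column].strip()]
```

For the sample file, read_smiles("smiles.csv") would return the SMILES strings as a list; pass column="..." if your header is named differently.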

The script saves the calculated similarities as text files and heatmaps; the heatmaps help you visualize the similarity matrix. Sample SMILES are provided in the ‘smiles.csv’ file.
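
The sketch below shows one way to turn a list of SMILES into a full pairwise similarity matrix with RDKit. The fingerprint type used by tanimoto_similarities.py is not stated here, so Morgan fingerprints with radius 2 and 2048 bits are an assumption, as is the helper name similarity_matrix.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator


def similarity_matrix(smiles_list, radius=2, n_bits=2048):
    """Full pairwise Tanimoto matrix (list of rows) from Morgan fingerprints."""
    # Assumption: Morgan fingerprints; the original script may use another type
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    fingerprints = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)  # returns None for unparsable SMILES
        if mol is None:
            raise ValueError(f"RDKit could not parse SMILES: {smi}")
        fingerprints.append(generator.GetFingerprint(mol))
    # BulkTanimotoSimilarity compares one fingerprint against a whole list
    return [DataStructs.BulkTanimotoSimilarity(fp, fingerprints) for fp in fingerprints]
```

For n molecules the result is an n × n symmetric matrix with 1.0 on the diagonal, which is the shape a heatmap expects.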

Availability

The script is available on GitHub under the package ‘tanimoto_similarities’.

Requirements

This script requires Python 3 and uses RDKit along with a few additional packages (pandas, NumPy, Matplotlib, and seaborn). Install them using the following commands.

$ conda create -c conda-forge -n my-rdkit-env rdkit
$ conda activate my-rdkit-env
$ conda install -c conda-forge pandas numpy matplotlib seaborn

Usage

The script contains two functions. The first calculates the full similarity matrix, displays it as a heatmap, and saves it as ‘similarities.txt’. The second calculates the similarity matrix in lower-triangular form and saves it as ‘similarities_lower_tri.txt’.
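
The difference between the two outputs can be sketched with NumPy. The tab-separated layout, three-decimal format, and zeros above the diagonal are assumptions; the actual files written by the script may be laid out differently. The helper name save_matrices is hypothetical.

```python
import numpy as np


def save_matrices(matrix, full_path="similarities.txt",
                  lower_path="similarities_lower_tri.txt"):
    """Write the full matrix and its lower triangle as tab-separated text."""
    sims = np.asarray(matrix, dtype=float)
    # np.tril keeps the diagonal and everything below it; entries above become 0
    lower = np.tril(sims)
    np.savetxt(full_path, sims, fmt="%.3f", delimiter="\t")
    np.savetxt(lower_path, lower, fmt="%.3f", delimiter="\t")
    return sims, lower
```

Because a Tanimoto matrix is symmetric, the lower triangle carries the same information as the full matrix. To draw a lower-triangular heatmap, seaborn’s heatmap function accepts a boolean mask, for example np.triu(np.ones_like(sims, dtype=bool), k=1) to hide the cells above the diagonal.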

Run the script as:

$ python3 tanimoto_similarities.py

Note: If you get an error stating that rdkit cannot be found, you have probably not activated the RDKit environment. Run conda activate my-rdkit-env again and then rerun the script.
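
To check which interpreter is running and whether the required packages are importable from it, a small diagnostic like the one below can help. The helper name missing_modules is hypothetical; when the conda environment is active, the printed interpreter path typically lies inside the my-rdkit-env directory.

```python
import importlib.util
import sys

REQUIRED = ("rdkit", "pandas", "numpy", "matplotlib", "seaborn")


def missing_modules(names=REQUIRED):
    """Return the names that the current interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Interpreter:", sys.executable)
missing = missing_modules()
print("Missing:", ", ".join(missing) if missing else "none")
```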

References

[1] Landrum, G. (2013). RDKit documentation. Release, 1(1-79), 4.
