vs_Analysis.py: A Python Script to Analyze Virtual Screening Results of Autodock Vina

Dr. Muniba Faiza

Virtual screening (VS) with AutoDock Vina can produce a large number of output files, which are difficult, if not impossible, to analyze manually. We therefore provide a Python script that fetches the top results, i.e., the compounds with the lowest (most negative) binding affinity values.

AutoDock Vina lets us screen many compounds to find potential candidates for drug development, and such a screen generates multiple output files. We then analyze these files to select the compounds with the lowest binding affinity values (reported in kcal/mol; more negative values indicate stronger predicted binding). Doing this manually, one file at a time, is tedious. Plugins such as Raccoon2 serve the same purpose, but they are sometimes difficult to install and run. We have therefore developed a Python script for this task.

Usage:

Download the script and save it in the directory that contains all your log files. If you save it elsewhere, remember to provide the full path to the script when you run it. It runs on both Linux and Windows.

$ python vs_analysis.py

After you run the above command, the script prompts you to enter the number of poses/results you want to select. It then creates a new file named ‘output.txt’ in the same directory. This file contains the final results.

Example:

Suppose you have 50 log files in your directory and want the top 20 results/poses, ranked from the lowest binding affinity upward. Run the above command and enter 20 when prompted. The script writes the top 20 results to the ‘output.txt’ file. Remember to enter a valid number, i.e., one that is less than or equal to the number of log files in the directory.
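The workflow described above (read every log file, take each compound's best score, rank, and write the top N) can be sketched roughly as follows. This is not the code of vs_analysis.py itself, only a minimal illustration under stated assumptions: the function names (`best_affinity`, `rank_logs`, `top_results`, `write_results`) are hypothetical, each log is assumed to contain the standard Vina result table, and the tab-separated output layout is invented for the example.

```python
import re
from pathlib import Path

# One result row of a Vina log: mode, affinity (kcal/mol), RMSD l.b., RMSD u.b.
ROW = re.compile(r"^\s*(\d+)\s+(-?\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")


def best_affinity(log_text):
    """Return the lowest (most negative) affinity in one log, or None if no rows."""
    scores = []
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            scores.append(float(match.group(2)))
    return min(scores) if scores else None


def rank_logs(directory="."):
    """Return (filename, best affinity) pairs, lowest affinity first."""
    ranked = []
    for path in sorted(Path(directory).iterdir()):
        # Assumption: any regular file with 'log' in its name is a Vina log.
        if path.is_file() and "log" in path.name:
            score = best_affinity(path.read_text(errors="replace"))
            if score is not None:
                ranked.append((path.name, score))
    ranked.sort(key=lambda pair: pair[1])
    return ranked


def top_results(ranked, count):
    """Validate the requested count and return that many leading entries."""
    if not 1 <= count <= len(ranked):
        raise ValueError(f"enter a number between 1 and {len(ranked)}")
    return ranked[:count]


def write_results(results, out_path="output.txt"):
    """Write one 'filename<TAB>affinity' line per selected result (hypothetical layout)."""
    with open(out_path, "w") as handle:
        for name, score in results:
            handle.write(f"{name}\t{score:.1f}\n")
```

With 50 log files in the current directory, `write_results(top_results(rank_logs("."), 20))` would write the 20 lowest-scoring compounds to ‘output.txt’, while a request for 51 would raise an error instead of silently returning fewer results.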

Availability

The script is free to download from the Bitbucket account of Bioinformatics Review and from my personal GitHub account (Muniba Faiza).

NOTE:

  • The script looks for log files whose filenames contain the word ‘log’, e.g., “log_compound1.txt” or “log_methanol.txt”.
  • It is recommended to include the compound’s name in each log filename, e.g., “log_naphthalene.txt”. This makes the results more presentable and easier to understand.
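The naming convention in the note above can be illustrated with a short sketch. The helper names `is_log_file` and `compound_label` are hypothetical and are not taken from the script; the sketch also assumes that ‘log’ is attached to the compound name as a prefix or suffix with an underscore or hyphen.

```python
import re
from pathlib import Path


def is_log_file(filename):
    """True when the bare filename (not its directory) contains 'log'."""
    return "log" in Path(filename).name


def compound_label(filename):
    """Turn 'log_naphthalene.txt' into 'naphthalene' for display."""
    stem = Path(filename).stem
    # Assumption: 'log' is a prefix or suffix joined by '_' or '-'.
    label = re.sub(r"^log[_-]|[_-]log$", "", stem)
    return label or stem
```

A filename that carries the compound name lets a results table show “naphthalene” rather than an anonymous number, which is the practical benefit the recommendation points to.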

How to cite this script:

Faiza, M. (2021). vs_Analysis.py: A Python Script to Analyze Virtual Screening Results of Autodock Vina, 8(5): pages 12-16. https://bioinformaticsreview.com/20210509/vs-analysis-a-python-script-to-analyze-virtual-screening-results-of-autodock-vina/
Faiza, M. (2024). VS_Analysis: A Python package to perform post-virtual screening analysis, 10(1): pages 8-12. https://bioinformaticsreview.com/20240110/vs_analysis-a-python-package-to-perform-post-virtual-screening-analysis/

For any queries, contact me at muniba@bioinformaticsreview.com.


 

Further Reading

Virtual Screening Methodology for Structure-based Drug Designing

How to perform virtual screening using Autodock Vina?

Video Tutorial: Virtual Screening using Autodock Vina

How to download small molecules from ZINC database for virtual screening?

Dr. Muniba is a bioinformatician based in New Delhi, India. She completed her PhD in Bioinformatics at South China University of Technology, Guangzhou, China. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.
3 Comments
  • Hi Muniba,

    Could you please help me edit the script so that it works with a single log file containing the binding affinities of several molecules?

    I am using Smina, which is a fork of Vina. Instead of supplying each ligand PDBQT file as a separate input file, I use a single input file containing several ligands, so only one log file is generated.

    Could you please edit the Python script accordingly so that it works for my case?

    It would be a great help.

    Thanking you in anticipation.

    Regards,
    Faraz

  • The log file has all the values for each ligand repeated one after another (an example is shown below).

    Using random seed: 1041151582

    mode |   affinity | dist from best mode
         | (kcal/mol) | rmsd l.b.| rmsd u.b.
    -----+------------+----------+----------
       1         -8.6      0.000      0.000
       2         -8.1      3.817      6.149
       3         -8.0      2.208      3.062
       4         -8.0      1.472      1.762
       5         -7.8      1.736      2.232
       6         -7.7      3.860      6.192
       7         -7.7      3.854      6.141
       8         -7.6      3.751      6.145
       9         -7.6      3.200      5.821
    Using random seed: 1041151582

    mode |   affinity | dist from best mode
         | (kcal/mol) | rmsd l.b.| rmsd u.b.
    -----+------------+----------+----------
       1         -8.3      0.000      0.000
       2         -8.3      0.018      2.670
       3         -8.0      1.686      2.186
       4         -8.0      1.692      2.659
       5         -7.5      2.726      3.606
       6         -7.4      2.728      3.857
       7         -7.3      3.132      6.883
       8         -7.1      2.692      3.585
       9         -7.0      2.716      3.453
    Using random seed: 1041151582

    mode |   affinity | dist from best mode
         | (kcal/mol) | rmsd l.b.| rmsd u.b.
    -----+------------+----------+----------
       1         -8.0      0.000      0.000
       2         -8.0      1.279      2.121
       3         -7.7      1.684      2.132
       4         -7.6      2.868      3.625
       5         -7.5      3.130      6.368
       6         -7.4      2.482      3.321
       7         -7.3      2.396      5.734
       8         -7.2      2.689      3.690
       9         -6.9      3.053      6.551
    Using random seed: 1041151582
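For a single log file that holds several result tables back to back, as in the excerpt in the comment above, one possible approach is to split the log at each table separator and rank the tables by their best score. The sketch below is not part of vs_analysis.py and has not been checked against real Smina output: it assumes the separator line consists of plain ASCII hyphens and plus signs, and that tables appear in the same order as the ligands in the input file, so ligands are identified only by position. The names `split_tables` and `rank_ligands` are hypothetical.

```python
import re

# One result row: mode, affinity (kcal/mol), RMSD l.b., RMSD u.b.
ROW = re.compile(r"^\s*(\d+)\s+(-?\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")
# Line printed under each table header: "-----+------------+----------+----------"
SEPARATOR = re.compile(r"^\s*-+(?:\+-+){3}\s*$")


def split_tables(log_text):
    """Return one list of (mode, affinity) tuples per result table."""
    tables = []
    current = None  # rows of the table being read, or None outside a table
    for line in log_text.splitlines():
        if SEPARATOR.match(line):
            current = []
            tables.append(current)
            continue
        match = ROW.match(line)
        if match and current is not None:
            current.append((int(match.group(1)), float(match.group(2))))
        elif line.strip():
            current = None  # any other non-blank line ends the table
    return tables


def rank_ligands(log_text, count):
    """Return (ligand position, best affinity) pairs, lowest affinity first."""
    best = [
        (index, min(score for _, score in rows))
        for index, rows in enumerate(split_tables(log_text), start=1)
        if rows
    ]
    best.sort(key=lambda pair: pair[1])
    return best[:count]
```

Applied to the three complete tables in the excerpt, `rank_ligands(log_text, 2)` would return the first and second ligands, whose best scores are -8.6 and -8.3 kcal/mol. Mapping positions back to ligand names would require the input file or a log format that records names, neither of which is shown here.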
