Virtual screening (VS) with Autodock Vina can produce a large number of output files, and analyzing them manually is difficult, if not impossible. Therefore, we provide a Python script that fetches the top results, i.e., the compounds with the lowest (most negative) binding affinity values.
Autodock Vina lets us screen many compounds at once to find potential candidates for drug development, and each docking run generates its own output file. These files are then analyzed to select the compounds with the lowest binding affinity values, which is tedious to do manually, one file at a time. Plugins such as Raccoon2 serve this purpose, but they are sometimes difficult to install and run. Therefore, we have developed a Python script that does this.
Usage:
Download the file and save it in the directory where you have kept all your log files. If you save it elsewhere, remember to provide the full path to the script when you run it. It runs on both Linux and Windows.
$ python vs_analysis.py
After you run the above command, it prompts you to enter a valid number, which is the number of poses/results you want to select. A new file named ‘output.txt’ is then created in the same directory. This file contains the final results.
Example:
Suppose you have 50 log files in your directory and want to fetch the top 20 results/poses, sorted by lowest binding affinity. Run the above command and enter 20 when prompted. The top 20 results will be written to the ‘output.txt’ file. Remember to enter a valid number, i.e., a number less than or equal to the number of log files present in the directory.
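The selection step described above can be illustrated with a short sketch. This is not the code of vs_analysis.py. The function names `best_affinity` and `top_n` and the row-matching pattern are assumptions made for illustration, based on the standard Vina result table (mode, affinity, rmsd l.b., rmsd u.b.).

```python
import re

# Assumed layout of a Vina result row: mode, affinity, rmsd l.b., rmsd u.b.
ROW = re.compile(r"^\s*(\d+)\s+(-?\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")


def best_affinity(log_text):
    """Return the most negative affinity (kcal/mol) in one log, or None."""
    scores = []
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            scores.append(float(match.group(2)))
    return min(scores) if scores else None


def top_n(logs, n):
    """Rank logs by best affinity and return the first n (name, affinity) pairs.

    `logs` maps a log file name to the text of that file.
    Logs without any result rows are skipped.
    """
    ranked = []
    for name, text in logs.items():
        score = best_affinity(text)
        if score is not None:
            ranked.append((name, score))
    ranked.sort(key=lambda pair: pair[1])  # most negative first
    return ranked[:n]
```

With 50 log files read into `logs`, `top_n(logs, 20)` would return the 20 entries with the lowest binding affinity values. Because of the slice, asking for more entries than there are logs simply returns all of them in this sketch.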
Availability
The script is free to download from the Bitbucket account of Bioinformatics Review and from my personal GitHub account (Muniba Faiza).
NOTE:
- This script looks for log files that contain the word ‘log’ in their filenames, e.g., “log_compound1.txt” or “log_methanol.txt”.
- It is recommended to include the compound’s name in each log filename. This makes the results more presentable and easier to understand. For example, you can name a file “log_naphthalene.txt”.
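The naming convention in the notes above can be sketched as follows. The helpers `find_log_files` and `compound_name` are hypothetical names, not functions of vs_analysis.py, and the ‘log_’ prefix handling is an assumption taken from the example filenames.

```python
from pathlib import Path


def find_log_files(directory="."):
    """Return the files in `directory` whose names contain 'log', sorted by name."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and "log" in path.name
    )


def compound_name(path):
    """Derive a compound label from a log filename.

    Assumes the 'log_<compound>.txt' pattern; any other name is
    returned without its extension.
    """
    stem = Path(path).stem
    prefix = "log_"
    return stem[len(prefix):] if stem.startswith(prefix) else stem
```

A plain substring match also picks up unrelated files such as “catalog.txt”, which is one more reason to keep only docking logs in the working directory.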
How to cite this script:
Faiza, M., (2021). vs_Analysis.py: A Python Script to Analyze Virtual Screening Results of Autodock Vina, 8(5): page 12-16. https://bioinformaticsreview.com/20210509/vs-analysis-a-python-script-to-analyze-virtual-screening-results-of-autodock-vina/
Faiza, M., (2024). VS_Analysis: A Python package to perform post-virtual screening analysis, 10(1): page 8-12. https://bioinformaticsreview.com/20240110/vs_analysis-a-python-package-to-perform-post-virtual-screening-analysis/
For any queries, contact me at muniba@bioinformaticsreview.com.
Further Reading
Virtual Screening Methodology for Structure-based Drug Designing
How to download small molecules from ZINC database for virtual screening?
Hi Muniba,
Could you please help me edit the script so that it works with a single log file containing the binding affinities of several molecules?
I am using Smina, which is a fork of Vina. Instead of giving each ligand pdbqt file as a separate input, I use a single input file that contains several ligands. Hence, only one log file is generated.
Could you please edit the Python script accordingly so that it works for my case?
It would be a great help.
Thanking you in anticipation.
Regards,
Faraz
The log file contains the values for each ligand in turn, one after the other (an example is shown below).
Using random seed: 1041151582
mode | affinity | dist from best mode
| (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1 -8.6 0.000 0.000
2 -8.1 3.817 6.149
3 -8.0 2.208 3.062
4 -8.0 1.472 1.762
5 -7.8 1.736 2.232
6 -7.7 3.860 6.192
7 -7.7 3.854 6.141
8 -7.6 3.751 6.145
9 -7.6 3.200 5.821
Using random seed: 1041151582
mode | affinity | dist from best mode
| (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1 -8.3 0.000 0.000
2 -8.3 0.018 2.670
3 -8.0 1.686 2.186
4 -8.0 1.692 2.659
5 -7.5 2.726 3.606
6 -7.4 2.728 3.857
7 -7.3 3.132 6.883
8 -7.1 2.692 3.585
9 -7.0 2.716 3.453
Using random seed: 1041151582
mode | affinity | dist from best mode
| (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1 -8.0 0.000 0.000
2 -8.0 1.279 2.121
3 -7.7 1.684 2.132
4 -7.6 2.868 3.625
5 -7.5 3.130 6.368
6 -7.4 2.482 3.321
7 -7.3 2.396 5.734
8 -7.2 2.689 3.690
9 -6.9 3.053 6.551
Using random seed: 1041151582
Sure, I will update the script soon. It would be helpful if you could email me your file at muniba@bioinformaticsreview.com.
Best
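For readers with the same question, a combined log in the format quoted above could be handled along the following lines. This is only a sketch and not part of vs_analysis.py. It assumes that every ligand's table starts with a line beginning with "mode" and that result rows have four numeric columns, as in the excerpt. The function names are hypothetical. The excerpt shows no ligand names, so ligands are identified only by their order in the file.

```python
import re

# Assumed row layout, taken from the excerpt: mode, affinity, rmsd l.b., rmsd u.b.
ROW = re.compile(r"^\s*(\d+)\s+(-?\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")


def best_per_ligand(log_text):
    """Return the best (most negative) affinity of each table, in file order.

    A table without any result rows contributes None, so positions
    still correspond to the order of ligands in the input file.
    """
    best = []
    current = None  # affinities of the table being read, None before the first table
    for line in log_text.splitlines():
        if line.lstrip().startswith("mode"):
            if current is not None:
                best.append(min(current) if current else None)
            current = []
        elif current is not None:
            match = ROW.match(line)
            if match:
                current.append(float(match.group(2)))
    if current is not None:
        best.append(min(current) if current else None)
    return best


def rank_ligands(log_text):
    """Return (ligand_index, affinity) pairs, 1-based, lowest affinity first."""
    scored = [
        (index, score)
        for index, score in enumerate(best_per_ligand(log_text), start=1)
        if score is not None
    ]
    return sorted(scored, key=lambda pair: pair[1])
```

Slicing the ranked list, for example `rank_ligands(text)[:20]`, would then give the top 20 ligands by position in the input file.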