It is difficult to manage thousands of compounds at once when performing virtual high-throughput screening. Compound databases let you download molecules in different formats; the ZINC database, for example, provides a batch file that is run afterward to fetch the structures. In this article, we will download small molecules from the ZINC database that can be used in virtual screening.
Downloading the batch file
- Go to the download page of the ZINC database.
- There you will see several subsets available for download, such as drug-like, lead-like, clean, and so on.
- Select the category that suits your work and click on it. A new page will be displayed showing download options for Linux and Windows.
- Download in MOL2 or SDF format.
- For Linux, a csh script will be downloaded, whereas for Windows, a batch file will be downloaded.
- Open a terminal (Ctrl+Alt+T).
- Change to the directory where you downloaded the script, make it executable, and run it:
$ cd Downloads/
$ chmod +x usual.sdf.csh
$ csh usual.sdf.csh
- This will download all structures in sdf.gz format. Extract them:
$ gunzip -v *.sdf.gz
- This will give you thousands of structures.
- If you want to combine these files into one to ease virtual high-throughput screening, then type the following command:
$ cat *.sdf > all_clean.sdf
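The Linux steps above can also be wrapped in a few small shell functions. The following is only a minimal sketch, not part of the ZINC tooling: the function names check_archives, combine_sdf, and count_molecules are hypothetical helpers introduced here for illustration. It assumes the downloaded files sit in a single directory and relies on the SDF convention that every record ends with a line containing $$$$.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not ZINC commands.

# check_archives DIR
# Test every .sdf.gz file in DIR with "gzip -t" before extraction.
# Prints the name of each corrupt archive and returns non-zero if any is found.
check_archives() {
    dir=${1:-.}
    bad=0
    for f in "$dir"/*.sdf.gz; do
        [ -e "$f" ] || continue              # glob matched nothing
        if ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt: $f" >&2
            bad=1
        fi
    done
    return "$bad"
}

# combine_sdf DIR NAME
# Concatenate every .sdf file in DIR into DIR/NAME. A combined file left over
# from an earlier run is skipped, so re-running does not duplicate molecules.
combine_sdf() {
    dir=${1:-.}
    name=${2:-all_clean.sdf}
    tmp=$(mktemp) || return 1
    for f in "$dir"/*.sdf; do
        [ -e "$f" ] || continue              # glob matched nothing
        [ "$f" = "$dir/$name" ] && continue  # skip the previous output
        cat "$f" >> "$tmp"
    done
    mv "$tmp" "$dir/$name"
}

# count_molecules FILE
# Count the records in an SDF file by counting the "$$$$" terminator lines.
count_molecules() {
    grep -c '^\$\$\$\$' "$1"
}
```

After sourcing these functions, a typical run might be check_archives . followed by gunzip -v *.sdf.gz, combine_sdf . all_clean.sdf, and count_molecules all_clean.sdf. Comparing the final count with the number of molecules expected for the chosen subset is a quick check that no download was truncated.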
On Windows, you will first have to install wget as shown below:
- Download an executable of wget from here.
- Now open command prompt (cmd) and type
- Copy the executable to C:\WINDOWS\System32.
After that, restart the command prompt and download all molecules as shown below:
1. Go to the folder where you have downloaded the batch file.
2. Right-click on the file –> Run as administrator. A command prompt will appear and download all the files. This step can take a long time.
NOTE: The command prompt may display a “404 NOT FOUND” error. In that case, open the batch file in a text editor and edit the URL given in it.
Look at the second line, it would be
Change it to
3. Extract these files:
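4. Combine these files into one (cat is not built into the Windows command prompt, so this assumes a Unix-style cat is available on your PATH):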
>cat *.sdf > all_clean.sdf
Now, you can use these structures for virtual screening.
- Irwin, J. J., & Shoichet, B. K. (2005). ZINC− a free database of commercially available compounds for virtual screening. Journal of chemical information and modeling, 45(1), 177-182.