Managing thousands of compounds at once is difficult when performing virtual high-throughput screening. Compound databases let you download molecules in different formats; the ZINC database [1], for example, provides a batch file that is run afterward to fetch the structures. In this article, we will download small molecules from the ZINC database [1] for use in virtual screening.
Downloading batch file
- Go to the download page of the ZINC database.
- The page lists several subsets to download, such as drug-like, lead-like, clean, and so on.
- Select the subset appropriate to your work and click on it. A new page will be displayed showing download options for Linux and Windows.
- Download in MOL2 or SDF format.
- If you choose Linux, a csh script will be downloaded; if you choose Windows, a batch file will be downloaded.
Downloading structures
On Ubuntu
- Open a terminal (Ctrl+Alt+T).
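- Change to the directory where you downloaded the csh script, make the script executable, and run it with csh (if csh is not installed, install it first, for example with sudo apt install csh):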
$ cd Downloads/
$ chmod +x usual.sdf.csh
$ csh usual.sdf.csh
- It will download all structures in sdf.gz format. Decompress them:
$ gunzip -v *.sdf.gz
- This yields thousands of structures. If you want to combine these files into one to ease virtual high-throughput screening, type the following command:
$ cat *.sdf > all_clean.sdf
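The decompress-and-combine steps above can also be wrapped in a small script. What follows is only a minimal sketch, assuming the downloaded files end in .sdf.gz and sit in one directory; the function names merge_sdf and count_molecules are hypothetical and not part of any ZINC tooling. It relies on the SDF convention that every record ends with a line starting with `$$$$`, which gives a quick way to check how many molecules ended up in the combined file.

```shell
#!/bin/sh
# Sketch: decompress every .sdf.gz file in a directory into one combined SDF file.
# merge_sdf and count_molecules are illustrative names (assumptions, not ZINC tools).

# merge_sdf DIR OUTFILE
merge_sdf() {
    dir=$1
    out=$2
    : > "$out"                      # create or truncate the output file
    for f in "$dir"/*.sdf.gz; do
        [ -e "$f" ] || continue     # skip if the glob matched nothing
        gzip -dc "$f" >> "$out"     # decompress to stdout; the .gz file is kept
    done
}

# count_molecules FILE
# Each SDF record is terminated by a line starting with "$$$$".
count_molecules() {
    grep -c '^\$\$\$\$' "$1"
}
```

For example, `merge_sdf . all_clean.sdf` followed by `count_molecules all_clean.sdf` prints the number of records in the combined file. Unlike gunzip, this keeps the original .gz files, so the download does not have to be repeated if something goes wrong.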
On Windows
You will have to install wget on Windows as shown below:
- Download an executable of wget from here.
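- Open the command prompt (cmd) and type the following to list the directories that Windows searches for executables:
>path
- Copy the wget executable to one of the listed directories, for example C:\WINDOWS\System32.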
After that, restart the command prompt and extract all molecules as shown below:
1. Go to the folder where you downloaded the batch file.
2. Right-click the file and select Run as administrator. A command prompt will open and download all files. This step can take a long time.
NOTE: The command prompt may display a “404 NOT FOUND” error. In that case, open the batch file in an editor and edit the URL given in the file.
Look at the second line, which would read: set base=http://zinc.docking.org/db/bysubset/16
Change it to: set base=http://zinc12.docking.org/db/bysubset/16
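3. Extract these files. They are gzip-compressed, so use gunzip rather than unzip, which reads only .zip archives: >gunzip *.sdf.gz
4. Combine these files as: >cat *.sdf > all_clean.sdf
Note that gunzip and cat are Unix tools and are not built into the Windows command prompt. Run these two commands from an environment that provides them, such as Git Bash or WSL.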
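The URL fix described in the note above can also be applied from a command line instead of an editor. The following is a rough sketch, assuming a shell with sed is available (Ubuntu, or Git Bash or WSL on Windows); the function name fix_zinc_base is hypothetical, and the script path is whatever file you downloaded. It rewrites the old host to zinc12.docking.org and leaves a .bak copy of the original file.

```shell
#!/bin/sh
# Sketch: point a downloaded ZINC script at zinc12.docking.org.
# fix_zinc_base is an illustrative name (an assumption, not a ZINC tool).

# fix_zinc_base SCRIPT
fix_zinc_base() {
    # -i.bak edits the file in place and keeps the original as SCRIPT.bak
    sed -i.bak 's#http://zinc\.docking\.org#http://zinc12.docking.org#' "$1"
}
```

For example, `fix_zinc_base usual.sdf.csh` updates the base line in the csh script used in the Ubuntu steps. Running it a second time changes nothing further, because the pattern no longer matches once the host reads zinc12.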
Now, you can use these structures for virtual screening.
References
[1] Irwin, J. J., & Shoichet, B. K. (2005). ZINC - a free database of commercially available compounds for virtual screening. Journal of Chemical Information and Modeling, 45(1), 177-182.