How to download small molecules from ZINC database for virtual screening?

Managing thousands of compounds at once is difficult when performing virtual high-throughput screening. Compound databases let you download molecules in different formats; the ZINC database [1], for example, provides a batch file that fetches the structures when it is run. In this article, we will download small molecules from the ZINC database [1] for use in virtual screening.

Downloading batch file

  1. Go to the download page of the ZINC database.
  2. There you can see multiple subsets to download, such as drug-like, lead-like, clean, and so on.
  3. Select the category appropriate to your work and click on it. A new page will be displayed showing download options for Linux and Windows.
  4. Choose the MOL2 or SDF format.
  5. For Linux, a csh script is downloaded; for Windows, a batch file is downloaded.

Downloading structures

On Ubuntu
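
Before starting, it can help to confirm that the tools used in the steps below are installed, since csh in particular may not be present on a default Ubuntu installation. The following is a minimal sketch: the helper name missing_tools is hypothetical, and the assumption that the downloaded script calls wget is inferred from the Windows instructions rather than confirmed for the csh script.

```shell
# Print the name of each listed command that is not found on the PATH.
# (missing_tools is a hypothetical helper name, not part of ZINC.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools assumed to be needed by the steps in this section.
missing_tools csh wget gunzip
```

If it prints nothing, all three tools were found. Any name it does print can usually be installed with apt, for example sudo apt install csh.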

  1. Open a terminal (Ctrl+Alt+T).
  2. Change to the directory where you downloaded the script: $ cd Downloads
  3. Make the script executable: $ chmod +x usual.sdf.csh
  4. Run it: $ csh usual.sdf.csh
  5. This downloads all structures in sdf.gz format.
  6. Decompress them: $ gunzip -v *.sdf.gz. This yields thousands of structures.
  7. To combine these files into one for easier virtual high-throughput screening, type the following command: $ cat *.sdf > all_clean.sdf
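
Decompressing and combining can also be done in one pass that keeps the original .sdf.gz files. The sketch below is one way to do it, not a method prescribed by ZINC, and the function names combine_sdf and count_molecules are hypothetical. It relies on the SDF convention that each record ends with a line containing $$$$, which makes it easy to count the molecules afterwards.

```shell
# combine_sdf DIR OUT
# Decompress every .sdf.gz file in DIR and append the contents to OUT,
# leaving the compressed files untouched. (Hypothetical helper name.)
combine_sdf() {
    dir=$1
    out=$2
    : > "$out"                       # create or empty the output file
    for f in "$dir"/*.sdf.gz; do
        [ -e "$f" ] || continue      # skip if the pattern matched nothing
        gunzip -c "$f" >> "$out"
    done
}

# count_molecules FILE
# Count SDF records by counting the "$$$$" delimiter lines.
count_molecules() {
    grep -c '^\$\$\$\$' "$1"
}
```

For example, combine_sdf . all_clean.sdf followed by count_molecules all_clean.sdf builds the combined file from the current directory and reports how many structures it holds.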

On Windows

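You will first have to install wget on Windows as shown below:

  1. Download an executable of wget from here.
  2. Now open command prompt (cmd) and type >path to list the directories that Windows searches for executables.
  3. Copy the executable to one of those directories, such as C:\WINDOWS\System32.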

After that, restart command prompt, then download and extract all molecules as shown below:

  1. Go to the folder where you downloaded the batch file.
  2. Right-click on the file -> Run as administrator. A command prompt will appear and download all files. This step can take a long time.

NOTE: Command prompt may display a “404 NOT FOUND” error. In that case, open the batch file in an editor and edit the URL on the second line, which reads set base=http://zinc.docking.org/db/bysubset/16. Change it to set base=http://zinc12.docking.org/db/bysubset/16.

  3. Extract these files with a tool that reads gzip archives, such as 7-Zip, or with gzip if a Windows build is installed (unzip handles .zip archives, not .gz): >gzip -d *.sdf.gz
  4. Combine these files (cat is not built into command prompt, so this needs a Unix-style toolset on the path): >cat *.sdf > all_clean.sdf
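
The URL fix described in the NOTE above can also be applied from a shell (on Linux, or in a Unix-style shell on Windows) instead of a text editor. This is a minimal sketch: the function name fix_zinc_base is hypothetical, and it assumes the downloaded script contains the host zinc.docking.org exactly as quoted in the NOTE.

```shell
# fix_zinc_base FILE
# Rewrite the old host zinc.docking.org to zinc12.docking.org in FILE.
# (Hypothetical helper name; FILE is the downloaded csh or batch script.)
fix_zinc_base() {
    script=$1
    patched=$script.tmp
    sed 's#http://zinc\.docking\.org/#http://zinc12.docking.org/#g' "$script" > "$patched" || return 1
    # Overwrite the original in place so its permissions are preserved.
    cat "$patched" > "$script"
    rm -f "$patched"
}
```

For example, fix_zinc_base usual.sdf.csh patches the Linux script. Running it twice is harmless, because the pattern no longer matches once the host has been changed.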

Now, you can use these structures for virtual screening.

References

  1. Irwin, J. J., & Shoichet, B. K. (2005). ZINC − a free database of commercially available compounds for virtual screening. Journal of Chemical Information and Modeling, 45(1), 177-182.

Muniba is a bioinformatician based at the South China University of Technology. She has cutting-edge knowledge of bioinformatics tools, algorithms, and drug design. When she is not reading, she enjoys spending time with her family.
