I-Tasser (Iterative Threading ASSEmbly Refinement) is a well-known tool for ab-initio structure modeling of proteins [1]. It combines secondary-structure enhanced profile-profile threading alignment (PPA) [2] with iterative structure assembly simulations performed by a threading assembly refinement program [3]. I-Tasser is typically used when the query protein has low sequence similarity (<=30%) to proteins of known structure. Most users run it through the I-Tasser server [4], which can be accessed after registering with a valid institutional email ID. In this article, we will learn how to predict a protein structure using the standalone version of I-Tasser.
This article was written at the request of our readers, so it does not go into the details of the algorithm used by I-Tasser (if you wish to know more about the algorithm, drop me an email). The following sections explain how to download and install the package, prepare the input, and submit the query protein on a Linux platform. So, let’s get started!
Getting started
It is good practice to update and upgrade your Ubuntu system first. Open a terminal by pressing Ctrl+Alt+T and type the following commands:
$ sudo apt-get update
$ sudo apt-get upgrade
Downloading the Suite package
To download the suite package, you must register on the I-Tasser website and request a password for non-commercial use of the software. Once you have the password, you can log in and download the latest available version of the package.
Installation
Open a terminal, change to the directory where you downloaded the package (let’s say Downloads), and unpack it by typing the following commands:
$ cd Downloads
$ tar -xvjf I-TASSER5.1.tar.bz2
This creates a new folder named I-Tasser5.1 in the Downloads directory (file names are case-sensitive on Linux, so type the folder name exactly as tar created it). Enter the folder and you will find a Perl script named ‘download_lib.pl’. Run this script from the same directory to download the required libraries. It will take a while to finish.
$ cd I-Tasser5.1
$ ./download_lib.pl -P true -B true -N true
Once all the libraries have been downloaded, you will find a new folder named ‘libdir’ inside the I-Tasser5.1 directory. Next, prepare your input file as explained in the following section.
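Before moving on, it can help to confirm that the library folder really exists and is not empty, since the download step is long and may be interrupted. The snippet below is only a minimal sketch: `check_libdir` is a hypothetical helper name, not part of the I-Tasser package, and it checks nothing about the contents of the libraries themselves.

```shell
# Hypothetical helper (not part of I-Tasser): succeed only if the given
# directory exists and contains at least one entry.
check_libdir() {
    if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
        echo "ok: $1"
    else
        echo "missing or empty: $1" >&2
        return 1
    fi
}
```

For example, `check_libdir ~/Downloads/I-Tasser5.1/libdir` prints an "ok" line when the folder is in place and returns a non-zero status otherwise.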
Preparing the input
- Create a directory, say example, in the I-Tasser5.1 folder. It will hold the query protein sequence and the output files.
- Save your query protein sequence in this directory as ‘seq.fasta’ (the sequence must be in FASTA format and should not contain more than 1500 residues).
- Save a copy of the same sequence file in the I-Tasser5.1 folder as well.
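The two input requirements above (FASTA format, at most 1500 residues) can be checked before submitting a job that may run for days. The following is a minimal sketch, assuming a single-sequence file whose first line is the `>` header; `count_residues` and `check_fasta` are hypothetical helper names introduced here for illustration.

```shell
# Hypothetical helper: count residues in a single-sequence FASTA file.
# Header lines (starting with '>') and all whitespace are ignored.
count_residues() {
    grep -v '^>' "$1" | tr -d '[:space:]' | wc -c | tr -d ' '
}

# Hypothetical helper: succeed only if the file exists, its first line is a
# '>' header, and it holds between 1 and max residues (default 1500).
check_fasta() {
    cf_file=$1
    cf_max=${2:-1500}
    if [ ! -f "$cf_file" ]; then
        echo "missing file: $cf_file" >&2
        return 1
    fi
    if ! head -n 1 "$cf_file" | grep -q '^>'; then
        echo "no FASTA header in $cf_file" >&2
        return 1
    fi
    cf_n=$(count_residues "$cf_file")
    if [ "$cf_n" -lt 1 ] || [ "$cf_n" -gt "$cf_max" ]; then
        echo "$cf_n residues; expected 1 to $cf_max" >&2
        return 1
    fi
    echo "$cf_file looks fine: $cf_n residues"
}
```

Running `check_fasta example/seq.fasta` from the I-Tasser5.1 folder reports the residue count, or explains on stderr why the file was rejected. The check does not validate the residue letters themselves.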
Submitting the job
Now you can submit your query sequence for structure prediction using the runI-Tasser.pl script found in the I-Tassermod folder. Enter this folder and run the following commands:
$ cd ~/Downloads/I-Tasser5.1/I-Tassermod
$ sudo ./runI-Tasser.pl -libdir /home/username/Downloads/I-Tasser5.1/libdir -seqname protein -datadir /home/username/Downloads/I-Tasser5.1/example
-seqname is a unique name for your query protein (here, protein).
-libdir is the folder containing the libraries downloaded earlier. Give the full path to this folder.
-datadir is the folder where you saved your query sequence (i.e., seq.fasta). Give the full path to this folder.
You can specify many other options for your job, e.g., to predict gene ontology terms, the EC number, ligand-binding sites, and so on. The available arguments are listed in a file in the I-Tasser5.1 folder. To enable these three predictions, append the following to the command above:
-GO true -EC true -LBS true
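Because the command is long and the paths are easy to mistype, it can be convenient to assemble it from variables and review it before running. The sketch below only prints the command; it does not start a job. `ITASSER_HOME` and `itasser_cmd` are hypothetical names introduced here, the default path is an assumption based on the Downloads location used in this article, and only the flags shown above are used.

```shell
# Assumed install location; override ITASSER_HOME to match your own setup.
ITASSER_HOME="${ITASSER_HOME:-$HOME/Downloads/I-Tasser5.1}"

# Hypothetical helper: print (not run) the runI-Tasser.pl command line.
#   $1 = query name for -seqname
#   $2 = full path for -datadir
#   any remaining arguments are appended unchanged (e.g. -GO true)
itasser_cmd() {
    itc_name=$1
    itc_datadir=$2
    shift 2
    itc_line="./runI-Tasser.pl -libdir $ITASSER_HOME/libdir -seqname $itc_name -datadir $itc_datadir"
    if [ "$#" -gt 0 ]; then
        itc_line="$itc_line $*"
    fi
    printf '%s\n' "$itc_line"
}
```

For instance, `itasser_cmd protein "$ITASSER_HOME/example" -GO true -EC true -LBS true` prints the full command line, which can then be copied into the terminal from inside the I-Tassermod folder. Paths containing spaces would need extra quoting, which this sketch does not handle.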
After you press Enter, the job starts. I-Tasser runs many simulations on the protein, so a single job can take days to finish; in my case, it took 7 days. When the job has finished, you will find the PDB file for the query protein, which you can analyze with a molecular viewer such as PyMOL [5].
If you have any questions, you can comment below or write to me at tariq@bioinformaticsreview.com.
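Once a run has finished, a quick way to see which structure files were produced is to search the data directory for PDB files. This is a minimal sketch that assumes the output is written somewhere under the folder passed as -datadir; `list_models` is a hypothetical helper name, and it makes no claim about which of the listed files is the final model.

```shell
# Hypothetical helper: list every .pdb file under a directory, sorted by path.
list_models() {
    find "$1" -type f -name '*.pdb' | sort
}

# A multi-day job can be kept alive after the terminal is closed with the
# standard nohup utility, for example (arguments elided here):
#   nohup ./runI-Tasser.pl ... > itasser.log 2>&1 &
```

Calling `list_models /home/username/Downloads/I-Tasser5.1/example` prints one path per line, and each file can then be opened in a molecular viewer.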
References
- Roy, A., Kucukural, A., & Zhang, Y. (2010). I-TASSER: a unified platform for automated protein structure and function prediction. Nature protocols, 5(4), 725-738.
- Wu, S., & Zhang, Y. (2007). LOMETS: a local meta-threading-server for protein structure prediction. Nucleic acids research, 35(10), 3375-3382.
- Zhang, Y., & Skolnick, J. (2004). Automated structure prediction of weakly homologous proteins on a genomic scale. Proceedings of the National Academy of Sciences of the United States of America, 101(20), 7594-7599.
- Zhang, Y. (2008). I-TASSER server for protein 3D structure prediction. BMC bioinformatics, 9(1), 40.
- DeLano, W. L. (2002). The PyMOL molecular graphics system. http://pymol.org.
Sir, could you please confirm where the newly predicted structure is saved?
I opened the outdir folder and there are 2 folders named model1 and ssite. The model1 folder has 3 subfolders named coach, cofactor and tmsite. Inside the coach folder there are several PDB files named CH_complex1.pdb, CH_complex2.pdb and so on, and there is a file named CH_protein.pdb. I think CH_protein.pdb is the predicted protein structure, but I am not sure. Could you please guide me?
Yes, CH_protein.pdb is the one you are looking for. All of these structures are predicted, and they are ranked.