Tutorial: Basic protein structure modeling using MODELLER

MODELLER [1], developed in the Sali lab, is one of the most widely used command-line tools for protein structure prediction by homology modeling. The installation of MODELLER on Ubuntu was explained in a previously published article. This article explains how to perform basic modeling of a protein sequence that shares a high percent identity with its template.

The complete process and commands are explained in the following steps:

1. Prepare input sequence

MODELLER requires the input protein sequence in PIR format. You can convert a FASTA sequence into PIR format using EMBOSS Seqret. Here, the file is saved as ‘input.ali’.
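As an alternative to Seqret, the conversion can be sketched in a few lines of Python. This is a minimal illustration, not part of MODELLER: the function name fasta_to_pir and the entry code ‘input’ are assumptions made for this example, and the second line follows the ten-field, colon-separated description that MODELLER expects for a target sequence.

```python
def fasta_to_pir(fasta_text, code, width=75):
    """Convert a single-record FASTA string into a MODELLER-style PIR entry.

    `code` is the name MODELLER will use for the sequence (hypothetical
    here; choose the same code you later reference in your scripts).
    """
    lines = [ln.strip() for ln in fasta_text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")

    # Join the sequence lines; drop spaces and any trailing '*'
    sequence = "".join(lines[1:]).replace(" ", "").rstrip("*").upper()
    if not sequence:
        raise ValueError("FASTA record contains no sequence")

    header = ">P1;" + code
    # Ten colon-separated fields; only the entry type and code are filled in
    description = "sequence:" + code + ":::::::0.00: 0.00"
    # The sequence must end with '*'; wrap it into fixed-width lines
    terminated = sequence + "*"
    body = "\n".join(terminated[i:i + width]
                     for i in range(0, len(terminated), width))
    return "\n".join([header, description, body]) + "\n"


# Usage (file names are placeholders):
#   with open("input.fasta") as src, open("input.ali", "w") as dst:
#       dst.write(fasta_to_pir(src.read(), "input"))
```

The sketch handles one sequence per file only; for multi-record FASTA files Seqret remains the simpler choice.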

2. Search for a template

First, search for a template that is highly identical to the input sequence. This can be done easily with the ‘build_profile.py’ script. Open a terminal (Ctrl+Alt+T) and type the following command:

$ python build_profile.py > build_profile.log

You can change some search parameters in ‘build_profile.py’. For example, you can change the similarity matrix, which is set to BLOSUM62 (blosum62.sim.mat) by default.

The previous command produces two files: ‘build_profile.log’ and ‘build_profile.prf’. Open the ‘build_profile.prf’ file. It starts with the number of sequences found and some other related details, followed by the template hits in tabular form. The most important details of each hit are in the second, eleventh, and twelfth columns. The second column gives the template PDB ID, the eleventh column shows the percent identity, and the twelfth column reports the significance (e-value) of the target-template alignment.
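The three columns described above can also be pulled out with a short script. The following is a rough sketch that assumes the layout described in this step: header lines begin with ‘#’, and each hit is a whitespace-separated row with the template code in column 2, percent identity in column 11, and the significance value in column 12. The function names and the 30% identity cutoff are illustrative choices, not MODELLER defaults.

```python
def parse_profile_hits(lines):
    """Return (code, percent_identity, significance) for each table row.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    hits = []
    for line in lines:
        if line.lstrip().startswith("#"):
            continue                      # header/comment line
        fields = line.split()
        if len(fields) < 12:
            continue                      # blank or incomplete row
        try:
            identity = float(fields[10])      # eleventh column
            significance = float(fields[11])  # twelfth column
        except ValueError:
            continue                      # not a data row
        hits.append((fields[1], identity, significance))  # second column
    return hits


def rank_hits(hits, min_identity=30.0):
    """Keep hits at or above `min_identity`, most identical first.

    Ties in identity are broken by the smaller significance value.
    """
    kept = [h for h in hits if h[1] >= min_identity]
    return sorted(kept, key=lambda h: (-h[1], h[2]))


# Usage:
#   with open("build_profile.prf") as handle:
#       for code, identity, significance in rank_hits(parse_profile_hits(handle)):
#           print(code, identity, significance)
```

The identity filter also drops the row for the query sequence itself, if the table lists it with an identity of zero.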

3. Select a template

To select the most appropriate template for homology modeling, we will use another MODELLER script, ‘compare.py’.

$ python compare.py > compare.log

Open the output file ‘compare.log’. It contains a matrix with the % sequence identities in the lower triangle, the number of identical residues in the upper triangle, and the number of residues of each sequence on the diagonal. This is followed by a weighted pair-group average clustering based on the distance matrix. Based on these results and the resolution of the template crystal structures, you can select an appropriate template for building a model of your protein sequence.

4. Target-template alignment

Now, we will align the input protein sequence to the selected template using ‘align2d.py’. This script aligns the input sequence in ‘input.ali’ with the template structure, which must be available as a .pdb file.

$ python align2d.py

This command produces two output files: ‘input-template.ali’ in PIR format and ‘input-template.pap’ in PAP format. Here, ‘template’ stands for the PDB ID of the selected template.
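Before building a model, it can be useful to check the resulting alignment. The sketch below reads a PIR alignment and computes the percent identity over the columns where both sequences have a residue. It is an illustration only: the helper names are made up for this example, and the identity definition used here (matches divided by ungapped aligned columns) is one common convention that may differ from the value MODELLER itself reports.

```python
def read_pir_alignment(text):
    """Parse PIR text into {code: aligned_sequence}, keeping '-' gaps."""
    entries = {}
    code = None
    skip_description = False
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            # Header looks like '>P1;code'
            code = line[1:].partition(";")[2].strip()
            skip_description = True
            chunks = []
        elif code is not None and skip_description:
            skip_description = False      # colon-separated description line
        elif code is not None and line:
            chunks.append(line)
            if line.endswith("*"):        # '*' terminates the sequence
                entries[code] = "".join(chunks).rstrip("*")
                code = None
    return entries


def percent_identity(seq_a, seq_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Usage (the entry codes are placeholders for your own template and target):
#   with open("input-template.ali") as handle:
#       aln = read_pir_alignment(handle.read())
#   print(percent_identity(aln["template"], aln["input"]))
```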

5. Build Model

With the target-template alignment in place, we can move on to model building. For this, the ‘model-single.py’ script is used. This script generates 5 models by default. If you need more or fewer models, change the value of ‘a.ending_model’ in the script.

$ python model-single.py > model-single.log
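For orientation, a model-building script of this kind typically looks like the sketch below. It is not the exact content of ‘model-single.py’: the wrapper function build_models and its arguments are assumptions made for this example, and the class names follow MODELLER 10 and later (older releases spell them environ and automodel). The imports sit inside the function so the file can be loaded on a machine without MODELLER.

```python
def build_models(alnfile, template_code, target_code, n_models=5):
    """Build `n_models` models with MODELLER's AutoModel (sketch).

    alnfile       -- target-template alignment in PIR format
    template_code -- code of the template entry in the alignment
    target_code   -- code of the target entry in the alignment
    """
    # Lazy imports: MODELLER is only needed when the function is called
    from modeller import Environ
    from modeller.automodel import AutoModel, assess

    env = Environ()
    a = AutoModel(env,
                  alnfile=alnfile,
                  knowns=template_code,
                  sequence=target_code,
                  # Score every model with DOPE and GA341
                  assess_methods=(assess.DOPE, assess.GA341))
    a.starting_model = 1
    a.ending_model = n_models   # index of the last model to build
    a.make()
    return a.outputs            # one record per model


# Usage (placeholder names; requires MODELLER and the template PDB file):
#   build_models("input-template.ali", "template", "input", n_models=5)
```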

You can find the summary of all built models in the last section of the output log file.
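The summary can also be read programmatically, for example to rank the models by score. The sketch below assumes that the summary section starts with a line containing "Summary of successfully produced models" and that each model row has four whitespace-separated fields: file name, molpdf, DOPE score, and GA341 score. If your log uses a different set of assessment methods, the column handling needs to be adapted. The function names are illustrative.

```python
def parse_model_summary(log_lines):
    """Collect model rows from the summary section of the log."""
    models = []
    in_summary = False
    for line in log_lines:
        if "Summary of successfully produced models" in line:
            in_summary = True
            continue
        if not in_summary:
            continue
        fields = line.split()
        # Expected row: <name>.pdb  molpdf  DOPE  GA341
        if len(fields) != 4 or not fields[0].endswith(".pdb"):
            continue
        try:
            molpdf, dope, ga341 = (float(x) for x in fields[1:])
        except ValueError:
            continue
        models.append({"name": fields[0], "molpdf": molpdf,
                       "dope": dope, "ga341": ga341})
    return models


def best_by_dope(models):
    """Return the model with the lowest DOPE score, or None if empty."""
    if not models:
        return None
    return min(models, key=lambda m: m["dope"])


# Usage:
#   with open("model-single.log") as handle:
#       print(best_by_dope(parse_model_summary(handle)))
```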

6. Evaluate model

The generated models can be evaluated using the DOPE and GA341 assessment scores reported for each model in the log file. Lower DOPE scores are better, while GA341 scores range from 0 to 1, with values close to 1 indicating a more reliable model. The model with the lowest DOPE score and/or the highest GA341 score can be selected. The ‘evaluate_model.py’ script can be used for model selection. This script selects the model with the best DOPE score amongst all built models.

$ python evaluate_model.py

This command produces an ‘input.profile’ file that can be used to plot the DOPE score profiles of the model and the template. The plot can be generated using the ‘plot_profiles.py’ script as shown below. This script can be found in the tutorial zip file.

$ python plot_profiles.py
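If you want to inspect a profile file yourself, the per-residue scores can be read with a few lines of Python. This sketch assumes that comment lines in the profile file start with ‘#’ and that the last column of every data line holds the per-residue DOPE score. Unlike the tutorial script, it plots a single profile against residue index and does not insert alignment gaps to overlay model and template. The function names and the output file name are made up for this example, and matplotlib is an extra dependency that is imported only when plotting.

```python
def read_dope_profile(lines):
    """Return per-residue scores from the last column of each data line."""
    scores = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                      # blank or comment line
        try:
            scores.append(float(stripped.split()[-1]))
        except ValueError:
            continue                      # last field is not a number
    return scores


def plot_profile(scores, out_png="dope_profile.png"):
    """Save a line plot of score versus residue position."""
    import matplotlib
    matplotlib.use("Agg")                 # write to file, no display needed
    import matplotlib.pyplot as plt

    positions = range(1, len(scores) + 1)
    plt.figure(figsize=(10, 4))
    plt.plot(positions, scores, linewidth=1.5)
    plt.xlabel("Residue position")
    plt.ylabel("DOPE per-residue score")
    plt.tight_layout()
    plt.savefig(out_png, dpi=100)
    plt.close()


# Usage:
#   with open("input.profile") as handle:
#       plot_profile(read_dope_profile(handle))
```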

References

[1] Eswar, N., Webb, B., Marti‐Renom, M. A., Madhusudhan, M. S., Eramian, D., Shen, M. Y., … & Sali, A. (2006). Comparative protein structure modeling using Modeller. Current Protocols in Bioinformatics, 15(1), 5-6.