Protein-protein interactions (PPIs) are central to many biological processes, and studying them often requires knowing how strongly the interacting proteins bind. Measuring binding affinity experimentally requires an expensive setup and is tedious. Computational methods are therefore used to predict binding affinity; they take less time and can give reasonably accurate results.
Predicting the binding affinity of protein complexes is a problem that has been studied for the past two decades [1,2]. The computational methods proposed for it include empirical scoring functions [3,4,5], knowledge-based methods [6,7,8,9], and quantitative structure-activity relationships (QSARs) [10,11]. These methods have limitations: they could handle only a small amount of data, and their results are not very accurate [12].
Yugandhar and Gromiha (2014) proposed a novel method that predicts the binding affinity of a protein-protein complex from the amino acid sequences of its partners, which they report as more accurate than earlier approaches [12]. In this method, the protein-protein complexes are first classified by molecular weight, function, and percentage of binding site residues. The relation between the sequence and the structural properties is then analyzed and used to predict the binding affinity.
The sequence-based features include predicted binding site residues [13] and property values of the 20 amino acids from the AAindex database [14]. The structure-based features include binding site residues predicted using the SPPIDER web server [15], the number of hydrogen bonds [16], accessible surface area [17], non-bonded interaction energy [18], electrostatic energy, and the energy due to bond length, bond angle, and torsion angle [19]. Only a reduced set of properties is used, because several of them are inter-related, which could bias the resulting model [12]. The authors compared the correlation between all possible pairs of properties, which left them with 113 features [12]. To make it easier to identify the features that affect binding affinity, the protein complexes are classified into the following groups [12]:
- Antigen-Antibody: Complex formed by interaction between antigen and antibody.
- Enzyme-Inhibitor: Complex formed by interaction between enzyme and inhibitor.
- Other enzymes: Complexes in which one of the interacting proteins is an enzyme and the other is anything other than an inhibitory protein.
- G-protein containing: Complexes in which one of the interacting proteins is a G-protein.
- Receptor containing: Complexes in which one of the interacting proteins functions as a receptor.
- Miscellaneous: Complexes that do not fall into any of the above classes.
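The pruning of inter-related properties described before the list of groups could be sketched roughly as follows. This is only an illustration of correlation-based filtering, not the authors' procedure: the greedy selection order, the `threshold` value of 0.9, the function names, and the toy feature values are all assumptions made for the example, since the text does not state the cutoff or the exact selection rule.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is below threshold.

    features:  dict mapping feature name -> list of values (one per complex).
    threshold: hypothetical cutoff; the source does not state the value used.
    """
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data, invented for illustration only.
toy = {
    "hydrophobicity": [1.0, 2.0, 3.0, 4.0, 5.0],
    "hydrophobicity_scaled": [2.1, 4.0, 6.2, 8.0, 10.1],  # nearly 2x the first
    "charge": [0.5, -1.0, 0.3, -0.7, 0.9],
}
selected = drop_correlated(toy)
print(selected)
```

Here the second feature is almost a multiple of the first, so it is dropped, while the weakly correlated third feature is kept.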
How is the binding affinity predicted using amino acid sequences?
An independent regression model is generated for each of the classified groups by combining more than one feature using multiple regression [20]. The performance of the generated model is validated by a jack-knife test (a leave-one-out resampling test, in which each complex is predicted by a model trained on all the others). A step-wise least-squares fit is then performed using multiple regression to identify the combination of features that predicts the binding affinity with high accuracy [12], and a P-value is estimated to assess its statistical significance. If the P-value is below 0.05, the result is considered statistically significant; otherwise, other combinations of features are tried following the same procedure.
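A minimal sketch of multiple regression with jack-knife (leave-one-out) validation is given below, using only the Python standard library. It is not the authors' implementation: the helper names, the two made-up features, and the noise-free toy targets are assumptions for illustration, and the step-wise feature selection and P-value estimation are left out.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(rows, targets):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, row):
    """Predicted target for one feature row."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def jackknife_predictions(rows, targets):
    """Predict each complex from a model trained on all the other complexes."""
    preds = []
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_targets = targets[:i] + targets[i + 1:]
        preds.append(predict(fit(train_rows, train_targets), rows[i]))
    return preds


# Toy data, invented for illustration: two features per complex and
# noise-free targets generated from a known linear rule.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0), (6.0, 4.0)]
targets = [-10.0 + 2.0 * a - 0.5 * b for a, b in rows]
preds = jackknife_predictions(rows, targets)
mae = sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)
print(f"leave-one-out mean absolute error: {mae:.6f}")
```

Because the toy targets follow the linear rule exactly, every held-out complex is recovered almost perfectly; with real data the leave-one-out error indicates how well the chosen feature combination generalises.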
Yugandhar and Gromiha (2014) developed a web server, PPA-Pred, for predicting the binding affinity of protein-protein complexes from their amino acid sequences (http://www.iitm.ac.in/bioinfo/PPA_Pred/). The server can handle protein sequences with a maximum length of 50 amino acids. It takes the functional information and the amino acid sequences in FASTA format as input and returns the predicted binding affinity as a ΔG value and a Kd value [12]. Kd is the dissociation constant, which is related to ΔG by the following equation:
ln Kd = ΔG / (RT)
where ΔG is the binding free energy in kcal mol⁻¹ (negative when Kd is below 1 M under this sign convention), Kd is the dissociation constant, R is the gas constant (1.987 × 10⁻³ kcal mol⁻¹ K⁻¹), and T is the absolute temperature (assumed to be room temperature, i.e. 25 °C or 298.15 K) [12].
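As a worked example, the relation ln Kd = ΔG / (RT) can be applied in both directions with a few lines of Python. The function names and the input value of -10 kcal/mol are illustrative assumptions, not server output; the constants are the ones quoted in the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1, as quoted in the text
T_ROOM = 298.15    # 25 degrees Celsius expressed in kelvin


def kd_from_delta_g(delta_g, temperature=T_ROOM):
    """Dissociation constant (molar) from ln Kd = delta_G / (R T)."""
    return math.exp(delta_g / (R_KCAL * temperature))


def delta_g_from_kd(kd, temperature=T_ROOM):
    """Free energy (kcal/mol) from the same relation, solved for delta_G."""
    return R_KCAL * temperature * math.log(kd)


# Hypothetical predicted free energy of -10 kcal/mol.
kd = kd_from_delta_g(-10.0)
print(f"Kd = {kd:.3e} M")
print(f"round trip: {delta_g_from_kd(kd):.3f} kcal/mol")
```

At 25 °C, RT is about 0.592 kcal/mol, so a ΔG of -10 kcal/mol corresponds to a Kd on the order of 10⁻⁸ M, and each additional -1.36 kcal/mol or so lowers Kd by roughly a factor of ten.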
References:
- Horton,N. and Lewis,M. (1992) Calculation of the free energy of association for protein complexes. Protein Sci., 1, 169–181.
- Kastritis,P.L. and Bonvin,A.M. (2010) Are scoring functions in protein-protein docking ready to predict interactomes? Clues from a novel binding affinity benchmark. J. Proteome Res., 9, 2216–2225.
- Audie,J. and Scarlata,S. (2007) A novel empirical free energy function that explains and predicts protein-protein binding affinities. Biophys. Chem., 129, 198–211.
- Jiang,L. et al. (2002) Potential of mean force for protein-protein interaction studies. Proteins, 46, 190–196.
- Ma,X.H. et al. (2002) A fast empirical approach to binding free energy calculations based on protein interface information. Protein Eng., 15, 677–681.
- Moal,I.H. et al. (2011) Protein-protein binding affinity prediction on a diverse set of structures. Bioinformatics, 27, 3002–3009.
- Su,Y. et al. (2009) Quantitative prediction of protein-protein binding affinity with a potential of mean force considering volume correction. Protein Sci., 18, 2550–2558.
- Vreven,T. et al. (2012) Prediction of protein-protein binding free energies. Protein Sci., 21, 396–404.
- Zhang,C. et al. (2005) A knowledge-based energy function for protein-ligand, protein-protein, and protein-DNA complexes. J. Med. Chem., 48, 2325–2335.
- Tian,F. et al. (2012) Structure-based prediction of protein-protein binding affinity with consideration of allosteric effect. Amino Acids, 43, 531–543.
- Zhou,P. et al. (2013) Biomacromolecular quantitative structure-activity relationship (BioQSAR): a proof-of-concept study on the modeling, prediction and interpretation of protein–protein binding affinity. J. Comput. Aided Mol. Des., 27, 67–78.
- Yugandhar,K. and Gromiha,M.M. (2014) Protein–protein binding affinity prediction from amino acid sequence. Bioinformatics, 30, 3583–3589. doi:10.1093/bioinformatics/btu580
- Ofran,Y. and Rost,B. (2007) Interaction sites identified from sequence. Bioinformatics, 23, e13–e16.
- Kawashima,S. et al. (2008) AAindex: amino acid index database, progress report 2008. Nucleic Acids Res., 36, D202–D205.
- Porollo,A. and Meller,J. (2007) Prediction-based fingerprints of protein-protein interactions. Proteins, 66, 630–645.
- McDonald,I.K. and Thornton,J.M. (1994) Satisfying hydrogen-bonding potential in proteins. J. Mol. Biol., 238, 777–793.
- Hubbard,S.J. and Thornton,J.M. (1993) NACCESS 2.1.1. Department of Biochemistry and Molecular Biology, University College, London.
- Gromiha,M.M. et al. (2009) Energy based approach for understanding the recognition mechanism in protein-protein complexes. Mol. Biosyst., 5, 1779–1786.
- Guex,N. and Peitsch,M.C. (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis, 18, 2714–2723.
- Grewal,P.S. (1987) Numerical Methods of Statistical Analysis. Sterling Publishers, New Delhi.