There are three principal methods for predicting the 3D structure of a protein:
- Homology Modeling
- Fold Recognition
- Ab Initio Folding
Homology Modeling

The ultimate goal of protein modeling is to predict a structure from its sequence with an accuracy that is comparable to the best results achieved experimentally. [2] Homology modeling is also referred to as comparative protein modeling or knowledge-based modeling. The idea behind homology modeling is to use experimental 3D structures of related family members (templates) to calculate a model for a new sequence (target). Homology modeling is based on two observations:
- Protein structure is entirely determined by its amino acid sequence
- Structure is more conserved than sequence over evolutionary periods, so similar sequences usually fold into similar structures
How accurate or reliable is homology modeling?
Homology models are classified into three zones in terms of their accuracy and reliability.
- Midnight Zone: less than 20% sequence identity. The structure cannot reliably be used as a template.
- Twilight Zone: 20% - 40% sequence identity. Sequence identity may imply structural identity.
- Safe Zone: 40% or more sequence identity. It is very likely that sequence identity implies structural identity.
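The zones above can be sketched as a small classifier. This is a minimal sketch, not a standard tool: the function names are our own, and percent identity is assumed to be counted over alignment columns without gaps (other conventions for the denominator exist and give different numbers).

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def identity_zone(identity: float) -> str:
    """Map a percent identity onto the three zones listed above."""
    if identity < 20.0:
        return "midnight"
    if identity < 40.0:
        return "twilight"
    return "safe"
```

For example, `identity_zone(percent_identity("ACDE-F", "ACDQGF"))` counts five gap-free columns, four of them identical, and therefore falls in the safe zone.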
How good can homology modeling be?
- 60 - 100% sequence identity: comparable to medium-resolution NMR. Supports analysis of substrate specificity
- 30 - 60% sequence identity: comparable to molecular replacement in crystallography. Supports site-directed mutagenesis through visualization
- < 30% sequence identity: serious errors
Homology modeling involves the following steps:
- template selection
- alignment
- model building
- model evaluation
Template Selection
In the safe homology modeling zone, the percentage identity between the sequence of interest and a possible template is high enough to be detected with simple sequence alignment programs such as BLAST. To identify hits, the program compares the target sequence to all the sequences of known structures in the PDB. [2] This gives us a probable set of templates. We choose the final template after finding structurally conserved regions among the templates. Once a suitable template is found, we search the PubMed database for the relevant fold to determine its biological role. We evaluate whether the biological and biochemical functions of the proteins match. We also pay attention to the resolution, the experimental method used, experimental conditions such as pH, ligands, and cofactors, and the protein's family.
Similar sequence does not always imply similar structure. Identical sequence does not always imply identical structure.
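Part of the selection described above can be sketched as a filter and sort over candidate hits. This is only an illustration: the `TemplateHit` record, its fields, and the identifiers `tmplA` to `tmplC` are hypothetical, every hit is assumed to be a crystal structure with a resolution value, and the biological checks (function, ligands, experimental conditions) still have to be done by hand.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    pdb_id: str        # hypothetical identifier, not a real PDB entry
    identity: float    # percent sequence identity to the target
    resolution: float  # resolution in angstroms (lower is better)


def rank_templates(hits, min_identity=40.0):
    """Keep safe-zone hits; sort by identity (high first), then resolution (low first)."""
    kept = [h for h in hits if h.identity >= min_identity]
    return sorted(kept, key=lambda h: (-h.identity, h.resolution))


# Made-up hits, for illustration only.
hits = [
    TemplateHit("tmplA", 62.0, 2.8),
    TemplateHit("tmplB", 62.0, 1.9),
    TemplateHit("tmplC", 35.0, 1.5),
]
ranked = rank_templates(hits)
```

Here `tmplC` is dropped because it falls below the safe zone, and `tmplB` is ranked ahead of `tmplA` because it has the better resolution at equal identity.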
Alignment
There are three principal techniques for alignment:
- Use multiple sequence alignment for sequence pairs with over 40% identity. Multiple sequence alignment is used to produce an alignment by superimposing all template structures. The target sequence is then added to this alignment. The resulting alignment is compared and then adjusted.
- Use the structural alignment of the templates to guide the alignment of the target
- Use profiles for the template and the target
The quality of the sequence alignment is of crucial importance. No current comparative modeling method can recover from an incorrect alignment. Misplaced gaps, representing insertions or deletions, will cause residues to be misplaced in space. Careful inspection and adjustment of the automatic alignment may improve the quality of the model.
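To make the role of gaps concrete, here is a minimal sketch of a global pairwise alignment using the Needleman-Wunsch dynamic programming algorithm. It is a teaching example, not one of the three techniques listed above: the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions, whereas real tools use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] faces a gap in b
                score[i][j - 1] + gap,      # b[j-1] faces a gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

Aligning `"ACDEF"` with `"ACEF"` places the single gap opposite `D`. Moving that gap by one position would pair `D` with `E` and shift a residue in the model, which is the kind of error the paragraph above warns about.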
Model Building
There are three ways to build a protein model:
- manual modeling using Swiss modeler or a similar tool
- template-based fragment assembly
- satisfaction of spatial constraints
Template-based fragment assembly
Swiss modeler can be used for assembling fragments. This method involves assembling rigid fragments from homologous proteins of known structure. First we find structurally conserved core regions and construct an averaged backbone of all templates to build a model core. Then we model loops and side chains.
Backbone generation
When the alignment is ready, the actual modeling can start. Creating the backbone is trivial for most of the model. Simply copy the coordinates of those residues that show up in the alignment with the model sequence. If two aligned residues differ, only the backbone coordinates can be copied. If they are the same, side chains can also be included. [2]
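The copying rule above can be sketched as follows. The data layout is a simplifying assumption of ours: each template residue is a dictionary mapping atom names to coordinates, the alignment is given as two gapped one-letter strings, and `build_backbone` is a hypothetical helper rather than part of any modeling package.

```python
BACKBONE = ("N", "CA", "C", "O")


def build_backbone(aligned_target, aligned_template, template_residues):
    """Copy coordinates from the template to the target along an alignment.

    template_residues: one dict per template residue (atom name -> (x, y, z)),
    in sequence order. Returns one entry per target residue: a dict of copied
    atoms, or None where the target residue faces a gap in the template.
    """
    model = []
    t_index = 0  # position in template_residues
    for tgt, tpl in zip(aligned_target, aligned_template):
        atoms = None
        if tpl != "-":
            atoms = template_residues[t_index]
            t_index += 1
        if tgt == "-":
            continue  # template residue without a target counterpart: omit it
        if atoms is None:
            model.append(None)  # insertion in the target: nothing to copy
        elif tgt == tpl:
            model.append(dict(atoms))  # identical residue: keep the side chain
        else:
            # different residue: copy backbone atoms only
            model.append({k: v for k, v in atoms.items() if k in BACKBONE})
    return model
```

Entries that come back as `None` mark the positions that have to be built by loop modeling, and residues with backbone atoms only still need their side chains placed.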
Loop modeling
In the majority of cases, the alignment between the model and template sequences contains gaps, either in the model sequence or in the template sequence. If there are gaps in the model sequence, we simply omit residues from the template, creating a hole in the model that must be closed. If there are gaps in the template sequence, we take the continuous backbone from the template, cut it, and insert the missing residues. Both cases imply a conformational change of the backbone. Conformational changes cannot happen within regular secondary structures, meaning that they must be in loops or turns. Predicting loops and turns is very difficult. [2]
There are two main methods to predict loops:
- Knowledge based: search the PDB for known loops whose endpoints match the residues flanking the gap, and simply copy the loop conformation
- Energy based: use an energy function to judge the quality of the loop
Reliable loops can be built for up to 5-8 residues.
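The knowledge-based search can be sketched as a lookup in a fragment library. This is a simplified illustration under our own assumptions: the library is a plain list of records with a precomputed distance between anchor residues, only the C-alpha distance and loop length are compared, and the tolerance of 1.0 angstrom is arbitrary. Real loop databases also compare the orientation of the anchor residues.

```python
import math


def find_loop_candidates(anchor_start, anchor_end, loop_length, library, tolerance=1.0):
    """Return library loops whose length and anchor distance match the gap.

    anchor_start, anchor_end: C-alpha coordinates of the residues flanking the gap.
    library: list of dicts with keys "name", "length" and "anchor_distance"
             (angstroms); a hypothetical stand-in for fragments from the PDB.
    """
    gap_distance = math.dist(anchor_start, anchor_end)
    matches = [
        loop
        for loop in library
        if loop["length"] == loop_length
        and abs(loop["anchor_distance"] - gap_distance) <= tolerance
    ]
    # Best geometric fit first.
    return sorted(matches, key=lambda loop: abs(loop["anchor_distance"] - gap_distance))
```

An energy-based method would instead score each candidate conformation with an energy function and keep the lowest-energy loop.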
Side-chain modeling
To model side chains, we look for the most probable side-chain conformation, using:
- Homologous structure information
- Backbone dependent rotamer libraries
- Energetic and packing criteria
When we compare the side-chain conformations (rotamers) of residues that are conserved in structurally similar proteins, we find that they often have similar torsion angles about the Cα-Cβ bond. It is therefore possible to simply copy conserved residues from the template to the target. In practice, this is accurate only at high levels of identity. [2]
Only a small fraction of all possible side-chain conformations is observed in experimental structures. This significantly reduces the complexity of the modeling problem. Rotamer libraries provide an ensemble of likely conformations. The propensity of each rotamer depends on the backbone geometry. Side-chain modeling depends heavily on rotamer libraries.
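A backbone-dependent rotamer lookup can be sketched as a table keyed by residue type and a phi/psi bin. The bin width of 10 degrees, the table layout, and the probabilities in the example are made up for illustration and are not taken from any published rotamer library.

```python
def phi_psi_bin(phi, psi, width=10):
    """Round backbone dihedral angles (degrees) down to the lower edge of their bin."""
    return (int(phi // width) * width, int(psi // width) * width)


def most_probable_rotamer(residue, phi, psi, library):
    """Pick the highest-probability rotamer for a residue in its phi/psi bin."""
    rotamers = library[(residue, *phi_psi_bin(phi, psi))]
    return max(rotamers, key=lambda r: r["probability"])


# Toy library with invented values, for illustration only.
toy_library = {
    ("SER", -70, -50): [
        {"chi1": 62.0, "probability": 0.2},
        {"chi1": -65.0, "probability": 0.5},
        {"chi1": 180.0, "probability": 0.3},
    ],
}
best = most_probable_rotamer("SER", -63.5, -42.1, toy_library)
```

Choosing the single most probable rotamer per residue ignores clashes between neighbouring side chains, which is why energetic and packing criteria are applied as well.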
Energy minimization
Modeling often produces unfavorable bond lengths, bond angles, torsion angles and contacts. It is therefore important to minimize the energy in order to regularize local bond and angle geometry and to relax close contacts and geometric strain. Note, however, that extensive energy minimization moves the coordinates away from the real structure. It is therefore prudent to keep the number of energy minimization steps to a minimum.
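The idea of a short, bounded minimization can be sketched with steepest descent on a harmonic bond term only. This is a toy energy function of our own choosing, not a real force field: it ignores angles, torsions and non-bonded contacts, and the force constant, step size and step count are arbitrary.

```python
import math


def bond_energy(coords, bonds, k=1.0):
    """Harmonic bond energy: sum of 0.5 * k * (r - r0)**2 over all bonds."""
    total = 0.0
    for i, j, r0 in bonds:
        r = math.dist(coords[i], coords[j])
        total += 0.5 * k * (r - r0) ** 2
    return total


def steepest_descent(coords, bonds, steps=20, step_size=0.1, k=1.0):
    """Take a small, fixed number of steepest-descent steps on the bond energy."""
    coords = [list(p) for p in coords]  # work on a copy of the input
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, r0 in bonds:
            r = math.dist(coords[i], coords[j])
            if r == 0.0:
                continue  # direction is undefined for coincident atoms
            for axis in range(3):
                # dE/dx_i = k * (r - r0) * (x_i - x_j) / r
                g = k * (r - r0) * (coords[i][axis] - coords[j][axis]) / r
                grad[i][axis] += g
                grad[j][axis] -= g
        for p, g in zip(coords, grad):
            for axis in range(3):
                p[axis] -= step_size * g[axis]
    return coords
```

Keeping `steps` small reflects the advice above: the goal is to relax bad geometry, not to let the model drift far from the template-derived coordinates.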
Satisfaction of spatial constraints
In this method, we
- align sequences with structures
- extract spatial restraints
- satisfy spatial restraints
Restraints are distances, angles, dihedral angles, pairs of dihedral angles and some other spatial features defined by atoms.
Spatial restraints can be obtained from:
- a statistical analysis of relationships between many pairs of homologous structures
- tables quantifying various correlations
- correlations expressed as conditional probability density functions
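To illustrate what satisfying restraints means, a distance restraint can be written as a probability density and the model scored by the negative logarithm of the combined density. The Gaussian form, the tuple layout `(i, j, mean, sigma)`, and the function names are assumptions made for this sketch. The text above only states that restraints are expressed as conditional probability density functions.

```python
import math


def neg_log_gaussian(x, mean, sigma):
    """Negative log of a Gaussian probability density evaluated at x."""
    return 0.5 * ((x - mean) / sigma) ** 2 + math.log(sigma * math.sqrt(2.0 * math.pi))


def restraint_objective(coords, restraints):
    """Sum of negative log densities over distance restraints (lower is better).

    restraints: list of (i, j, mean, sigma) tuples, where mean and sigma describe
    the expected distance between atoms i and j in angstroms.
    """
    total = 0.0
    for i, j, mean, sigma in restraints:
        d = math.dist(coords[i], coords[j])
        total += neg_log_gaussian(d, mean, sigma)
    return total
```

A model whose distances sit close to the restraint means gets a lower objective than one that violates them, so model building becomes an optimization of this objective over the atomic coordinates.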
Model Evaluation
Errors in homology modeling arise from:
- side chain packing
- distortion and shifts
- no template
- misalignments
- incorrect template
Source
[1] http://bmc.ub.uni-potsdam.de/1475-2859-4-20/
[2] Structural Bioinformatics by Bourne & Weissig
[3] Lecture notes of Dr. Lina Yip