Brought to you by molecularsciences.org.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License.
This publication may not be redistributed without this notice.

Homology Modeling

There are three principal methods for predicting the 3D structure of a protein:

Homology Modeling


The ultimate goal of protein modeling is to predict a structure from its sequence with an accuracy comparable to the best results achieved experimentally. [2] Homology modeling is also referred to as comparative protein modeling or knowledge-based modeling. The idea behind homology modeling is to use experimental 3D structures of related family members (templates) to calculate a model for a new sequence (target). Homology modeling is based on two observations:

How accurate or reliable is homology modeling?
Homology models are classified into three areas in terms of their accuracy and reliability.

How good can homology modeling be?

Homology modeling involves the following steps:

Template Selection

In the safe homology modeling zone, the percentage identity between the sequence of interest and a possible template is high enough for the relationship to be detected with simple sequence alignment programs such as BLAST. To identify hits, the program compares the target sequence to the sequences of all known structures in the PDB. [2] This gives us a probable set of templates. We choose the final template after finding structurally conserved regions among the templates. Once a suitable template is found, we search the PubMed database for literature on the relevant fold to determine its biological role. We evaluate whether the biological/biochemical functions of the proteins match. We also pay attention to the resolution, the experimental method used, experimental conditions such as pH, ligands, and cofactors, and the protein's family.

Similar sequence does not always imply similar structure. Identical sequence does not always imply identical structure.
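The identity screen used in template selection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the input layout (pairs of already-aligned sequences) and the 30% cutoff are hypothetical and are not taken from the text or from any particular tool. A high score only nominates a template. It does not guarantee structural similarity.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # gapped columns are not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def rank_templates(alignments, min_identity=30.0):
    """Return (template_id, identity) pairs above the cutoff, best first.

    `alignments` maps a template id to (aligned_target, aligned_template).
    The default cutoff is an illustrative assumption, not a rule from the text.
    """
    scored = []
    for template_id, (target, template) in alignments.items():
        identity = percent_identity(target, template)
        if identity >= min_identity:
            scored.append((template_id, identity))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The candidates that survive this screen would still be checked by hand for function, resolution and experimental conditions, as described above.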

Alignment

There are three principal techniques for alignment:

The quality of the sequence alignment is of crucial importance. No current comparative modeling method can recover from an incorrect alignment. Misplaced gaps, representing insertions or deletions, will cause residues to be misplaced in space. Careful inspection and manual adjustment of the automatic alignment may improve the quality of the model.
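To make the role of gaps concrete, here is a minimal sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch scheme). The scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for readability. Real alignment programs typically use substitution matrices and more elaborate gap penalties, so this is not a stand-in for them.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

With these scores, aligning "ACDEF" against "ACEF" places a single gap opposite the D. Moving that gap by even one column would pair the wrong residues, which is exactly the kind of error that no later modeling step can repair.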

Model Building

There are three ways to build a protein model:

Template based fragment assembly

SWISS-MODEL can be used for assembling fragments. This method involves assembling rigid fragments from homologous proteins of known structure. First we find the structurally conserved core regions and construct an averaged backbone from all templates to build a model core. Then we model loops and side chains.

backbone generation
When the alignment is ready, the actual modeling can start. Creating the backbone is trivial for most of the model. Simply copy the coordinates of those template residues that are aligned with a residue of the model sequence. If two aligned residues differ, only the backbone coordinates can be copied. If they are the same, the side chain can also be included. [2]
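The copying rule can be sketched as follows. The data layout is an assumption made for illustration: each template residue is a dictionary mapping atom names to (x, y, z) tuples, and the alignment is given as two gapped strings of one-letter codes. The function name is hypothetical.

```python
BACKBONE_ATOMS = ("N", "CA", "C", "O")


def copy_aligned_coordinates(aligned_model, aligned_template, template_residues, gap="-"):
    """Build model residues by copying coordinates from aligned template residues.

    `template_residues` holds one atom dictionary per template residue, in
    sequence order (a hypothetical layout chosen for this sketch).
    """
    if len(aligned_model) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    model = []
    t_index = 0  # position of the next unused template residue
    for m_res, t_res in zip(aligned_model, aligned_template):
        if m_res == gap and t_res == gap:
            continue
        if t_res == gap:
            # No template coordinates exist here; left empty for loop modeling.
            model.append({"residue": m_res, "atoms": {}})
            continue
        atoms = template_residues[t_index]
        t_index += 1
        if m_res == gap:
            continue  # template residue has no counterpart in the model
        if m_res == t_res:
            copied = dict(atoms)  # identical residue: backbone and side chain
        else:
            copied = {name: xyz for name, xyz in atoms.items()
                      if name in BACKBONE_ATOMS}  # different residue: backbone only
        model.append({"residue": m_res, "atoms": copied})
    return model
```

Residues that come back with an empty atom dictionary mark the places where the backbone has to be built by other means.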

loop modeling
In the majority of cases, the alignment between the model and template sequences contains gaps, either in the model sequence or in the template sequence. If there are gaps in the model sequence, we omit the corresponding residues from the template, creating a hole in the model that must be closed. If there are gaps in the template sequence, we take the continuous backbone from the template, cut it, and insert the missing residues. Both cases imply a conformational change of the backbone. Such changes cannot happen within regular secondary structures, so they must be placed in loops or turns. Predicting loops and turns is very difficult. [2]
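The two kinds of gap can be located mechanically from the alignment. The sketch below is an illustration only; the function name and the labels "deletion" (gap in the model sequence) and "insertion" (gap in the template sequence) are naming assumptions, not terms defined by the text.

```python
def find_gap_segments(aligned_model, aligned_template, gap="-"):
    """Return (kind, first_column, last_column) for each run of gapped columns.

    kind is "deletion" when the gap is in the model sequence (template residues
    are omitted) and "insertion" when the gap is in the template sequence
    (model residues must be built without template coordinates).
    """
    if len(aligned_model) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    segments = []
    current_kind, start = None, None
    for col, (m_res, t_res) in enumerate(zip(aligned_model, aligned_template)):
        if m_res == gap and t_res != gap:
            kind = "deletion"
        elif t_res == gap and m_res != gap:
            kind = "insertion"
        else:
            kind = None
        if kind != current_kind:
            if current_kind is not None:
                segments.append((current_kind, start, col - 1))
            current_kind, start = kind, col
    if current_kind is not None:
        segments.append((current_kind, start, len(aligned_model) - 1))
    return segments
```

Each segment returned here, together with its flanking residues, is a candidate region for loop modeling.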

There are two main methods to predict loops:

  1. Knowledge based: search the PDB for known loops whose endpoints match the residues flanking the gap, and copy the loop conformation
  2. Energy based: use an energy function to judge the quality of candidate loop conformations

Reliable loops can be built for up to 5-8 residues.
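A heavily simplified sketch of the knowledge-based search is shown below. It assumes a hypothetical in-memory loop library in which every entry stores Cα coordinates including one anchor residue on each side, and it compares only the anchor-to-anchor distance. Real methods compare the anchor geometry far more carefully, so this only illustrates the idea of endpoint matching.

```python
import math


def end_to_end_distance(ca_coords):
    """Distance between the first and last C-alpha atoms of a fragment."""
    return math.dist(ca_coords[0], ca_coords[-1])


def find_matching_loops(loop_library, anchor_start, anchor_end, n_residues, tolerance=0.5):
    """Return ids of library loops that could bridge the two anchors, best first.

    `loop_library` is a list of {"id": ..., "ca": [...]} entries (a made-up
    layout for this sketch). `n_residues` is the number of residues to insert
    between the anchors; `tolerance` is an assumed distance cutoff.
    """
    target = math.dist(anchor_start, anchor_end)
    hits = []
    for loop in loop_library:
        if len(loop["ca"]) != n_residues + 2:  # loop residues plus two anchors
            continue
        deviation = abs(end_to_end_distance(loop["ca"]) - target)
        if deviation <= tolerance:
            hits.append((deviation, loop["id"]))
    return [loop_id for _, loop_id in sorted(hits)]
```

In an energy-based approach, the candidates would instead be generated and then ranked by an energy function rather than looked up.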

side-chain modeling
To model side chains, we look for the most probable side-chain conformation, using:

When we compare the side-chain conformations (rotamers) of residues that are conserved in structurally similar proteins, we find that they often have similar torsion angles about the Cα-Cβ bond. It is therefore possible to simply copy conserved residues from the template to the target. In practice, this is accurate only at high levels of identity. [2]

Only a small fraction of all possible side-chain conformations is observed in experimental structures. This significantly reduces the complexity of the modeling problem. Rotamer libraries provide an ensemble of likely conformations. The propensity of each rotamer depends on the backbone geometry. Side-chain modeling depends heavily on rotamer libraries.
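A lookup in a backbone-dependent rotamer library might look like the sketch below. The library layout (a dictionary keyed by residue name and binned phi/psi angles), the 10-degree bin size and the function names are all assumptions made for illustration. Any numbers placed in such a dictionary for testing are placeholders, not values from a published library.

```python
def bin_angle(angle, bin_size=10):
    """Round a dihedral angle in degrees to the nearest bin center."""
    return int(round(angle / bin_size) * bin_size)


def most_probable_rotamer(library, residue, phi, psi, bin_size=10):
    """Return the highest-probability rotamer for a residue and its backbone angles.

    `library` maps (residue, phi_bin, psi_bin) to a list of
    {"chi": (...), "probability": float} entries (a hypothetical layout).
    Returns None when the library has no entry for this backbone geometry.
    """
    key = (residue, bin_angle(phi, bin_size), bin_angle(psi, bin_size))
    rotamers = library.get(key)
    if not rotamers:
        return None
    return max(rotamers, key=lambda rotamer: rotamer["probability"])
```

Picking the single most probable rotamer for each residue independently ignores clashes between neighboring side chains, so it is a starting point rather than a finished side-chain model.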

Energy minimization
Modeling often produces unfavorable bond lengths, bond angles, torsion angles and contacts. It is therefore important to minimize the energy in order to regularize local bond and angle geometry and to relax close contacts and geometric strain. It must be noted, however, that extensive energy minimization moves the coordinates away from the real structure. It is therefore prudent to keep the number of energy minimization steps to a minimum.
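The trade-off can be illustrated with a toy example: a few steps of steepest descent on a harmonic bond-length energy for a chain of atoms. Everything here is an assumption for demonstration, including the ideal bond length, force constant, step size and step cap. A real force field has many more terms and this sketch is not one.

```python
import math


def bond_energy(coords, ideal=1.5, k=100.0):
    """Sum of k * (d - ideal)^2 over consecutive atom pairs."""
    return sum(k * (math.dist(a, b) - ideal) ** 2 for a, b in zip(coords, coords[1:]))


def bond_gradient(coords, ideal=1.5, k=100.0):
    """Gradient of bond_energy with respect to every coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i in range(len(coords) - 1):
        a, b = coords[i], coords[i + 1]
        d = math.dist(a, b)
        if d == 0.0:
            continue  # direction undefined for coincident atoms
        factor = 2.0 * k * (d - ideal) / d
        for axis in range(3):
            g = factor * (a[axis] - b[axis])
            grad[i][axis] += g
            grad[i + 1][axis] -= g
    return grad


def minimize(coords, max_steps=20, step_size=0.001, ideal=1.5, k=100.0):
    """Run a capped number of steepest-descent steps and return new coordinates."""
    current = [list(atom) for atom in coords]
    for _ in range(max_steps):
        grad = bond_gradient(current, ideal, k)
        current = [[x - step_size * g for x, g in zip(atom, atom_grad)]
                   for atom, atom_grad in zip(current, grad)]
    return [tuple(atom) for atom in current]
```

The `max_steps` cap reflects the advice above: the aim is to remove bad local geometry, not to let the minimizer carry the model far from the coordinates inherited from the template.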

Satisfaction of spatial restraints

In this method, we

Restraints are distances, angles, dihedral angles, pairs of dihedral angles and some other spatial features defined by atoms.

Spatial restraints can be obtained from:

Model Evaluation

Errors in homology modeling originate from:

Sources

[1] http://bmc.ub.uni-potsdam.de/1475-2859-4-20/
[2] Structural Bioinformatics by Bourne & Weissig
[3] Lecture notes of Dr. Lina Yip