Ab initio structure prediction seeks to predict the native conformation of a protein from its amino acid sequence alone. Comparative modeling, by contrast, depends on finding a suitable template structure; when no suitable template exists, ab initio prediction is the only option. A typical procedure is to define a mathematical representation of the polypeptide chain and the surrounding solvent, define an energy function that accurately represents the physicochemical properties of proteins, and use an algorithm to search for the chain conformation with the minimum free energy. The problem with ab initio methods is that even short polypeptide chains can fold into an astronomically large number of possible conformations. [1]
Background
Three ideas frame the problem:
- Anfinsen's Paradigm: all the information needed to form the unique 3D structure of a protein is contained in its amino acid sequence.
- Levinthal's Paradox: finding the native conformation by randomly sampling every possible conformation would take an astronomically long time (far longer than the age of the universe), yet real proteins fold in microseconds to seconds.
- Size of the protein sequence space: how many different proteins are theoretically possible, and how many of these have been tested during evolution? A chain of N residues built from the 20 standard amino acids has 20^N possible sequences, so even with restrictions the number of possibilities quickly becomes astronomically large.
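The arithmetic behind the last two points can be sketched in a few lines. The numbers are illustrative assumptions, not values from the sources: each residue is allowed 3 backbone states, conformations are sampled at 10^13 per second, and sequences use the 20 standard amino acids.

```python
# Back-of-the-envelope estimates for Levinthal's paradox and sequence space.
# Assumed values (illustrative only): 3 states per residue, 1e13 samples per second.
SECONDS_PER_YEAR = 3.156e7  # approximately 365.25 days


def levinthal_search_years(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Years needed to visit every conformation once by exhaustive random search."""
    conformations = states_per_residue ** n_residues  # exact integer count
    return conformations / samples_per_second / SECONDS_PER_YEAR


def sequence_space_size(n_residues, alphabet_size=20):
    """Number of distinct sequences of length n_residues over the given alphabet."""
    return alphabet_size ** n_residues


if __name__ == "__main__":
    print(f"Exhaustive search, 100 residues: {levinthal_search_years(100):.2e} years")
    print(f"Possible sequences of length 100: {sequence_space_size(100):.2e}")
```

Under these assumptions a 100-residue chain needs about 1.6 × 10^27 years to search exhaustively, and there are about 1.3 × 10^130 possible 100-residue sequences. Changing the assumed parameters shifts the exponents but not the conclusion.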
This strongly suggests that nature has sampled only a tiny fraction of the possible sequences and conformations. In fact, nature tends to recycle what works.
We know that:
- many proteins fold spontaneously into their native structures
- protein folding is very fast
- chaperones speed up folding but do not alter the final protein structure
- the amino acid sequence contains all the information necessary to create a correctly folded protein
Based on the above, can we predict protein structures from protein sequences alone (ab initio)?
Factors affecting protein folding:
- H-bonds
- hydrophobic effect
- salt bridges
- disulfide bonds
- loss of solvation
- entropy change
- dispersion/van der Waals forces
- conformational energy
When proteins move from unfolded to folded conformations, they move from a higher free-energy state to a lower free-energy state.
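In thermodynamic terms, spontaneous folding means that the free energy change on folding is negative. The standard relation below (general thermodynamics, not taken from the sources) shows roughly how the factors listed above enter: hydrogen bonds, salt bridges and van der Waals contacts contribute mainly to the enthalpy term, while the loss of chain conformational entropy and the hydrophobic effect contribute mainly to the entropy term.

```latex
\Delta G_{\text{fold}} = G_{\text{folded}} - G_{\text{unfolded}}
  = \Delta H_{\text{fold}} - T\,\Delta S_{\text{fold}} < 0
```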
Successful structure prediction requires a free energy function sufficiently close to the true potential that the native state lies at one of the lowest energy minima, as well as a method for searching conformational space for those low-energy minima. Ab initio structure prediction is challenging because current potential functions have limited accuracy and the conformational space to be searched is vast. In recognition of this resolution limit, many methods use reduced representations, simplified potentials, and coarse search strategies. [2]
Representing a polypeptide chain
The most detailed representations include all atoms of the protein and the surrounding solvent molecules. However, representing this many atoms and the interactions between them is computationally expensive, and it is not clear that this level of detail is necessary during the phase of the search far from the native conformation. To streamline the calculations, the representation can be simplified in a variety of ways, for example by representing each residue with only its Cα atom or a few pseudo-atoms, by placing the chain on a lattice, or by restricting backbone torsion angles to a small set of allowed values; each of these reduces the size of the conformational space. [2]
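As an illustration of a strongly reduced representation, the sketch below uses the two-dimensional HP (hydrophobic-polar) lattice model, which is not described in the sources: each residue is a single bead on a square lattice, residues are classed only as hydrophobic (H) or polar (P), and the energy is -1 for every pair of H residues that are lattice neighbours but not chain neighbours. The function names are hypothetical, and the exhaustive search is only feasible for short chains because the number of self-avoiding walks grows roughly as 2.64^N.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))  # square-lattice neighbour offsets


def self_avoiding_walks(n_residues):
    """Yield every self-avoiding walk of n_residues beads, first bead fixed at the origin."""
    if n_residues < 1:
        return
    path = [(0, 0)]
    occupied = {(0, 0)}

    def extend():
        if len(path) == n_residues:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            site = (x + dx, y + dy)
            if site not in occupied:  # self-avoidance: no two beads on one site
                path.append(site)
                occupied.add(site)
                yield from extend()
                path.pop()
                occupied.remove(site)

    yield from extend()


def hp_energy(sequence, coords):
    """HP-model energy: -1 for each H-H pair adjacent on the lattice but not along the chain."""
    position_to_index = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = position_to_index.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips bonded chain neighbours
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


def fold(sequence):
    """Exhaustive search: return (lowest energy, one conformation that reaches it)."""
    best_energy, best_coords = None, None
    for coords in self_avoiding_walks(len(sequence)):
        e = hp_energy(sequence, coords)
        if best_energy is None or e < best_energy:
            best_energy, best_coords = e, coords
    return best_energy, best_coords


if __name__ == "__main__":
    lowest, conformation = fold("HPHPPHHPHH")  # hypothetical 10-residue sequence
    print(lowest, conformation)
```

Here the representation (lattice beads), the potential (H-H contacts) and the search (full enumeration) are each deliberately crude; the same three choices appear, in far more detailed form, in realistic methods.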
Potential Functions
There are two categories of potentials that may be employed in evaluating the free energy of the peptide chain and the surrounding solvent.
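- Molecular mechanics potentials seek to model the forces that determine protein conformation using physically based functional forms parameterized from small molecule data and in vacuo quantum mechanical calculations.
- Knowledge-based (statistical) potentials are derived from the frequencies of structural features observed in known protein structures.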
Molecular mechanics describes the interactions of atoms or groups of atoms as a sum of terms:
- bond stretching - Hooke's law
- angle bending - Hooke's law
- torsional terms - periodic (cosine) function of the dihedral angle
- van der Waals forces - Lennard-Jones potential
- electrostatic interactions - Coulomb's law and the Poisson-Boltzmann equation
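The functional forms of these terms can be written out directly. This is a minimal sketch assuming common textbook forms (harmonic bond and angle terms, a cosine torsion term, a 12-6 Lennard-Jones term, and Coulomb's law in a uniform dielectric); real force fields differ in conventions (some omit the factor of 1/2 in the harmonic terms) and in parameters, and the Poisson-Boltzmann treatment of solvent is not shown. Units are assumed to be kcal/mol, Å, radians and elementary charges.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal*Å/(mol*e^2), approximate


def bond_stretch(r, r0, k):
    """Harmonic (Hooke's law) bond energy for bond length r with equilibrium length r0."""
    return 0.5 * k * (r - r0) ** 2


def angle_bend(theta, theta0, k):
    """Harmonic (Hooke's law) angle energy; angles in radians."""
    return 0.5 * k * (theta - theta0) ** 2


def torsion(phi, v_n, n, gamma):
    """Periodic torsion energy with barrier v_n, periodicity n and phase gamma."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def lennard_jones(r, sigma, epsilon):
    """12-6 Lennard-Jones energy; minimum of -epsilon at r = 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


def coulomb(q_i, q_j, r, dielectric=1.0):
    """Coulomb's law energy between two point charges in a uniform dielectric."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise each term once.
    total = (bond_stretch(1.55, 1.53, 310.0)
             + angle_bend(math.radians(112.0), math.radians(109.5), 63.0)
             + torsion(math.radians(60.0), 1.4, 3, 0.0)
             + lennard_jones(4.0, 3.4, 0.1)
             + coulomb(0.5, -0.5, 4.0, dielectric=4.0))
    print(f"Total energy: {total:.3f} kcal/mol")
```

In a full force field each term is summed over every bond, angle, dihedral and non-bonded atom pair in the structure; the sum is the potential energy that a search method tries to minimize.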
Search Methods
In searching, as in selecting the appropriate level of detail in the representation and in the potential, one must choose the granularity of the search based on the resolution desired from the method. Molecular dynamics is one such search method: it models changes in conformation over time by integrating the equations of motion under a force field. A single search is likely to become trapped in a local minimum, so several independent runs (for example, from different starting conformations) are needed to find the global minimum.
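The point about local minima can be illustrated on a one-dimensional toy energy surface. This sketch assumes a hypothetical double-well energy E(x) = (x^2 - 1)^2 + 0.3x, whose global minimum lies near x = -1.04 and a higher local minimum near x = 0.96. Each run uses velocity Verlet molecular dynamics with an added friction term (a simple per-step velocity damping, assumed here so that runs settle into a minimum instead of oscillating). Running from several starting points and keeping the lowest-energy result recovers the global minimum, while a single run started on the wrong side of the barrier does not.

```python
def energy(x):
    """Hypothetical 1D double-well energy surface with two unequal minima."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def force(x):
    """Force = -dE/dx for the toy energy above."""
    return -(4.0 * x * (x * x - 1.0) + 0.3)


def damped_md(x0, dt=0.01, steps=5000, damping=0.05, mass=1.0):
    """Velocity Verlet dynamics starting at rest at x0, with velocity damped each step.

    The damping factor is an assumed friction term that drains kinetic energy,
    so the trajectory settles into the minimum of the basin it starts in.
    """
    x, v = x0, 0.0
    a = force(x) / mass
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass
        v += 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        v *= 1.0 - damping                # friction: remove some kinetic energy
        a = a_new
    return x, energy(x)


def multi_start_search(starts, **md_kwargs):
    """Run damped MD from each starting point and keep the lowest-energy result."""
    results = [damped_md(x0, **md_kwargs) for x0 in starts]
    return min(results, key=lambda result: result[1])


if __name__ == "__main__":
    x_single, e_single = damped_md(0.5)
    print(f"single run from x=0.5: x={x_single:.3f}, E={e_single:.3f}")
    x_best, e_best = multi_start_search([0.5, 1.2, -0.5])
    print(f"best of three runs:    x={x_best:.3f}, E={e_best:.3f}")
```

The run from x = 0.5 ends in the higher local minimum because it never has enough energy to cross the barrier near x = 0.08; only the run started at x = -0.5 reaches the global minimum. On a real protein energy surface the same idea applies with many more minima, which is why many independent runs are used.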
Sources
[1] Principles of Proteomics by R. M. Twyman
[2] Structural Bioinformatics by Bourne & Weissig