Structure classification methods use structural alignments to help assign fold classes. Structure prediction methods require that the predicted structure be evaluated against a variety of template structures. Because structures are more conserved in evolution than sequences, structural alignments can reveal distant evolutionary relationships that sequence alignments alone cannot detect. For the same reason, structural similarity is often a more sensitive indicator of shared protein function than sequence similarity.
Quantifying Similarity
One way to quantify similarity is to superpose two protein structures and calculate the distances between equivalent atoms. These distances are used to calculate the root mean square deviation (RMSD): the square root of the mean of the squared distances. RMSD measures the overall deviation of the atoms. Because the distances are squared before averaging, it also amplifies large deviations in local regions of a protein. An RMSD calculation usually involves only a subset of the aligned atoms, and the difficulty lies in defining this subset. A small RMSD over many atoms indicates a good structural alignment.
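As a minimal sketch of the calculation, assuming the two structures are already superposed and that atoms at the same list index are equivalent, RMSD can be computed as follows. The function name `rmsd` and the toy coordinates (in angstroms) are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two lists of (x, y, z) points.

    Assumes the structures are already superposed and that
    coords_a[i] and coords_b[i] are equivalent atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one pair of equivalent atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy data: four atoms on a line.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
# Every atom displaced by 1 angstrom.
uniform = [(x, y + 1.0, z) for x, y, z in ref]
# Only the last atom displaced, by 2 angstroms.
local = [ref[0], ref[1], ref[2], (3.0, 2.0, 0.0)]

print(rmsd(ref, uniform))  # 1.0
print(rmsd(ref, local))    # 1.0
```

The two toy cases give the same RMSD even though the mean displacement is 1 angstrom in the first and only 0.5 angstrom in the second, which illustrates how a single large local deviation is amplified.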
Structural superposition: We know at least some of the residues that match between the two proteins.
Structural alignment: We don't know any residues that match between the two proteins.
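The structural superposition problem can be solved by least-squares minimization of the RMSD. This requires finding the rigid-body transformation (a rotation and a translation) that best maps one set of matched atoms onto the other, and with the matching residues known it can be solved in O(n) time. Structural alignment, in contrast, is an NP-hard problem: the equivalent residues must themselves be found, and the proteins being compared usually differ in length. The two proteins can either be compared directly or be compared feature by feature.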
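One common way to find the least-squares transformation is the Kabsch algorithm, which is swapped in here as an illustration; the text above does not name a specific method. The sketch below assumes NumPy is available and that the two coordinate arrays list matched atoms in the same order. The function name `kabsch_superpose` and the toy coordinates are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Superpose `mobile` onto `target` (both N x 3, rows are matched atoms).

    Returns the transformed copy of `mobile` and the RMSD after fitting.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    if mobile.shape != target.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays of matched atoms")

    # Remove the translation by moving both centroids to the origin.
    p = mobile - mobile.mean(axis=0)
    target_centroid = target.mean(axis=0)
    q = target - target_centroid

    # Covariance matrix of the matched atoms and its SVD.
    u, _, vt = np.linalg.svd(p.T @ q)

    # Flip the last axis if needed so the result is a proper rotation,
    # not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate the centred mobile atoms, then move them onto the target centroid.
    fitted = p @ rotation.T + target_centroid
    rmsd = float(np.sqrt(((fitted - target) ** 2).sum() / len(target)))
    return fitted, rmsd


# Toy check: rotate and shift a small structure, then recover the fit.
target = np.array([[0.0, 0.0, 0.0],
                   [1.5, 0.0, 0.0],
                   [1.5, 1.5, 0.0],
                   [0.0, 1.5, 1.0]])
angle = np.pi / 3
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle),  np.cos(angle), 0.0],
                  [0.0,            0.0,           1.0]])
mobile = target @ rot_z.T + np.array([5.0, -2.0, 3.0])

fitted, value = kabsch_superpose(mobile, target)
print(value)  # very close to zero, since mobile is an exact rigid copy
```

Each step is a single pass over the n matched atoms plus a fixed-size 3 x 3 decomposition, which is consistent with the linear running time quoted above.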
A number of methods are currently available. Most of them align the obvious features correctly but struggle with less obvious ones, so good alignments are rare, and the software tends to be slow.
Structural alignment is a three-step process:
- describe the structures of proteins A and B in a form that can be compared
  - sequence, secondary structure, structural attributes of individual amino acids, distances between amino acids in the protein
- optimise the alignment between A and B
  - point-based methods
  - using vectors to represent secondary structures
  - computational methods: dynamic programming, heuristics, genetic algorithms, etc.
- measure the statistical significance of the alignment against a random set of structure comparisons
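The three steps can be sketched on toy data. In this deliberately simplified example, each protein is described by a hypothetical secondary-structure string (H for helix, E for strand, C for coil), the alignment is optimised by dynamic programming (a Needleman-Wunsch global alignment), and significance is expressed as a Z-score against a background of scores. The scoring values and the use of shuffled strings as the random background are assumptions made for illustration; real methods use richer descriptions and compare against unrelated structures.

```python
import random
import statistics


def align_score(features_a, features_b, match=2.0, mismatch=-1.0, gap=-2.0):
    """Best global alignment score of two feature strings (Needleman-Wunsch)."""
    rows, cols = len(features_a) + 1, len(features_b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            same = features_a[i - 1] == features_b[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,   # gap in B
                              score[i][j - 1] + gap)   # gap in A
    return score[-1][-1]


def z_score(observed, background):
    """Number of standard deviations by which `observed` exceeds the background mean."""
    return (observed - statistics.mean(background)) / statistics.stdev(background)


# Step 1: structural description (hypothetical secondary-structure strings).
protein_a = "CHHHHCCEEEEC"
protein_b = "CHHHCCEEEEC"

# Step 2: optimise the alignment.
observed = align_score(protein_a, protein_b)
print(observed)  # 20.0 (11 matches and 1 gap)

# Step 3: compare against a random background (shuffled copies of B).
rng = random.Random(0)
background = []
for _ in range(50):
    shuffled = list(protein_b)
    rng.shuffle(shuffled)
    background.append(align_score(protein_a, "".join(shuffled)))
print(z_score(observed, background))
```

A large positive Z-score suggests that the observed alignment scores well above what chance arrangements of the same elements would give.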
Functional annotation based on protein structure requires a rigorous and standardized system for the classification of different structures. Several different hierarchical classification schemes have been established, which divide proteins first into general classes based on the proportion of various secondary structures they contain, then into successively more specialized groups based on how those structures are arranged. These schemes are implemented in databases such as FSSP, CATH and SCOP. [1]
These databases differ in how the classification is carried out:
- FSSP is generated fully automatically using DALI
- CATH is semi-automatic: structures are compared automatically with SSAP, but the results are curated
- SCOP is classified manually
As a result, they sometimes classify the same protein differently. Further confusion is caused by folds that appear very often (superfolds). It is difficult to know whether two proteins sharing a superfold are homologous (descended from a common ancestor) or analogous (arrived at the same fold independently).
Sources
[1] Principles of Proteomics by R. M. Twyman
[2] Structural Bioinformatics by Bourne & Weissig