Why do we need structure quality assurance?
Everything we know about protein structures comes from the PDB. PDB structures are used as templates to predict other, new structures. If the template is wrong, the model built from it will also be wrong. Even though structures are determined experimentally, the result of the experiment is itself a model, and a model can be accurate or wrong. In addition, experiments have associated errors.
An X-ray crystallography model can contain chain connectivity, frame shift, or fitting errors. How can we judge whether an X-ray structure is accurate? As a rule of thumb, look for a resolution better than 2 Å and an R-factor below 0.20.
Good parameters for structure validation
- Testable on real-space coordinates and/or crystallographic data
- Strongly correlated to structure quality
- Independent of the refinement process
- Automated and not too time-consuming computationally
Parameters to look at:
Experimental data
- R-factor, free R factor
- B-values
Basic geometry
- Bond length and angles
- Planarity (peptide planes, rings in side chains (His, Phe, Tyr, Trp))
Dihedral angles
- φ, ψ (Ramachandran plot), ω
- χ angles for side chains (rotamer library)
- Other dihedral angles (Cα)
Assessing local environment
- VdW interactions
- Packing
- Hydrogen bonding
Experimental Data
R-factor, Rfree
A measure of the difference between the structure factors calculated from the model and those obtained from experimental data, i.e. a measure of the difference between the observed and computed diffraction patterns.
High value -> poorer agreement, low value -> better agreement
R-factor < 0.2 is desired
R-factor values in the range 0.4 to 0.6 can be obtained from a totally random structure.
Rfree tends to be higher than R, because it is computed from a set of reflections that was left out of refinement.
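As a rough illustration of the definition above, the R-factor can be written as the sum of |Fobs - Fcalc| over all reflections divided by the sum of Fobs. The sketch below assumes the structure factor amplitudes are already on a common scale and are passed as plain lists; the function names and the `free_flags` argument are hypothetical and do not come from any crystallography package.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes f_calc has already been scaled to f_obs.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections into a working set and a free (test) set.

    free_flags[i] is True when reflection i was excluded from refinement.
    Returns (R, Rfree).
    """
    work = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if not f]
    free = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if f]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in free], [c for _, c in free])
    return r_work, r_free


# Toy amplitudes, invented purely for illustration.
f_obs = [100.0, 50.0, 25.0, 80.0]
f_calc = [90.0, 55.0, 20.0, 85.0]
free_flags = [False, False, True, False]
r_work, r_free = r_work_and_free(f_obs, f_calc, free_flags)
```

In this toy data set the single free reflection agrees less well with the model than the working reflections do, so Rfree comes out higher than R.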
B-factor
Closely related to the positional errors of the atoms
Larger B-factor -> larger positional uncertainty
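To make the link between B-factor and positional uncertainty concrete, the isotropic B-factor is conventionally related to the mean-square atomic displacement by B = 8π²⟨u²⟩. The helper below is a minimal sketch of that conversion; the function name is hypothetical, and in practice B-factors also absorb static disorder and model errors, so the result is only a rough indicator.

```python
import math


def rms_displacement(b_factor):
    """Root-mean-square displacement (in Å) implied by an isotropic
    B-factor (in Å^2), using B = 8 * pi^2 * <u^2>."""
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# Compare a well-ordered atom with a poorly ordered one (illustrative values).
for b in (20.0, 80.0):
    print(f"B = {b:5.1f} Å^2 -> rms displacement ≈ {rms_displacement(b):.2f} Å")
```

Because the relation goes through a square root, quadrupling the B-factor doubles the implied displacement.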
Basic Geometry
Deviation from ideal bond lengths and angles.
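A check of this kind can be sketched as follows: measure bond lengths and angles from the coordinates, then summarize how far they stray from reference values as a root-mean-square deviation. The function names are hypothetical, and the "ideal" values in the example are approximate backbone bond lengths used only for illustration; a real check would take them from a standard geometry library.

```python
import math


def bond_length(a, b):
    """Distance between two atoms given as (x, y, z) tuples, in Å."""
    return math.dist(a, b)


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b as the central atom."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cos_angle = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def rms_deviation(observed, ideal):
    """Root-mean-square deviation of observed values from ideal values."""
    if len(observed) != len(ideal):
        raise ValueError("observed and ideal must have the same length")
    return math.sqrt(
        sum((o - i) ** 2 for o, i in zip(observed, ideal)) / len(observed)
    )


# Illustrative only: observed N-CA, CA-C and C-N lengths versus
# approximate reference values (assumed, not taken from the source).
observed = [1.47, 1.51, 1.34]
ideal = [1.458, 1.525, 1.329]
print(f"RMS bond-length deviation: {rms_deviation(observed, ideal):.3f} Å")
```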
Dihedral Angles
Assessment of φ and ψ values with reference to the Ramachandran plot. Good structures show tight clustering in the most favored regions. Measure the percentage of residues in favored regions, excluding Gly and Pro.
Assessing Local Environment: packing, bad contacts
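The angles themselves are computed from four consecutive backbone atoms: φ from C(i-1), N, Cα, C; ψ from N, Cα, C, N(i+1); and ω from Cα, C, N(i+1), Cα(i+1). The function below is a minimal, dependency-free sketch of a standard dihedral calculation; the name `dihedral` and the tuple-based coordinate format are assumptions made for this example.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in the range [-180, 180], defined by
    four points and measured about the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: a cis arrangement (0 degrees) and a 90 degree twist.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
twisted = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Once φ and ψ are available for every residue, the fraction falling in the favored regions can be counted against whichever region definition the validation tool uses.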
Packing: Proteins in their native states are well packed.
- DACA makes use of threading potentials to calculate how well the sequence feels at home.
- Z-score tells us how well a residue feels with respect to its neighbors.
- ANOLEA calculates a non-local energy for atom-atom contacts based on an atomic mean force potential. It detects local packing errors and alignment errors.
- Hydrogen bonds are a major stabilizing force. They can be studied for validation.
Bad contacts: cases where the distance between a pair of non-bonded atoms is smaller than the sum of their VdW radii.
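The bad-contact criterion can be sketched as a simple pairwise check. The example below is a naive all-pairs scan under several assumptions: atoms are given as (name, element, coordinates) tuples, bonded pairs are supplied by the caller, and the radii are commonly quoted Bondi values. The function name and the `tolerance` parameter are hypothetical; real validation tools also treat hydrogen-bonded pairs and atoms separated by two or three bonds specially.

```python
import math
from itertools import combinations

# Commonly quoted Bondi van der Waals radii in Å (assumed for this sketch).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def find_bad_contacts(atoms, bonded=frozenset(), tolerance=0.0):
    """Return (name_i, name_j, distance) for every non-bonded atom pair
    closer than the sum of their VdW radii minus `tolerance`.

    atoms:  list of (name, element, (x, y, z)) tuples
    bonded: set of frozenset({i, j}) index pairs to skip
    """
    contacts = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        if frozenset((i, j)) in bonded:
            continue
        distance = math.dist(a[2], b[2])
        limit = VDW_RADII[a[1]] + VDW_RADII[b[1]] - tolerance
        if distance < limit:
            contacts.append((a[0], b[0], round(distance, 2)))
    return contacts


# Toy coordinates, invented for illustration.
atoms = [
    ("CA", "C", (0.0, 0.0, 0.0)),
    ("O", "O", (2.5, 0.0, 0.0)),
    ("N", "N", (10.0, 0.0, 0.0)),
]
bad = find_bad_contacts(atoms)
```

The all-pairs loop is quadratic in the number of atoms, which is fine for a toy example; a spatial grid or neighbor list would be the usual choice for a full structure.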
Structure Validation Servers
- WHAT IF / WHATCHECK
- PROCHECK
- Verify3D
- VADAR
- ANOLEA
- ERRAT
WHAT IF / WHATCHECK
In a WHATCHECK report, each reported fact has an assigned severity:
Error: Severe errors encountered during the analysis
Warning: Either less severe problems or uncommon features
Note: Statistical values, plots, or other verbose results of tests and analyses