May 2012 technical highlight
EM-proving protein structures
A new version of EM-Fold combines medium-resolution data with computational techniques to produce atomic-resolution structures.
Best-scoring models for two of the benchmark proteins examined. The models (rainbow ribbons) are shown superimposed with the original PDB structures (gray ribbons) and the simulated density maps. Figure courtesy of Steffen Lindert.
Combining experimental data with computational techniques has allowed scientists to expand the range of proteins that are amenable to high-resolution structure determination. Experimental data that provide sufficient detail for a high-resolution structure can be difficult to acquire. Furthermore, while computational techniques can predict protein structures up to a certain size, the number of conformations that must be sampled to ensure a correct structure eventually overwhelms current computational resources.
To overcome these limitations, various approaches are being developed that combine sparse or medium-resolution data with computational techniques. Meiler and colleagues have reported a new version of EM-Fold that combines medium-resolution density maps (5–7 Å) with the Rosetta protein modeling suite. At this resolution, it is often possible to detect secondary structural elements manually or automatically; however, the directionality and connectivity of these elements are unknown. Furthermore, no side-chain information is available.
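To give a feel for what unknown directionality and connectivity mean for a search, the short Python sketch below counts candidate placements. It rests on simplifying assumptions that are not taken from the EM-Fold work: every density rod receives exactly one sequence element, and each element can point in one of two directions. The function name `placement_count` is hypothetical.

```python
from math import factorial


def placement_count(n_rods: int) -> int:
    """Count candidate placements of n sequence elements into n density rods.

    Assumption (illustrative only): a one-to-one mapping, giving n! orderings
    for the unknown connectivity, and two directions per rod, giving 2**n
    choices for the unknown directionality.
    """
    return factorial(n_rods) * 2 ** n_rods


for n in (4, 8, 12):
    print(n, placement_count(n))
```

Under these assumptions, 4 rods allow 384 placements, while 12 rods allow roughly 2 × 10^12, which suggests why exhaustive enumeration quickly becomes impractical.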
The new release of EM-Fold can place both α-helices and β-strands and is more robust to incorrect secondary structure assignments. Secondary structural elements are predicted from the primary sequence of a protein, and a pool of helices and strands is constructed. These elements are then assembled into the density map and refined by EM-Fold. The best models are passed to Rosetta, which adds loops and side chains. Rosetta further refines the structure, improving loop conformations by comparing the models to the density map.
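The assembly step can be pictured with a toy example. The Python sketch below assigns a pool of predicted elements to density rods by comparing expected and observed lengths, and it tries every assignment exhaustively. It is not EM-Fold's scoring function or search strategy: the per-residue rise values are textbook approximations, and `TYPE_PENALTY`, `best_assignment`, and the example data are hypothetical.

```python
from itertools import permutations

# Approximate rise per residue along the element axis, in angstroms.
RISE = {"helix": 1.5, "strand": 3.3}

# Hypothetical penalty (in angstroms) for putting a helix in strand density
# or vice versa; chosen only for illustration.
TYPE_PENALTY = 10.0


def expected_length(kind, n_residues):
    """Length in angstroms of an idealized helix or strand."""
    return RISE[kind] * n_residues


def best_assignment(elements, rods):
    """Find the lowest-scoring assignment of elements to rods.

    elements: list of (kind, n_residues) predicted from the sequence.
    rods: list of (kind, length_in_angstroms) detected in the density map.
    Returns (score, order), where order[i] is the index of the element
    placed in rod i. The pool may hold more elements than there are rods.
    """
    best = None
    for order in permutations(range(len(elements)), len(rods)):
        score = 0.0
        for rod_index, element_index in enumerate(order):
            kind, n_residues = elements[element_index]
            rod_kind, rod_length = rods[rod_index]
            if kind != rod_kind:
                score += TYPE_PENALTY
            score += abs(expected_length(kind, n_residues) - rod_length)
        if best is None or score < best[0]:
            best = (score, order)
    return best


# Made-up example: four predicted elements, three rods seen in the map.
elements = [("helix", 12), ("helix", 20), ("strand", 6), ("helix", 8)]
rods = [("helix", 29.0), ("strand", 20.5), ("helix", 17.0)]

score, order = best_assignment(elements, rods)
print(order, round(score, 2))
```

Here the 20-residue helix lands in the 29 Å rod, the strand in the 20.5 Å rod, and the 12-residue helix in the 17 Å rod, leaving the 8-residue helix unused. Exhaustive enumeration is workable only for a handful of elements; larger problems would need a sampling-based search.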
The authors tested their protocol on a benchmark set of 20 α-helical proteins and 7 proteins containing β-strands. The set was limited to proteins with fewer than 250 amino acids; however, this limitation is largely a consequence of available computing power, and as computers become faster, the size limit for this approach should increase. Of those 27 proteins, 13 were refined to atomic resolution. The program was better able to predict the structure of α-helical proteins, likely because β-strands contain many nonlocal contacts.
EM-Fold is a promising modeling technique for obtaining atomic detail from medium-resolution data and should expand the number and types of proteins that are amenable to high-resolution structure determination.