
What have we learned from domains of unknown function?

PSI-SGKB [doi:10.1038/fa_psisgkb.2009.54]

Structural comparison reveals that nearly three-quarters of protein domains of unknown function actually belong to characterized families.

Distribution of domain-of-unknown-function structures by their structural similarity to known structures.

Criticism of structural genomics, and of the Protein Structure Initiative (PSI) in particular, has largely centered on the choice of protein targets. For its first 10 years, the PSI has focused on increasing the range of protein folds solved, rather than on individual well-characterized proteins that lack molecular structures.

This approach has its merits: it produces structures that otherwise wouldn't be solved, it doesn't step on the toes of individual structural biology labs, and it gives biologists a starting point for studying the vast number of poorly characterized proteins collected in the families of domains of unknown function (DUFs).

This hasn't convinced the project's detractors, so phase 3 of the PSI will change tack and emphasize biological function (see news on PSI:Biology). Before that happens, it is worth considering the more than 250 structures of domains of unknown function that the PSI and other consortia have solved.

Jaroszewski et al., from the PSI centers JCMM and JCSG, took a close look at 248 of these domain families, solved by the PSI up to October 2008, and compared them with known structures. Their comparison took into account remote homology, which standard homology-recognition programs do not detect.

The first surprise is that 25% of these domains could be linked to protein families of known structure. The next 48% of domains of unknown function, despite little sequence similarity to any previously recognized family, were found to have known folds. For about half of these, possible homology was initially overlooked because it was below a significance threshold or because the homology could only be identified by indirect methods.

This leaves 27% of the 248 domain-of-unknown-function families as novel folds. Closer examination turned up another surprise: a third of these contain reasonably sized fragments with significant structural similarity to similar-sized fragments in known proteins. This trend is not specific to the domain-of-unknown-function families; it is observed in all recently seen new protein folds. This suggests that forces other than evolution may be at work here, namely physical limitations of fold space brought about by compactness and specific features of the polypeptide chain.
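As a rough check on these figures, the reported percentages can be converted into approximate family counts. The sketch below is only a back-of-envelope illustration: the percentages are rounded in the text, so the counts are estimates rather than numbers reported in the paper, and the category labels and variable names are our own.

```python
# Approximate family counts implied by the rounded percentages in the text.
# These are estimates for illustration, not figures taken from the paper.
TOTAL_FAMILIES = 248

# Percentage of families in each structural category, as quoted in the article.
shares = {
    "linked to families of known structure": 25,
    "known fold, little sequence similarity": 48,
    "novel fold": 27,
}

# The three categories should account for every family.
assert sum(shares.values()) == 100

counts = {label: round(TOTAL_FAMILIES * pct / 100) for label, pct in shares.items()}

# Share with some structural link to known structures
# (the "nearly three-quarters" of the summary line).
known_share = (
    shares["linked to families of known structure"]
    + shares["known fold, little sequence similarity"]
)

# Roughly a third of the novel folds contain fragments resembling known proteins.
novel_with_fragments = round(counts["novel fold"] / 3)

for label, n in counts.items():
    print(f"{label}: about {n} families")
print(f"structurally linked to known structures: {known_share}%")
print(f"novel folds with familiar fragments: about {novel_with_fragments}")
```

On these assumptions, only about 67 of the 248 families are novel folds, and roughly 22 of those still share sizeable fragments with known proteins.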

But what does this tell us about function? Only 10% of these domains now have a function proposed in the existing literature, so can we improve on this using the team's newly discovered information on structural similarity? Using their earlier structural homology work, they assigned potential functions to an additional 31% of proteins; thus 41% of these so-called domains of unknown function can be annotated to some degree.
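The annotation arithmetic can be laid out in the same way. This is a minimal sketch that assumes both percentages refer to the same set of 248 families, which the text does not state explicitly; the variable names are ours and the counts are approximations derived from rounded percentages.

```python
# Annotation coverage implied by the percentages quoted in the article.
# Assumption: both percentages apply to the same 248 families.
TOTAL_FAMILIES = 248

literature_pct = 10   # function already proposed in the existing literature
structure_pct = 31    # additional functions proposed from structural similarity

annotated_pct = literature_pct + structure_pct
unannotated_pct = 100 - annotated_pct

# Approximate counts; the source percentages are rounded.
approx_annotated = round(TOTAL_FAMILIES * annotated_pct / 100)
approx_unannotated = TOTAL_FAMILIES - approx_annotated

print(f"annotated to some degree: {annotated_pct}% (about {approx_annotated} families)")
print(f"no functional hypothesis: {unannotated_pct}% (about {approx_unannotated} families)")
```

Under that assumption, the remaining 59% corresponds to roughly 146 families without a proposed function.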

These annotations are available in the supplementary material to the paper, but more importantly, the JCSG team also presents them on the wiki-like pages of the TOPSAN annotation system (see http://www.topsan.org/Groups/DUFs for links to pages for individual DUF structures), inviting everybody not only to view, but also to add to them.

After all, more than half of all domain-of-unknown-function families are still without a reliable hypothesis for their function, so this is fertile ground for further study. But this analysis does indicate that the number of new folds is beginning to plateau, and so the role of high-throughput structural biology is likely to change. Instead of identifying new folds, it will have to shift its focus to the connections between folds, providing information on the evolution of protein structure and function and ultimately helping to decipher protein function.

Maria Hodges

References:
  1. L. Jaroszewski, Z. Li, S. Sri Krishna, C. Bakolitsa, J. Wooley et al. Exploration of uncharted regions of the protein universe. PLoS Biol. 7, e1000205 (2009). doi:10.1371/journal.pbio.1000205
