Exploring structural diversity across the protein universe with The Encyclopedia of Domains

Andy M Lau^#¹, Nicola Bordin^#², Shaun M Kandathil¹, Ian Sillitoe², Vaishali P Waman², Jude Wells²,³, Christine A Orengo², David T Jones¹,²

Affiliations

¹ Department of Computer Science, University College London, London WC1E 6BT, UK.
² Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK.
³ Centre for Artificial Intelligence, University College London, London WC1V 6BH, UK.

^# Contributed equally.

PMID: 39480926
DOI: 10.1126/science.adq4946

Science. 2024 Nov;386(6721):eadq4946. Epub 2024 Nov 1.

Abstract

The AlphaFold Protein Structure Database (AFDB) contains more than 214 million predicted protein structures composed of domains, which are independently folding units found in multiple structural and functional contexts. Identifying domains can enable many functional and evolutionary analyses but has remained challenging because of the sheer scale of the data. Using deep learning methods, we have detected and classified every domain in the AFDB, producing The Encyclopedia of Domains. We detected nearly 365 million domains, over 100 million more than can be found by sequence methods, covering more than 1 million taxa. Reassuringly, 77% of the nonredundant domains are similar to known superfamilies, greatly expanding representation of their domain space. We uncovered more than 10,000 new structural interactions between superfamilies and thousands of new folds across the fold space continuum.

Grants and funding

WT_/Wellcome Trust/United Kingdom