Nat Commun. 2024 Nov 26;15(1):10230.
doi: 10.1038/s41467-024-54668-2.

A proteome-wide structural systems approach reveals insights into protein families of all human herpesviruses

Timothy K Soh et al. Nat Commun. 2024.

Abstract

Structure predictions have become invaluable tools, but viral proteins are absent from the EMBL/DeepMind AlphaFold database. Here, we provide proteome-wide structure predictions for all nine human herpesviruses and analyze them in depth with explicit scoring thresholds. By clustering these predictions into structural similarity groups, we identified new families, such as the HCMV UL112-113 cluster, which is conserved in alpha- and betaherpesviruses. A domain-level search found protein families consisting of subgroups with varying numbers of duplicated folds. Using large-scale structural similarity searches, we identified viral proteins with cellular folds, such as the HSV-1 US2 cluster possessing dihydrofolate reductase folds and the EBV BMRF2 cluster that might have emerged from cellular equilibrative nucleoside transporters. Our HerpesFolds database is available at https://www.herpesfolds.org/herpesfolds and displays all models and clusters through an interactive web interface. Here, we show that system-wide structure predictions can reveal homology between viral species and identify potential protein functions.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Structure predictions of the proteome of all human herpesviruses.
A Pipeline of model generation and quality scoring. The structures of 844 proteins covering the proteomes of all nine human herpesviruses were initially predicted with LocalColabFold with three recycles. After scoring, models that failed any quality score were rerun with AlphaFold and LocalColabFold with 20 recycles. B Percentage of each viral proteome that passed the quality score. C Histogram of the pTM scores of the initial LocalColabFold models that passed or failed the quality scores. Histogram of the percentage of each protein with a pLDDT score below 0.50 (D) or above 0.70 (E). In both cases, a Gaussian distribution was fit to identify the distribution for the proteins that passed the quality scores. F HerpesFolds online database is available at https://www.herpesfolds.org/herpesfolds. Structure predictions are organized and searchable by homology and structural similarity. G Example webpage of the displayed information for each predicted model. H Searchable clustering of all human herpesvirus proteins based on structure (Foldseek) or sequence (HHblits) similarity is available at https://www.herpesfolds.org/herpesclusters.
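The quality-scoring step in panel A, together with the pLDDT fractions plotted in panels D and E, can be illustrated with a minimal sketch. It assumes pLDDT values are on a 0 to 1 scale, as the 0.50 and 0.70 bounds in the caption suggest. The three pass/fail cutoffs in `QualityCutoffs` are hypothetical placeholders, since the caption does not state the values used, and the function names are illustrative rather than taken from the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class QualityCutoffs:
    # All three cutoffs are HYPOTHETICAL placeholders, not values from the paper.
    min_ptm: float = 0.5              # minimum acceptable pTM score
    max_frac_low_plddt: float = 0.5   # max fraction of residues with pLDDT < 0.50
    min_frac_high_plddt: float = 0.3  # min fraction of residues with pLDDT > 0.70


def plddt_fractions(plddt: Sequence[float]) -> Tuple[float, float]:
    """Return (fraction of residues below 0.50, fraction of residues above 0.70)."""
    if not plddt:
        raise ValueError("empty pLDDT profile")
    n = len(plddt)
    low = sum(1 for p in plddt if p < 0.50) / n
    high = sum(1 for p in plddt if p > 0.70) / n
    return low, high


def passes_quality(ptm: float, plddt: Sequence[float],
                   cutoffs: QualityCutoffs = QualityCutoffs()) -> bool:
    """A model passes only if it satisfies every score; failing any one fails the model."""
    low, high = plddt_fractions(plddt)
    return (ptm >= cutoffs.min_ptm
            and low <= cutoffs.max_frac_low_plddt
            and high >= cutoffs.min_frac_high_plddt)
```

In a workflow like the one described, models for which `passes_quality` returns `False` would be the ones queued for a rerun with more recycles.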
Fig. 2. Structural similarity search identifies protein clusters.
A Graphical display of all structurally similar protein clusters. The 690 herpes proteins that passed the quality score were analyzed pairwise by Foldseek for structural similarity. The color scheme for the viral species is used throughout this work. The name of the protein is written in the center of each node and is legible upon zooming in. The clusters can be browsed interactively at https://www.herpesfolds.org/herpesclusters. B Cluster of viral serine/threonine-protein kinases US3 and UL13. C Cluster of viral DNA helicase/primase complex-associated protein (HEPA) and DNA polymerase catalytic subunit (DPOL). D Clusters of small capsomere-interacting protein (SCP). The alpha-, beta-, and gammaherpesviruses clustered separately. The corresponding UCSF ChimeraX session can be found at https://zenodo.org/records/13284140.
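One way to turn pairwise similarity hits into clusters like those in panel A is to treat each significant query-target pair as an edge and take the connected components of the resulting graph. The sketch below does this with a small union-find structure. Whether the authors defined clusters as connected components is an assumption, the input is assumed to be already filtered for significance, and the protein names in the usage note are placeholders rather than real hits.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def cluster_hits(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group proteins into connected components of the pairwise-hit graph."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen nodes as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, target in pairs:
        root_q, root_t = find(query), find(target)
        if root_q != root_t:
            parent[root_t] = root_q  # merge the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)

    # Largest clusters first; members sorted alphabetically for stable output.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))
```

For example, `cluster_hits([("protA", "protB"), ("protC", "protA"), ("protD", "protE")])` groups `protA`, `protB`, and `protC` into one cluster and `protD` and `protE` into another.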
Fig. 3. Structural clustering identifies differently grouped protein groups.
A Structural alignment of the HCMV UL112-113 cluster. Only the conserved beta-barrel-like domain is shown for clarity of HCMV UL112-113, HHV-6A U79_U80, HHV-6B U79, HHV-7 U79, HSV-1 UL4, HSV-2 UL4, and VZV ORF56. B Structural alignment of the LANA DNA binding domain-containing cluster with only the matching domain shown for clarity of KSHV ORF73, EBV EBNA1, HHV-6A U84, HCMV UL117, HCMV UL122, HCMV UL123, HHV-6A U86, HHV-6A U90_U87U86, HHV-6B U84, HHV-6B U90_U86, HHV-7 U84, HHV-7 U86, HSV-1 UL3, HSV-2 UL3, and VZV ORF58. C Ig-like domain-containing cluster harboring the HCMV RL11 protein family. The canonical members are marked with a black outline. D Structural alignment of HCMV RL11 with HCMV RL13. The blue arrow points to the additional beta strand. The corresponding UCSF ChimeraX sessions can be found at https://zenodo.org/records/13284140.
Fig. 4. Domain-level similarity search identifies conserved domain duplications.
A Domain-level similarities identified using a sliding window approach. The “SW Unique Hits” are the number of proteins that had a structurally similar query-target pair in the sliding window analysis that was absent from the full-length analysis. “Internal Duplications” are proteins where parts of the protein are similar to another part of itself. “Repetitive Acquisition” are proteins where a piece of a different protein matches the query protein multiple times at different positions. “Domain Addition” refers to proteins with multiple domains matching different proteins. Illustrative cartoons are shown above the corresponding bar. B HSV-1 UL15 is aligned to the known homolog HSV-2 UL15 and the structurally similar target HSV-1 UL9, which was only identified in the domain-level search. The part of HSV-1 UL15 that aligns with HSV-1 UL9 is colored blue. C The cluster of HCMV proteins that contain HCMV US22. The proteins that contain 2 domains, like the canonical US22, are shown in the HCMV color. The proteins with one domain are shown in light red, and the protein that has four domains is shown in dark red. D The tertiary structure of the core domain is similar between the proteins in this group. The top row is the full-length protein, the bottom row is a structural alignment of the core domain, and the RMSD against the 91 Cα of the core domain alignment is written below. HCMV US22 has two domains, shown in yellow and blue, which are aligned. HCMV UL29 has four domains and was aligned to US22. HCMV US22 was aligned to HCMV US23, which has two domains, and HCMV US23 was aligned to HCMV UL26, which has one domain. In the case of multiple domains, the best-fitting domain combination is shown. The corresponding UCSF ChimeraX sessions can be found at https://zenodo.org/records/13284140.
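The sliding window idea in panel A can be sketched as follows: each protein is cut into overlapping residue windows that are searched separately, and a "SW Unique Hit" is a query-target pair found by the window search but not by the full-length search. The window size and step are not given in the caption, so they are left as parameters here, and both function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def sliding_windows(length: int, window: int, step: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) residue ranges covering a protein of `length`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if length <= window:
        return [(0, length)]  # protein shorter than one window: use it whole
    starts = list(range(0, length - window + 1, step))
    if starts[-1] + window < length:
        starts.append(length - window)  # extra window so the C-terminus is covered
    return [(s, s + window) for s in starts]


def unique_window_hits(window_hits: Iterable[Tuple[str, str]],
                       full_length_hits: Iterable[Tuple[str, str]]
                       ) -> Set[Tuple[str, str]]:
    """Query-target pairs seen in the window search but not in the full-length search."""
    return set(window_hits) - set(full_length_hits)
```

A pair such as HSV-1 UL15 and HSV-1 UL9 in panel B, which the caption says was only identified in the domain-level search, is the kind of result `unique_window_hits` would retain.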
Fig. 5. Annotation of viral protein functions from structural similarity search.
A Structural alignment of the alkaline nuclease (AN) and cytoplasmic envelopment protein 2 (CEP2) from all human herpesviruses. CEP2 was found in all human herpesviruses but had no significant cellular hit. B Cluster of KSHV ORF2, HSV-1 US2, and HSV-2 US2 containing the cellular hits. Each cellular node (blue) represents a specific protein. C Summary of (B) for clarity. D Structural alignment of the proteins in (C). P04753 dihydrofolate reductase (DHFR) was the most significant cellular hit. E Cluster of EBV BMRF2, KSHV ORF58, HSV-1 UL43, and VZV ORF15 containing the cellular hits. Each cellular node (blue) represents a specific protein. F Summary of (E) for clarity. G Structural alignment of the proteins in (F). The most significant cellular hit was Q8R139 equilibrative nucleoside transporter 4 (ENT4). H Structural architecture of the classical DUT HSV-1 UL50 and the DURP HHV-6A U54. The rainbow-colored viral protein illustrates how the polypeptide folds back on itself. The most significant cellular hit, AF-P70583 DUT from Rattus norvegicus, is shown multiple times so that it can be aligned to each domain in the viral protein. The RMSD against the 205 Cα for each alignment is shown on the right of the overlay. The disordered termini were hidden for clarity. The corresponding UCSF ChimeraX sessions can be found at https://zenodo.org/records/13284140.
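The RMSD values reported for the Cα alignments follow the standard definition: the square root of the mean squared distance between paired atoms. The sketch below assumes the two coordinate lists are already superposed and paired one-to-one. The superposition itself, which a tool such as ChimeraX performs, is not shown, and the function name is illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """RMSD between two equal-length lists of already superposed C-alpha coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("no atoms to compare")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```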
