Brief Bioinform. 2024 Nov 22;26(1):bbaf014.
doi: 10.1093/bib/bbaf014.

GOPhage: protein function annotation for bacteriophages by integrating the genomic context

Jiaojiao Guan et al. Brief Bioinform.

Abstract

Bacteriophages are viruses that target bacteria and play a crucial role in microbial ecology. Phage proteins are key to understanding phage biology, including infection, replication, and evolution. Although a large number of new phages have been identified via metagenomic sequencing, many of them have limited protein function annotation. Accurate function annotation of phage proteins presents several challenges, including their inherent diversity and the scarcity of annotated proteins. Existing tools have yet to fully leverage the unique properties of phages in annotating protein functions. In this work, we propose GOPhage, a new protein function annotation tool for phages that leverages the modular structure of phage genomes. By employing embeddings from the latest protein foundation models and a Transformer to capture contextual information between proteins in phage genomes, GOPhage surpasses state-of-the-art methods in annotating diverged proteins and proteins with uncommon functions, with improvements of 6.78% and 13.05%, respectively. GOPhage can annotate proteins lacking homology search results, which is critical for characterizing the rapidly accumulating phage genomes. We demonstrate the utility of GOPhage by identifying 688 potential holins in phages, which exhibit high structural conservation with known holins. The results show the potential of GOPhage to extend our understanding of newly discovered phages.

Keywords: bacteriophages; genomic contextual information; protein function annotation; protein large language model.

Figures

Figure 1
The functional order of proteins within four phage genomes, where each blue arrow represents a protein and each gray link shows the similarity between proteins.
Figure 2
The architecture of GOPhage, including data processing steps for training and inference in (A), leveraging the ESM2 model in (B) for per-residue embeddings, utilizing the Transformer model in (C) for contextual relationships, and integrating alignment-based methods in (D) to produce Gene Ontology (GO) term prediction scores.
Figure 3
Performance comparison of including versus excluding contextual proteins across three ontologies, evaluated using AUPR and Fmax metrics for term-centric analysis.
Figure 4
Performance for different numbers of context proteins, from “length = 1” to “length > 2”, measured by protein-centric Fmax.
Figure 5
Performance comparisons among methods across three ontologies. (a) displays AUPR for protein-centric analysis across diverse sequence identity groups, and (b) shows AUPR for term-centric analysis across groups with increasing IC values.
Figure 6
The analysis of the identified potential holin proteins. (a) and (b) show clusters within the top 10 phage genera and their structural similarities with known holin proteins, while (c) and (d) present 3D structures of identified holin proteins (YP_009795370.1 and YP_009823284.1) alongside database counterparts (YP_009785046.1 and YP_009790916.1).
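
The captions for Figures 3 and 4 report Fmax without defining it. The following is a minimal sketch of protein-centric Fmax as it is commonly defined in function-prediction benchmarks such as CAFA: at each score threshold, precision is averaged over proteins with at least one predicted term, recall is averaged over all annotated proteins, and Fmax is the best resulting F-measure. Whether GOPhage uses exactly this variant is an assumption; the function name `protein_centric_fmax`, the input layout, and the threshold grid are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Set


def protein_centric_fmax(
    scores: Dict[str, Dict[str, float]],
    truth: Dict[str, Set[str]],
    thresholds: Optional[List[float]] = None,
) -> float:
    """Return the maximum F-measure over score thresholds.

    scores: protein id -> {GO term -> predicted score in [0, 1]}
    truth:  protein id -> set of true GO terms
    Assumption: CAFA-style averaging, not confirmed by the source.
    """
    if thresholds is None:
        # Hypothetical grid: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]

    # Only proteins with at least one true term can contribute to recall.
    annotated = {p: terms for p, terms in truth.items() if terms}
    if not annotated:
        return 0.0

    best = 0.0
    for t in thresholds:
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in annotated.items():
            predicted = {
                term
                for term, score in scores.get(protein, {}).items()
                if score >= t
            }
            hits = len(predicted & true_terms)
            if predicted:
                # Precision only counts proteins with a prediction at this threshold.
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(annotated)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

With this definition, a predictor that ranks every true term above every false term for each protein reaches an Fmax of 1.0 at some threshold, while missing true terms caps recall and therefore Fmax.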
