J Mol Biol. 2018 Jul 20;430(15):2256-2265.
doi: 10.1016/j.jmb.2018.03.004. Epub 2018 Mar 10.

MetaGO: Predicting Gene Ontology of Non-homologous Proteins Through Low-Resolution Protein Structure Prediction and Protein-Protein Network Mapping


Chengxin Zhang et al. J Mol Biol. 2018.

Abstract

Homology-based transfer remains the major approach to computational protein function annotation, but it becomes increasingly unreliable when the sequence identity between query and template falls below 30%. We propose a novel pipeline, MetaGO, that deduces the Gene Ontology (GO) attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and with partner homology-based protein-protein interaction (PPI) network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under stringent benchmark conditions, in which templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598 for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotation methods. Detailed data analysis shows that the major advantage of MetaGO lies in the new functional homologs detected by partner homology-based network mapping and by local and global structure alignments, whose confidence scores can be optimally combined through logistic regression. These data demonstrate the power of a hybrid model that incorporates protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/.
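The benchmark condition described above (excluding every template with >30% sequence identity to the query) can be sketched as a simple filter. This is a minimal illustration, not MetaGO's code: the `Hit` record, the `filter_templates` name, and the convention that identity is stored as a fraction between 0 and 1 are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A hypothetical template hit returned by a homology search."""
    template_id: str
    identity: float  # sequence identity to the query, assumed to be a fraction in [0, 1]


def filter_templates(hits, cutoff=0.30):
    """Drop templates whose identity to the query exceeds `cutoff`.

    Templates at exactly the cutoff are kept, following the wording
    "templates with >30% sequence identity to the query are excluded".
    """
    return [h for h in hits if h.identity <= cutoff]
```

For example, hits at 25%, 30%, and 45% identity would be reduced to the first two; raising `cutoff` relaxes the benchmark, which is how performance at different identity cut-offs could be compared.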

Keywords: Gene Ontology; protein function prediction; protein structure prediction; protein–protein interaction; sequence profiles.


Figures

Figure 1
(A) The MetaGO algorithm for GO annotation, which contains three pipelines, global and local structure alignment (bottom, red), PPI partner homolog detection (top, green), and sequence homolog identification (center, blue), followed by a logistic regression-based combination. (B) An illustrative example of MetaGO applied to the human EBP protein (Q15125). The left panel shows the superposition of the I-TASSER model (red) and the PDB structure of the adiponectin receptor (cyan); these structure-based predictions are combined with the PPI-homolog (green box) and sequence-based (blue box) predictions to create the complete set of MetaGO predictions (right panel). Highlights illustrate how representative function terms from the individual pipelines are merged into the final MetaGO predictions.
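Panel A describes three pipelines whose GO predictions are merged and then rescored by logistic regression. The sketch below shows one plausible way to do this under stated assumptions: each pipeline emits a dictionary mapping GO terms to confidence scores in [0, 1], a term missing from a pipeline gets a score of 0, and the weights and bias are hypothetical placeholders that would in practice be fitted on training data. The function names are illustrative and are not MetaGO's API.

```python
import math

# Order of features in each term's vector (assumed pipeline names).
PIPELINES = ("structure", "ppi_homolog", "sequence")


def merge_predictions(preds):
    """Merge per-pipeline predictions into one feature vector per GO term.

    preds: dict mapping pipeline name -> dict mapping GO term -> score.
    A term not predicted by a pipeline gets 0.0 for that pipeline.
    """
    terms = set()
    for scores in preds.values():
        terms.update(scores)
    return {t: [preds.get(p, {}).get(t, 0.0) for p in PIPELINES] for t in sorted(terms)}


def logistic_combine(features, weights, bias):
    """Standard logistic regression: sigmoid(bias + w . x)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def combined_scores(preds, weights, bias):
    """Final score per GO term from hypothetical fitted weights and bias."""
    return {t: logistic_combine(f, weights, bias) for t, f in merge_predictions(preds).items()}
```

With equal hypothetical weights, a term supported by several pipelines accumulates evidence across them; fitting the weights instead lets the model learn how far to trust each source.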
Figure 2
Fmax scores of the GO predictions by MetaGO, compared with those of the three component pipelines (structure, sequence, and PPI-homolog) and five control methods (GoFDR, GOtcha, BLAST, PSI-BLAST, and Naïve), at different sequence identity cut-offs for filtering functional templates. The dotted lines mark the performance of MetaGO. A color version of this figure is provided as Figure S2 in the Supplemental Material.
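Fmax is the maximum, over score thresholds, of the harmonic mean of average precision and average recall. The sketch below follows the protein-centric definition used in the CAFA experiments: precision is averaged only over proteins with at least one prediction at or above the threshold, and recall is averaged over all benchmark proteins. It assumes that predicted and true GO term sets are already propagated to their ancestors (propagation is not shown), that every benchmark protein has at least one true term, and uses a threshold grid of 0.01 to 1.00 as an assumed default.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax.

    predictions: dict protein -> dict GO term -> score in [0, 1]
    truth: dict protein -> non-empty set of true GO terms (benchmark set)
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions = []   # one value per protein with >= 1 prediction at threshold t
        recall_sum = 0.0  # summed over every benchmark protein
        for protein, true_terms in truth.items():
            predicted = {g for g, s in predictions.get(protein, {}).items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # precision is undefined when nothing passes the threshold
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

The same per-threshold (recall, precision) pairs, collected instead of maximized, give the precision-recall curves of the kind shown in Figure 3.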
Figure 3
Precision-recall curves of the GO predictions by MetaGO, compared with those of the three component pipelines (structure, PPI-homolog, and sequence) and five control methods (GoFDR, GOtcha, BLAST, PSI-BLAST, and Naïve), at a 30% sequence identity cut-off for functional templates. A color version of this figure is provided as Figure S3 in the Supplemental Material.
