PLoS One. 2019 Aug 15;14(8):e0220728.
doi: 10.1371/journal.pone.0220728. eCollection 2019.

Advances in gene ontology utilization improve statistical power of annotation enrichment

Eugene W Hinderer 3rd et al. PLoS One.

Abstract

Gene-annotation enrichment is a common method for utilizing ontology-based annotations in gene and gene-product centric knowledgebases. Effective utilization of these annotations requires inferring semantic linkages by tracing paths through edges in the ontological graph, referred to as relations. However, some relations are semantically problematic with respect to scope, necessitating their omission or modification lest erroneous term mappings occur. To address these issues, we created the Gene Ontology Categorization Suite, or GOcats, a novel tool that organizes the Gene Ontology into subgraphs representing user-defined concepts, while ensuring that all appropriate relations are congruent with respect to scoping semantics. Here, we demonstrate the improvements in annotation enrichment by re-interpreting edges that would otherwise be omitted by traditional ancestor path-tracing methods. Specifically, we show that GOcats' unique handling of relations improves enrichment over conventional methods in the analysis of two different gene-expression datasets: a breast cancer microarray dataset and several horse cartilage development RNAseq datasets. With the breast cancer microarray dataset, we observed significant improvement (one-sided binomial test p-value = 1.86E-25) in 182 of 217 significantly enriched GO terms identified by the conventional path traversal method when GOcats' path traversal was used. We also found new significantly enriched terms using GOcats, whose biological relevance has been experimentally demonstrated elsewhere. Likewise, on the horse RNAseq datasets, we observed a significant improvement in GO term enrichment when using GOcats' path traversal: one-sided binomial test p-values ranged from 1.32E-03 to 2.58E-44.
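
As a rough illustration of the one-sided binomial test mentioned above, the following sketch computes the probability of seeing at least 182 improved terms out of 217 if improvement and non-improvement were equally likely. The null probability of 0.5 and the function name `binomial_upper_tail` are assumptions made for this example; the abstract does not state the exact test setup. Under that assumption the result should be on the order of 1E-25, in line with the reported value.

```python
from math import comb


def binomial_upper_tail(successes, trials, p_null=0.5):
    """Return P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Counts taken from the abstract: 182 of 217 terms improved.
# p_null = 0.5 is an assumed null hypothesis, not stated in the source.
p_value = binomial_upper_tail(182, 217)
print(f"one-sided p-value: {p_value:.2e}")
```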

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. GOcats data flow diagram for creating categories of GO.
A) GOcats enables the user to extract subgraphs of GO representing concepts as defined by keywords, each with a root (category-defining) node. B) Subgraphs extracted by GOcats are used to create a mapping from all sub-nodes in a set of subgraphs to their category-defining root node(s). This allows the user to map gene annotations in GAFs to any number of customized categories.
Fig 2. The has_part relation creates incongruent paths with respect to semantic scoping.
Some tools may create questionable GO term mappings, e.g., “nuclear envelope” to “plasma membrane,” because has_part edges point from super-concepts to sub-concepts. GOcats avoids this by re-interpreting has_part edges as part_of_some edges.
Fig 3. Comparison of adjusted p-values for significantly-enriched annotations using GOcats paths vs excluding has_part edges.
Most significantly enriched GO terms had an improved p-value when GOcats re-evaluated has_part edges in the enrichment analysis of the breast cancer dataset.
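
The ideas in the captions of Fig 1 and Fig 2 can be sketched on a toy graph. This is a minimal illustration only, not GOcats' actual API: the term names, the edge list, and the functions `reinterpret`, `ancestors`, and `map_to_categories` are all hypothetical. It shows how following a has_part edge as if it pointed toward a broader concept yields a questionable mapping, how flipping it into a part_of_some edge avoids that, and how terms can then be mapped to category-defining root nodes.

```python
from collections import defaultdict

# Toy edges as (source, relation, target). Terms are illustrative only,
# not real GO entries. is_a and part_of point from sub- to super-concept.
EDGES = [
    ("inner_membrane", "is_a", "membrane"),
    ("envelope", "has_part", "inner_membrane"),  # points from whole to part
    ("envelope", "part_of", "organelle"),
]


def reinterpret(edges):
    """Flip each has_part edge into a part_of_some edge from part to whole."""
    return [
        (t, "part_of_some", s) if r == "has_part" else (s, r, t)
        for s, r, t in edges
    ]


def ancestors(term, edges):
    """Collect every term reachable by following edges from source to target."""
    parents = defaultdict(set)
    for s, _, t in edges:
        parents[s].add(t)
    seen, stack = set(), [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_categories(edges, roots):
    """Map each term to the category roots among its ancestors (or itself)."""
    terms = {x for s, _, t in edges for x in (s, t)}
    return {term: (ancestors(term, edges) | {term}) & roots for term in terms}


# Naive traversal treats has_part like any other edge: "envelope" reaches
# "membrane" through its own part, which is the questionable kind of mapping.
naive = ancestors("envelope", EDGES)

# After re-interpretation, "envelope" no longer reaches "membrane", and
# "inner_membrane" gains "envelope" as a super-concept.
fixed_edges = reinterpret(EDGES)
categories = map_to_categories(fixed_edges, roots={"membrane", "organelle"})
print(sorted(naive), {k: sorted(v) for k, v in sorted(categories.items())})
```

Simply dropping the has_part edge would also avoid the questionable mapping, but then "inner_membrane" would lose its link to "envelope"; re-interpreting the edge keeps that information available to the traversal.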
