Review
Front Microbiol. 2024 Apr 4:15:1351678.
doi: 10.3389/fmicb.2024.1351678. eCollection 2024.

Integrating biological knowledge for mechanistic inference in the host-associated microbiome

Brook E Santangelo et al. Front Microbiol. 2024.

Abstract

Advances in high-throughput technologies have enhanced our ability to describe microbial communities as they relate to human health and disease. Alongside the growth in sequencing data has come an influx of resources that synthesize knowledge of microbial traits, functions, and metabolic potential with knowledge of how they may impact host pathways to influence disease phenotypes. These knowledge bases can enable the development of mechanistic explanations that may underlie correlations detected between microbial communities and disease. In this review, we survey existing resources and methodologies for the computational integration of broad classes of microbial and host knowledge. We evaluate these knowledge bases with respect to their access methods, content, and source characteristics. We discuss challenges in creating and using knowledge bases, including inconsistent nomenclature for taxa and metabolites across sources, whether the biological entities represented are rooted in ontologies or taxonomies, and how structure and accessibility limit the diversity of applications and user types. We make this information available in a code and data repository at https://github.com/lozuponelab/knowledge-source-mappings. Addressing these challenges will allow for the development of more effective tools that draw on abundant knowledge to find new insights into microbial mechanisms in disease by fostering a systematic and unbiased exploration of existing information.

Keywords: computational biology; databases; inference; microbiology; microbiome; ontologies.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Characterization of known resources relevant to microbiome research. (A) Schematic of the types of resources that exist and the purpose that they serve in microbiome research. Note that each resource is characterized by its most prominent qualities, though many resources span these types. Affordances represent the primary purpose of the given resource type. The standardized nomenclature affordance indicates that the resource introduces new identifiers to uniquely identify concepts. The knowledge-based biological relationships affordance implies that the resource describes interactions among the concepts by the relationship type indicated in (B). The mechanistic hypothesis inference affordance indicates that the resource is uniquely suited to provide a mechanistic explanation when given specific queries. (B) The evaluations performed in this review over the existing resources listed in the Resources column of (A).
Figure 2
(A) Network of relationships included in integrated resources. Edges between an integrated resource and some type of primary knowledge source (microbe, protein, metabolite, pathway, or disease) represent either a categorization of the concept via an ontology or taxonomy (solid lines), or a nomenclature mapping to some identifier (dashed lines). Node size of the primary knowledge source (colored) represents the in-degree from integrated databases, where the largest nodes are those most often used for standardization. Colored points above integrated resources specify to which concept type the integrated resource maps, including indirect mappings through a general aggregate database. (B) Relationships among all general aggregate databases and primary databases, separated by category. Reference degree shows the degree to which a primary database may be referenced, indicating those most often used for standardization. For example, if database i references database j, a general aggregate database that in turn references database k, then k has a reference degree of 2 relative to i. That primary database is only referenced if shown in (A). This figure was generated using code and data available in the GitHub repository: https://github.com/lozuponelab/knowledge-source-mappings.
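The in-degree and reference degree described in this caption can be illustrated with a small directed graph. The sketch below is not taken from the authors' repository; it assumes that reference degree means the fewest reference hops from one database to another, and the database names `i`, `j`, `k`, and `m` are hypothetical placeholders following the caption's own example.

```python
from collections import deque


def reference_degrees(references, source):
    """Breadth-first search: fewest reference hops from `source` to each reachable database.

    `references` maps a database name to the databases it references directly.
    This hop-count reading of "reference degree" is an assumption.
    """
    degrees = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for target in references.get(current, ()):
            if target not in degrees:
                degrees[target] = degrees[current] + 1
                queue.append(target)
    del degrees[source]  # report only the databases that are referenced
    return degrees


def in_degrees(references):
    """Count how many databases reference each target directly."""
    counts = {}
    for targets in references.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


# Hypothetical toy graph: i references j, j references k, m references k.
references = {"i": ["j"], "j": ["k"], "m": ["k"]}

print(reference_degrees(references, "i"))  # hops from i to j and to k
print(in_degrees(references))              # direct references received by j and k
```

In this toy graph, `k` is reached from `i` through the aggregate database `j`, so it sits two hops away, and it also has the highest in-degree, which under the caption's convention would make it the largest node.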
Figure 3
Understanding the connectedness of integrated databases based on path length. Path length refers to the number of relationships between unique concepts, or feature types, that are included within a resource. The feature types discussed in this context are microbes, proteins (or genes, human or microbial), metabolites (human or microbial), pathways (human or microbial), and diseases (human). The concept of path length is used to assess how comprehensively a resource can be used for mechanistic inference, or which relationships are needed from other databases to do so.
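One possible reading of path length can be sketched in code: treat each resource as a set of relationships between feature types and ask how many relationships are needed to link microbes to diseases within that resource alone. This is an illustrative assumption rather than the authors' procedure; the function name `path_length` and the two example resources are hypothetical.

```python
from collections import deque


def path_length(relationships, start="microbe", goal="disease"):
    """Return the fewest relationships linking `start` to `goal`, or None if they are unlinked.

    `relationships` is an iterable of (feature_type, feature_type) pairs,
    treated as undirected. This is an assumed simplification.
    """
    neighbors = {}
    for a, b in relationships:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return distance[node]
        for nxt in neighbors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return None  # the resource cannot connect the two feature types on its own


# Hypothetical resource covering every step from microbe to disease.
resource_a = [
    ("microbe", "protein"),
    ("protein", "metabolite"),
    ("metabolite", "pathway"),
    ("pathway", "disease"),
]

# Hypothetical resource missing the metabolite-to-pathway relationship.
resource_b = [("microbe", "metabolite"), ("pathway", "disease")]

print(path_length(resource_a))
print(path_length(resource_b))
```

A result of `None` signals that a relationship would have to be drawn from another database before a microbe-to-disease explanation could be assembled.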

