PLoS One. 2023 Jan 20;18(1):e0272473.
doi: 10.1371/journal.pone.0272473. eCollection 2023.

Omnicrobe, an open-access database of microbial habitats and phenotypes using a comprehensive text mining and data fusion approach

Sandra Dérozier et al. PLoS One. 2023.

Abstract

The dramatic increase in the number of microbe descriptions in databases, reports, and papers presents a two-fold challenge for accessing the information: integrating heterogeneous data into a standard ontology-based representation, and normalizing textual descriptions through semantic analysis. Recent text-mining methods offer powerful ways to extract textual information and generate ontology-based representations. This paper describes the design of the Omnicrobe application, which gathers comprehensive information on the habitats, phenotypes, and uses of microbes from scientific sources of high interest to the microbiology community. The Omnicrobe database contains around 1 million descriptions of microbe properties. These descriptions are created by analyzing and combining six information sources of various kinds, namely biological resource catalogs, sequence databases, and scientific literature. The microbe properties are indexed by the OntoBiotope ontology, and their taxa are indexed by an extended version of the taxonomy maintained by the National Center for Biotechnology Information (NCBI). The Omnicrobe application covers all domains of microbiology. Through simple or rich ontology-based queries, it offers easy-to-use support for answering scientific questions about the habitats, phenotypes, and uses of microbes. We illustrate the potential of Omnicrobe with a use case from the food innovation domain.
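An ontology-based query typically matches not only the requested concept but also everything below it in the hierarchy. The sketch below illustrates that idea under stated assumptions: the habitat hierarchy is a toy child-to-parent map with placeholder identifiers (not real OntoBiotope concepts), Lives_In relations are plain (taxon, habitat) pairs, and the helper names `descendants` and `taxa_living_in` are hypothetical, not part of Omnicrobe.

```python
from collections import defaultdict

# Hypothetical toy habitat hierarchy: child -> parent (placeholder IDs,
# not real OntoBiotope concepts).
PARENT = {
    "habitat_B": "habitat_A",
    "habitat_C": "habitat_A",
    "habitat_D": "habitat_B",
}

# Hypothetical Lives_In relations as (taxon, habitat) pairs.
LIVES_IN = [
    ("taxon_1", "habitat_D"),
    ("taxon_2", "habitat_C"),
    ("taxon_3", "habitat_E"),
]

def descendants(concept, parent):
    """Return the concept itself plus every concept below it in the hierarchy."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    result, stack = set(), [concept]
    while stack:
        node = stack.pop()
        if node not in result:
            result.add(node)
            stack.extend(children[node])
    return result

def taxa_living_in(concept, parent, pairs):
    """Taxa linked by a Lives_In pair to the concept or any of its descendants."""
    targets = descendants(concept, parent)
    return {taxon for taxon, habitat in pairs if habitat in targets}
```

For example, `taxa_living_in("habitat_A", PARENT, LIVES_IN)` returns `{"taxon_1", "taxon_2"}`, because `habitat_C` and `habitat_D` both sit under `habitat_A`, while `habitat_E` is outside that subtree.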

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Text-mining process.
Fig 2. Information system flowchart.
Fig 3. Omnicrobe web interface.
Republished from http://omnicrobe.migale.inrae.fr under a CC BY license, with permission from INRAE MaIAGE, original copyright 2022.
Fig 4. Distribution of taxa ranks in Lives_In relations in Omnicrobe.
“Strain” ranks comprise strains and isolates. “Species and subspecies” ranks include species and ranks below species and above strain (e.g., subspecies, varieties, morph). “Genus and subgenus” ranks include genus and ranks below genus and above species (e.g., subgenus, section, series). “Family and subfamily” ranks consist of family and ranks below family and above genus (e.g., subfamily, tribe). “Higher ranks” include all ranks above family (e.g., order, class, phylum, kingdom). The height of each bar is proportional to the number of Lives_In relations in Omnicrobe.
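The rank grouping in the Fig 4 caption amounts to a lookup from a taxon's rank to one of five bins. The sketch below assumes NCBI-style rank strings (such as "varietas" and "forma") and hand-picked lists for each bin; both the lists and the function names are illustrative assumptions, not Omnicrobe's code. Ranks not listed (for example "no rank") are left out of the counts rather than guessed.

```python
from collections import Counter

# Rank groups as described in the Fig 4 caption. The rank strings are
# NCBI-style names chosen for illustration; the exact lists are assumptions.
RANK_GROUPS = {
    "Strain": {"strain", "isolate"},
    "Species and subspecies": {"species", "subspecies", "varietas", "forma", "morph"},
    "Genus and subgenus": {"genus", "subgenus", "section", "subsection", "series"},
    "Family and subfamily": {"family", "subfamily", "tribe", "subtribe"},
    "Higher ranks": {"superfamily", "order", "suborder", "class", "subclass",
                     "phylum", "subphylum", "kingdom", "superkingdom"},
}

def rank_group(rank):
    """Return the Fig 4 group for a rank name, or None if the rank is not listed."""
    rank = rank.strip().lower()
    for group, ranks in RANK_GROUPS.items():
        if rank in ranks:
            return group
    return None

def count_by_group(relation_ranks):
    """Count Lives_In relations per group, given the rank of each relation's taxon."""
    return Counter(g for g in map(rank_group, relation_ranks) if g is not None)
```

Returning `None` for unknown ranks is a deliberate choice: silently filing them under "Higher ranks" would inflate that bar.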
Fig 5. Distribution of microbe taxa in Lives_In relations extracted from PubMed in Omnicrobe.
The taxa represented in this chart are taxon roots selected as microorganisms in Omnicrobe (see section Ontologies and taxonomies). The arc length is proportional to the number of Lives_In relations that involve the taxon or any of its descendants. “Others” groups the taxa that account for less than 1% of relations: Archaea, Chlamydomonadales, Chlorella, Choanoflagellida, Cryptophyta, Desmidiales, Diplomonadida, Glaucocystophyceae, Haptophyta, Ichthyosporea, Oxymonadida, Parabasalia, Prototheca, Retortamonadidae, Rhizaria.
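Counting relations that "involve the taxon or any descendant" means each relation contributes to its own taxon and to every ancestor up to the root. The sketch below shows that roll-up plus the 1% "Others" bucket, assuming a toy child-to-parent taxonomy with placeholder IDs, one taxon per relation, and disjoint roots so that the root totals sum to the number of relations. The function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (placeholder IDs).
PARENT = {"t_sp1": "t_genus", "t_sp2": "t_genus", "t_genus": "t_root_A"}

def rolled_up_counts(relation_taxa, parent):
    """Count each relation once for its taxon and once for every ancestor."""
    counts = Counter()
    for taxon in relation_taxa:
        node = taxon
        while node is not None:
            counts[node] += 1
            node = parent.get(node)  # None once the root is reached
    return counts

def split_minor(counts, roots, threshold=0.01):
    """Keep roots holding at least `threshold` of all relations; merge the rest into 'Others'."""
    total = sum(counts[r] for r in roots)  # assumes roots are disjoint
    kept, others = {}, 0
    for r in roots:
        if total and counts[r] / total >= threshold:
            kept[r] = counts[r]
        else:
            others += counts[r]
    if others:
        kept["Others"] = others
    return kept
```

With relations on `["t_sp1", "t_sp1", "t_sp2", "t_root_B"]`, `t_root_A` accumulates 3 relations through `t_genus`; with a 30% threshold, `t_root_B` (1 of 4) is merged into "Others".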
Fig 6. Distribution of habitats in Lives_In relations extracted from PubMed.
This chart represents the habitats at the four highest levels of the OntoBiotope ontology. The arc length is proportional to the number of Lives_In relations extracted from PubMed that involve the habitat or any of its descendants in OntoBiotope. For readability, only habitats with more than 20,000 occurrences are shown.
Fig 7. Proportion of taxon-habitat relations in each source that were also extracted from PubMed.
The height of each bar represents the proportion of a source's relations that were also extracted from PubMed. For instance, only 10% of GenBank relations (taxon-habitat pairs) were also extracted from PubMed, leaving 90% of GenBank relations not found in PubMed.
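The Fig 7 proportion can be read as a set overlap: the share of a source's distinct (taxon, habitat) pairs that also occur among the PubMed pairs. The sketch below assumes relations are compared as exact pairs and deduplicated within each source; the function name is hypothetical.

```python
def overlap_with_reference(source_pairs, reference_pairs):
    """Fraction of distinct (taxon, habitat) pairs in a source also present in the reference."""
    source = set(source_pairs)          # deduplicate pairs within the source
    if not source:
        return 0.0
    shared = source & set(reference_pairs)
    return len(shared) / len(source)
```

Under this reading, a value of 0.10 for GenBank would correspond to the caption's example, with the remaining 0.90 of GenBank pairs absent from PubMed.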
Fig 8. Frequency of habitats and number of different taxa to which they are linked.
The green line (left scale) represents the number of Lives_In relations extracted from PubMed that involve each of the 100 most frequent habitats. The brown line (right scale) represents the number of distinct taxa linked to each habitat by Lives_In relations extracted from PubMed.
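The two curves in Fig 8 measure different things for the same habitat: how many relations mention it, and how many distinct taxa those relations cover. The sketch below computes both, assuming the input is a list of (taxon, habitat) relations in which the same pair may repeat (for example, when extracted from several documents). The function name is hypothetical.

```python
from collections import Counter, defaultdict

def habitat_profile(pairs, top_n=100):
    """For the top_n most frequent habitats, return (habitat, relation count, distinct taxa)."""
    relation_counts = Counter(habitat for _, habitat in pairs)
    taxa_per_habitat = defaultdict(set)
    for taxon, habitat in pairs:
        taxa_per_habitat[habitat].add(taxon)
    return [
        (habitat, n, len(taxa_per_habitat[habitat]))
        for habitat, n in relation_counts.most_common(top_n)
    ]
```

A habitat whose relation count is much larger than its distinct-taxa count is one where the same few taxa are mentioned repeatedly.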
Fig 9. Correlation between temperature tropism phenotypes in Omnicrobe.
Each box represents the intersection between the sets of taxa to which the two phenotypes are linked with Exhibits relations in Omnicrobe. The color intensity indicates the Jaccard index between the sets of taxa.
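The Jaccard index used in Fig 9 is the size of the intersection of two sets divided by the size of their union, J(A, B) = |A ∩ B| / |A ∪ B|. The sketch below applies it to every pair of phenotypes, assuming each phenotype is mapped to the set of taxa it is linked to; the taxon identifiers and function names are placeholders.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def jaccard_matrix(phenotype_taxa):
    """Jaccard index for every unordered pair of phenotypes (keys in sorted order)."""
    return {
        (p, q): jaccard(phenotype_taxa[p], phenotype_taxa[q])
        for p, q in combinations(sorted(phenotype_taxa), 2)
    }
```

For example, two phenotypes linked to `{"t1", "t2", "t3"}` and `{"t2", "t3", "t4"}` share 2 of 4 distinct taxa, giving an index of 0.5.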
Fig 10. An example of complex embedded queries.
These queries retrieve mesophilic or thermophilic bacteria that are present in soy milk, are capable of acidification, and have qualified presumption of safety (QPS) status.
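Embedded queries like the one in Fig 10 can be thought of as set algebra over the taxa returned by simpler queries: a union for "mesophilic or thermophilic", then intersections for each additional constraint. The sketch below assumes each simple query has already been resolved to a set of placeholder taxon IDs; the dictionary keys, the QPS and bacteria sets, and the function name are hypothetical and do not reflect Omnicrobe's query interface.

```python
def embedded_query(results, qps_taxa, bacteria_taxa):
    """Combine precomputed simple-query results as in the Fig 10 example."""
    # "mesophilic OR thermophilic" is a union of two phenotype queries.
    temperature = results["mesophilic"] | results["thermophilic"]
    # Every further constraint narrows the candidate set by intersection.
    return (temperature
            & results["soy milk"]
            & results["acidification"]
            & qps_taxa
            & bacteria_taxa)

# Hypothetical precomputed results (placeholder taxon IDs).
results = {
    "mesophilic": {"t1", "t2", "t3"},
    "thermophilic": {"t4"},
    "soy milk": {"t1", "t4", "t5"},
    "acidification": {"t1", "t2", "t4"},
}
```

With `qps_taxa = {"t1", "t2"}` and `bacteria_taxa = {"t1", "t2", "t4"}`, the combined query returns `{"t1"}`: `t4` matches the temperature, habitat, and acidification constraints but is not in the QPS set.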
