redbiom: a Rapid Sample Discovery and Feature Characterization System

Daniel McDonald¹, Benjamin Kaehler², Antonio Gonzalez¹, Jeff DeReus¹, Gail Ackermann¹, Clarisse Marotz¹, Gavin Huttley³, Rob Knight⁴,⁵,⁶

Affiliations

¹ Department of Pediatrics, University of California San Diego, La Jolla, California, USA.
² School of Science, University of New South Wales, Canberra, Australia.
³ Research School of Biology, Australian National University, Canberra, Australia.
⁴ Department of Pediatrics, University of California San Diego, La Jolla, California, USA; robknight@ucsd.edu.
⁵ Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA.
⁶ Center for Microbiome Innovation, University of California San Diego, La Jolla, California, USA.

PMID: 31239397
PMCID: PMC6593222
DOI: 10.1128/mSystems.00215-19

Daniel McDonald et al. mSystems. 2019 Jun 25;4(4):e00215-19.

Abstract

Meta-analyses at the whole-community level have been important in microbiome studies, revealing profound features that structure Earth's microbial communities, such as the unique differentiation of microbes from the mammalian gut relative to free-living microbial communities, the separation of microbiomes in saline and nonsaline environments, and the role of pH in driving soil microbial compositions. However, our ability to identify the specific features of a microbiome that differentiate these community-level patterns has lagged behind, especially as ever-cheaper DNA sequencing has yielded increasingly large data sets. One critical gap is the ability to search for samples that contain specific features (for example, sub-operational taxonomic units [sOTUs] identified by high-resolution statistical methods for removing amplicon sequencing errors). Here we introduce redbiom, a microbiome caching layer implemented using an in-memory NoSQL database called Redis. redbiom allows users to rapidly query samples that contain a given feature, retrieve sample data and metadata, and search for samples that match specified metadata values or ranges (e.g., all samples with a pH of >7). By default, redbiom provides anonymous access to over 100,000 publicly available samples in the Qiita database. At over 100,000 samples, the caching server requires only 35 GB of resident memory. We highlight how redbiom enables a new type of characterization of microbiome samples and provide tutorials for using redbiom with QIIME 2. redbiom is open source under the BSD license, hosted on GitHub, and can be deployed independently of Qiita to enable search of proprietary or clinically restricted microbiome databases.

IMPORTANCE Although analyses that combine many microbiomes at the whole-community level have become routine, searching rapidly for microbiomes that contain a particular sequence has remained difficult. The software we present here, redbiom, dramatically accelerates this process, allowing samples that contain microbiome features to be rapidly identified. This is especially useful when taxonomic annotation is limited, allowing users to identify environments in which unannotated microbes of interest were previously observed. This approach also allows environmental or clinical factors that correlate with specific features, or vice versa, to be identified rapidly, even at a scale of billions of sequences in hundreds of thousands of samples. The software is integrated with existing analysis tools to enable fast, large-scale microbiome searches and discovery of new microbiome relationships.

Keywords: database; meta-analysis; microbiome.

Figures

**FIG 1**
The redbiom data model is a key-value store built on top of Redis. By storing features and sample identifiers as keys, it is possible to rapidly query the resource for information on those entities. Similarly, by indexing the sample metadata, queries can be performed against variables of interest (e.g., pH) in order to identify sample identifiers of interest, which can then be used to extract a feature table for downstream analysis. (A) A “set” command associates a key with a value: in this case, a feature identifier is associated with the samples in which the feature was observed. A “get” command can then be issued using the feature identifier as the key to obtain the associated values (i.e., the samples). (B) Feature counts (e.g., a vector from an OTU table) are associated with a composite key that describes the processing context and the sample identifier. The processing context, in this case “deblur,” denotes the bioinformatic procedure applied. For Qiita, the context names also include molecular preparation details. The expectation is that the data within a context should be comparable. The sample data themselves are encoded in a sparse vector format with the feature identifiers remapped into unique integers to improve compression and reduce data redundancy. (C) The Porter stem of the word “Antibiotics.” (D) The association of metadata word stems with sample identifiers. Redis natively supports classic set operations, which can be applied to keys to obtain, for example, the intersection of sample identifiers represented by two keys.
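The data model in this caption can be illustrated with a minimal Python sketch. It is not redbiom's implementation: plain dictionaries stand in for Redis, `toy_stem` is a crude placeholder for a real Porter stemmer, and every name, key layout, feature sequence, and sample identifier below is hypothetical.

```python
from collections import defaultdict

# In-memory stand-ins for the Redis key-value store (assumption: plain
# dicts and sets, not a real Redis connection).
feature_to_samples = defaultdict(set)  # panel A: feature id -> sample ids
sample_vectors = {}                    # panel B: "context:sample" -> sparse vector
feature_index = {}                     # feature id -> compact unique integer
stem_to_samples = defaultdict(set)     # panel D: metadata word stem -> sample ids


def toy_stem(word):
    """Crude suffix stripping; a placeholder for a real Porter stemmer."""
    word = word.lower()
    for suffix in ("ics", "ic", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def load_sample(context, sample_id, counts, metadata_words):
    """Index one sample's feature counts and metadata words."""
    sparse = {}
    for feature, count in counts.items():
        if count == 0:
            continue  # sparse encoding: zero counts are not stored
        feature_to_samples[feature].add(sample_id)
        # Remap the feature identifier to an integer the first time it is seen.
        idx = feature_index.setdefault(feature, len(feature_index))
        sparse[idx] = count
    # Composite key: processing context plus sample identifier.
    sample_vectors[f"{context}:{sample_id}"] = sparse
    for word in metadata_words:
        stem_to_samples[toy_stem(word)].add(sample_id)


load_sample("deblur", "S1", {"TACGGAGG": 10, "TACGTAGG": 0}, ["Antibiotics", "soil"])
load_sample("deblur", "S2", {"TACGGAGG": 3, "TACGTAGG": 7}, ["antibiotic", "gut"])
load_sample("deblur", "S3", {"TACGTAGG": 2}, ["soil"])

# "get" by feature identifier: which samples contain this feature?
samples_with_feature = feature_to_samples["TACGGAGG"]

# Set intersection across two metadata stems.
antibiotic_soil = stem_to_samples[toy_stem("antibiotics")] & stem_to_samples[toy_stem("soil")]
```

Storing the inverse mapping (feature to samples) alongside the per-sample vectors is what makes the feature lookup a single key access rather than a scan over all samples.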

**FIG 2**
Feature search example. Differential sOTUs from a reanalysis of the study by Ramirez et al. (13) by Morton et al. (unpublished), characterized as associating with a low- or high-pH soil, were obtained. Features were trimmed to 90 nucleotides (nt) to maximize overlap of the Earth Microbiome Project and were searched using redbiom against the Deblur 16S V4 90-nt context with the following sample constraints: “where empo_3=='Soil (non-saline)' and ph > 0.” All samples from Ramirez et al. were removed to create a sample set independent from the observation source: 560 samples remained for assessment following constraints and filtering. (A) Box-whisker plot of the pH values reported in the sample information (Mann-Whitney U statistic = 7,280, P < 9.95 × 10⁻⁶⁵). (B and C) Regressions of the reported pH values against the first principal coordinate (PC1) from unweighted (B) and weighted (C) UniFrac analysis (Pearson r = 0.552, P < 6.61 × 10⁻⁴⁶, and r = 0.562, P < 6.8 × 10⁻⁴⁸, respectively). (D to G) Principal-coordinate plots of unweighted (D) and weighted (E) UniFrac of the observed samples colored by pH and unweighted (F) and weighted (G) UniFrac colored by the Qiita study identifier. (See Table S2 for additional study information.)
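The sample constraint and the source-study exclusion described in this caption can be sketched in plain Python. This is only an illustration of the filtering logic, not redbiom's query engine: the metadata rows, sample identifiers, and the `excluded_study` prefix are made up, and the handling of non-numeric pH values is an assumption.

```python
def to_float(value):
    """Return value as a float, or None when it is missing or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def matches(row):
    """Mirror of the constraint: where empo_3=='Soil (non-saline)' and ph > 0."""
    ph = to_float(row.get("ph"))
    return row.get("empo_3") == "Soil (non-saline)" and ph is not None and ph > 0


# Hypothetical sample metadata; identifiers and values are made up.
metadata = {
    "study1.s1": {"empo_3": "Soil (non-saline)", "ph": "6.8"},
    "study1.s2": {"empo_3": "Soil (non-saline)", "ph": "not provided"},
    "study2.s1": {"empo_3": "Water (saline)", "ph": "8.1"},
    "study2.s2": {"empo_3": "Soil (non-saline)", "ph": "4.9"},
}

# Stands in for removing the samples of the study the features came from.
excluded_study = "study2"

selected = sorted(
    sample_id
    for sample_id, row in metadata.items()
    if matches(row) and not sample_id.startswith(excluded_study + ".")
)
```

Requiring `ph > 0` doubles as a check that a numeric pH was actually reported, since free-text placeholders fail the numeric conversion in this sketch.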

Cited by

Method development for cross-study microbiome data mining: Challenges and opportunities.
Su X, Jing G, Zhang Y, Wu S. Comput Struct Biotechnol J. 2020 Aug 1;18:2075-2080. doi: 10.1016/j.csbj.2020.07.020. PMID: 32802279. Review.
Optimizing UniFrac with OpenACC Yields Greater Than One Thousand Times Speed Increase.
Sfiligoi I, Armstrong G, Gonzalez A, McDonald D, Knight R. mSystems. 2022 Jun 28;7(3):e0002822. doi: 10.1128/msystems.00028-22. PMID: 35638356.
Efficient computation of Faith's phylogenetic diversity with applications in characterizing microbiomes.
Armstrong G, Cantrell K, Huang S, McDonald D, Haiminen N, Carrieri AP, Zhu Q, Gonzalez A, McGrath I, Beck KL, Hakim D, Havulinna AS, Méric G, Niiranen T, Lahti L, Salomaa V, Jain M, Inouye M, Swafford AD, Kim HC, Parida L, Vázquez-Baeza Y, Knight R. Genome Res. 2021 Nov;31(11):2131-2137. doi: 10.1101/gr.275777.121. PMID: 34479875.
Consumption of Fermented Foods Is Associated with Systematic Differences in the Gut Microbiome and Metabolome.
Taylor BC, Lejzerowicz F, Poirel M, Shaffer JP, Jiang L, Aksenov A, Litwin N, Humphrey G, Martino C, Miller-Montgomery S, Dorrestein PC, Veiga P, Song SJ, McDonald D, Derrien M, Knight R. mSystems. 2020 Mar 17;5(2):e00901-19. doi: 10.1128/mSystems.00901-19. PMID: 32184365.
Human Skin, Oral, and Gut Microbiomes Predict Chronological Age.
Huang S, Haiminen N, Carrieri AP, Hu R, Jiang L, Parida L, Russell B, Allaband C, Zarrinpar A, Vázquez-Baeza Y, Belda-Ferre P, Zhou H, Kim HC, Swafford AD, Knight R, Xu ZZ. mSystems. 2020 Feb 11;5(1):e00630-19. doi: 10.1128/mSystems.00630-19. PMID: 32047061.

References

1. Sinha R, Abu-Ali G, Vogtmann E, Fodor AA, Ren B, Amir A, Schwager E, Crabtree J, Ma S, Microbiome Quality Control Project Consortium, Abnet CC, Knight R, White O, Huttenhower C. 2017. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nat Biotechnol 35:1077–1086. doi:10.1038/nbt.3981.
2. Lozupone CA, Knight R. 2007. Global patterns in bacterial diversity. Proc Natl Acad Sci U S A 104:11436–11440. doi:10.1073/pnas.0611525104.
3. Ley RE, Lozupone CA, Hamady M, Knight R, Gordon JI. 2008. Worlds within worlds: evolution of the vertebrate gut microbiota. Nat Rev Microbiol 6:776–788. doi:10.1038/nrmicro1978.
4. Lozupone CA, Stombaugh J, Gonzalez A, Ackermann G, Wendel D, Vázquez-Baeza Y, Jansson JK, Gordon JI, Knight R. 2013. Meta-analyses of studies of the human microbiota. Genome Res 23:1704–1714. doi:10.1101/gr.151803.112.
5. Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Zech Xu Z, Jiang L, Haroon MF, Kanbar J, Zhu Q, Jin Song S, Kosciolek T, Bokulich NA, Lefler J, Brislawn CJ, Humphrey G, Owens SM, Hampton-Marcell J, Berg-Lyons D, McKenzie V, Fierer N, Fuhrman JA, Clauset A, Stevens RL, Shade A, Pollard KS, Goodwin KD, Jansson JK, Gilbert JA, Knight R, Earth Microbiome Project Consortium. 2017. A communal catalogue reveals Earth’s multiscale microbial diversity. Nature 551:457–463. doi:10.1038/nature24621.
