Microbiome. 2021 Mar 3;9(1):58.
doi: 10.1186/s40168-021-01015-y.

Accurate and sensitive detection of microbial eukaryotes from whole metagenome shotgun sequencing

Abigail L Lind et al. Microbiome.

Abstract

Background: Microbial eukaryotes are found alongside bacteria and archaea in natural microbial systems, including host-associated microbiomes. While microbial eukaryotes are critical to these communities, they are challenging to study with shotgun sequencing techniques and are therefore often excluded.

Results: Here, we present EukDetect, a bioinformatics method to identify eukaryotes in shotgun metagenomic sequencing data. Our approach uses a database of 521,824 universal marker genes from 241 conserved gene families, which we curated from 3713 fungal, protist, non-vertebrate metazoan, and non-streptophyte archaeplastida genomes and transcriptomes. EukDetect has broad taxonomic coverage of microbial eukaryotes, performs well on low-abundance and closely related species, and is resilient against bacterial contamination in eukaryotic genomes. Using EukDetect, we describe the spatial distribution of eukaryotes along the human gastrointestinal tract, showing that fungi and protists are present in the lumen and mucosa throughout the large intestine. We discover that there is a succession of eukaryotes that colonize the human gut during the first years of life, mirroring patterns of developmental succession observed in gut bacteria. By comparing DNA and RNA sequencing of paired samples from human stool, we find that many eukaryotes continue active transcription after passage through the gut, though some do not, suggesting they are dormant or nonviable. We analyze metagenomic data from the Baltic Sea and find that eukaryotes differ across locations and salinity gradients. Finally, we observe eukaryotes in Arabidopsis leaf samples, many of which are not identifiable from public protein databases.

Conclusions: EukDetect provides an automated and reliable way to characterize eukaryotes in shotgun sequencing datasets from diverse microbiomes. We demonstrate that it enables discoveries that would be missed or clouded by false positives with standard shotgun sequence analysis. EukDetect will greatly advance our understanding of how microbial eukaryotes contribute to microbiomes.

Keywords: Algae; Arthropod; Choanoflagellate; Fungi; Helminth; Microbial eukaryotes; Nematode; Protists; Whole metagenome sequencing.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Human gut microbiome bacterial sequence reads are misattributed to eukaryotes. a Metagenomic sequencing reads were simulated for 971 species from all major phyla in human stool (2 million reads per species) and aligned to all microbial eukaryotic genomes used to develop EukDetect. Even after stringent filtering (see "Methods"), many species have thousands of reads aligning to eukaryotic genomes, which would lead to false detection of eukaryotes in samples containing only bacteria. b Amount of eukaryotic genome sequence to which simulated bacterial reads aligned, across 1367 eukaryotic genomes. c Taxonomic distribution of eukaryotes in the whole-genome database. Dark blue indicates eukaryotic genomes to which bacterial reads aligned.
Fig. 2
The EukDetect database comprises marker genes from protists, fungi, archaeplastida species, and metazoans. a Taxonomic distribution of the species whose transcriptomes and genomes are included in the EukDetect database. b Total number of marker genes identified per species, by taxonomic group.
Fig. 3
EukDetect is sensitive and accurate for yeasts, protists, and worms at low sequence coverage. a Number of marker genes with at least one aligned read per species, at genome coverages up to 1x. Horizontal red line indicates the total number of marker genes per species (i.e., the best possible performance). b Number of marker genes with at least one aligned read per species, at genome coverages up to 0.05x. Vertical red line indicates the detection cutoff: 4 or more reads aligning to 2 or more markers in at least 8 of 10 simulations. c The number of species reported by EukDetect for two closely related Entamoeba species before and after minimum read count and off-target alignment filtering. One species is the correct result. Simulated genome coverages are the same as in panel d. d EukDetect performance on simulated sequencing data from mixtures of two closely related Entamoeba species at different genome coverages. Dot colors indicate which species were detected in at least 8 of 10 simulations. Dashed line indicates the lowest coverage detectable by EukDetect when run on each species individually. Axes not to scale.
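The detection cutoff in panel b combines a per-simulation rule (enough reads spread over enough distinct markers) with a replicate rule (the rule must hold in most simulations). A minimal sketch of that two-level check follows. It assumes per-species input as a mapping from marker gene to aligned read count, and reads "4 or more reads align to 2 or more markers" as a total of at least 4 reads across at least 2 distinct markers; the function names are hypothetical and this is not EukDetect's own code.

```python
from typing import Dict, List

# Cutoff values taken from the Fig. 3b caption.
MIN_READS = 4
MIN_MARKERS = 2
MIN_PASSING_SIMULATIONS = 8  # out of 10 simulated replicates


def passes_cutoff(reads_per_marker: Dict[str, int],
                  min_reads: int = MIN_READS,
                  min_markers: int = MIN_MARKERS) -> bool:
    """Assumed reading of the rule: >= min_reads total reads on >= min_markers markers."""
    markers_hit = [m for m, n in reads_per_marker.items() if n > 0]
    total_reads = sum(reads_per_marker.values())
    return total_reads >= min_reads and len(markers_hit) >= min_markers


def detected_across_simulations(simulations: List[Dict[str, int]],
                                min_passing: int = MIN_PASSING_SIMULATIONS) -> bool:
    """A species counts as detected if the cutoff holds in >= min_passing replicates."""
    passing = sum(passes_cutoff(sim) for sim in simulations)
    return passing >= min_passing
```

For example, a species with 3 reads on one marker and 1 read on another passes in a single simulation, whereas 4 reads piled onto one marker do not, because the rule also requires breadth across markers.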
Fig. 4
Distribution of eukaryotic species in biopsies taken along the gastrointestinal tract. Eukaryotes were detected at all sites in the large intestine and in the terminal ileum, in both lumen and mucosal samples. One biopsy of gastric antrum mucosa in the stomach contained a Malassezia yeast. Slashes indicate that no eukaryotes were detected in any sample from that site. See Figures S3 and S4 for the locations of Blastocystis subtypes and of fungi.
Fig. 5
Changes in eukaryotic gut microbes during the first years of life. a Age at collection in the DIABIMMUNE three-country cohort for samples with no eukaryotes or with any of the four most frequently observed eukaryotic families. b Mean age at collection of samples from individuals with no observed eukaryotes compared with the mean age at collection of individuals in whom one of three eukaryotic families was detected. Individuals in whom more than one eukaryotic family was detected are excluded. Malasseziaceae is excluded because of its small sample size. Group comparisons were performed with an unpaired Wilcoxon rank-sum test. *p < 0.05; **p < 0.01; ***p < 0.001. c Model of eukaryotic succession in the first years of life. Debaryomycetaceae species predominate during the first 2 years of life but are also detected later. Blastocystidae and Saccharomycetaceae species predominate after the first 2 years of life, though they are detected as early as the second year. Malasseziaceae species show no change over time.
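Panel b compares two independent groups of ages with an unpaired Wilcoxon rank-sum test. The caption does not say which implementation or which variant was used, so the sketch below is only an illustration of the test itself: it ranks the pooled ages (averaging ranks for ties), sums the ranks of the first group, and converts that sum to a two-sided p-value with the large-sample normal approximation, without tie or continuity correction. The function names and example inputs are hypothetical. In practice, `scipy.stats.ranksums` computes this same statistic, and `scipy.stats.mannwhitneyu` offers exact and tie-corrected variants.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation); returns (z, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                           # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2               # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))          # equals 2 * (1 - Phi(|z|))
    return z, p
```

For instance, `rank_sum_test(ages_no_eukaryote, ages_with_family)` would compare hypothetical lists of collection ages for the two groups; with small groups like those in a single cohort, an exact test may be preferable to the normal approximation.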
Fig. 6
Detection of eukaryotes from paired DNA- and RNA-sequenced samples from the IHMP IBD cohort. Plots show the most commonly detected eukaryotic families and whether each family was detected in a sample's DNA sequencing alone, its RNA sequencing alone, or both. Some samples shown here come from the same individual sampled at different time points.
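The three categories in this figure follow from comparing, per sample, the set of families detected in the DNA data with the set detected in the RNA data. A minimal sketch, assuming those per-sample detections are already available as two sets of family names (the function name and inputs are hypothetical, and the family names are borrowed from Fig. 5 only for illustration):

```python
from typing import Dict, Set


def classify_detection(dna_families: Set[str], rna_families: Set[str]) -> Dict[str, str]:
    """Label each family detected in a paired sample as DNA only, RNA only, or both."""
    labels: Dict[str, str] = {}
    for family in dna_families | rna_families:
        if family in dna_families and family in rna_families:
            labels[family] = "DNA and RNA"
        elif family in dna_families:
            labels[family] = "DNA only"
        else:
            labels[family] = "RNA only"
    return labels
```

Under the abstract's interpretation, a "DNA only" label is consistent with, but does not by itself prove, a dormant or nonviable organism, whereas detection in both DNA and RNA is consistent with active transcription.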
Fig. 7
Eukaryotes in the Baltic Sea differ across environments and salinity gradients. a Counts of eukaryotic groups detected in the Baltic Sea across different environments. LMO samples were obtained at the Linnaeus Microbial Observatory near Öland, Sweden. Transect samples were obtained from 9 different geographic locations across the Baltic Sea. Redox samples were obtained from the redoxcline. Dark-colored bars indicate the number of samples from a given environment that contain at least one species belonging to the eukaryotic subgroup. Light-colored bars indicate the number of samples that do not contain a given eukaryotic subgroup. b Mean salinity of the samples in which each eukaryotic species was detected. Error bars indicate 1 standard deviation. Only species detected in 3 or more samples are included.
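Panel b summarizes, for each species, the salinity of every sample in which it was detected, and drops species seen in fewer than 3 samples. A minimal sketch of that aggregation, assuming the input is a list of (species, sample salinity) pairs with one pair per detection; the function name and species labels are hypothetical, and because the caption does not specify the standard deviation estimator, the sample standard deviation (`statistics.stdev`, n - 1 denominator) is an assumption:

```python
import statistics
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple


def salinity_by_species(detections: Iterable[Tuple[str, float]],
                        min_samples: int = 3) -> Dict[str, Tuple[float, float]]:
    """Return {species: (mean salinity, sample SD)} for species in >= min_samples samples."""
    per_species: DefaultDict[str, List[float]] = defaultdict(list)
    for species, salinity in detections:
        per_species[species].append(salinity)
    return {
        species: (statistics.mean(values), statistics.stdev(values))
        for species, values in per_species.items()
        if len(values) >= min_samples  # mirrors the caption's 3-sample minimum
    }
```

With detections at salinities 5, 7, and 9 for one hypothetical species and only two detections for another, the sketch reports a mean of 7 and a standard deviation of 2 for the first species and omits the second.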
