Genes (Basel). 2023 Aug 18;14(8):1647.
doi: 10.3390/genes14081647.

StrainIQ: A Novel n-Gram-Based Method for Taxonomic Profiling of Human Microbiota at the Strain Level

Sanjit Pandey et al. Genes (Basel).

Abstract

The emergence of next-generation sequencing (NGS) technology has greatly influenced microbiome research and led to the development of novel bioinformatics tools for in-depth analysis of metagenomic datasets. Identifying strain-level variation in microbial communities is important for understanding the onset and progression of diseases, host-pathogen interrelationships, and drug resistance, as well as for designing new therapeutic regimens. In this study, we developed a novel tool called StrainIQ (strain identification and quantification), built on a new algorithm that uses n-grams (series of n adjacent nucleotides in a DNA sequence) to predict and quantify strain-level taxa from whole-genome metagenomic sequencing data. We thoroughly evaluated our method using simulated and mock metagenomic datasets and compared its performance with existing methods. On average, it showed 85.8% sensitivity and 78.2% specificity on simulated datasets. It also showed higher specificity and sensitivity using n-gram models built from reduced reference genomes and on models with lower-coverage sequencing data. It outperforms alternative approaches in genus- and strain-level prediction and in strain abundance estimation. Overall, the results show that StrainIQ achieves high accuracy by implementing customized model-building and is an efficient tool for site-specific microbial community profiling.
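To make the n-gram idea concrete, the sketch below counts overlapping n-grams in a DNA string and then finds the n-grams that occur in only one genome of a small reference set. The function names `ngram_counts` and `genome_specific_ngrams` and the toy sequences are assumptions made for illustration. This is not StrainIQ's implementation, whose model-building details are not given in the abstract.

```python
from collections import Counter


def ngram_counts(sequence: str, n: int) -> Counter:
    """Count every overlapping n-gram (run of n adjacent nucleotides)."""
    seq = sequence.upper()
    # A sequence shorter than n yields an empty range, hence an empty Counter.
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def genome_specific_ngrams(genomes: dict[str, str], n: int) -> dict[str, set[str]]:
    """For each genome, keep the n-grams that appear in no other genome.

    Hypothetical helper: it only illustrates why unique n-grams can
    discriminate between closely related reference genomes.
    """
    per_genome = {name: set(ngram_counts(seq, n)) for name, seq in genomes.items()}
    # Number of genomes that contain each n-gram.
    owners = Counter(gram for grams in per_genome.values() for gram in grams)
    return {
        name: {gram for gram in grams if owners[gram] == 1}
        for name, grams in per_genome.items()
    }


if __name__ == "__main__":
    toy_genomes = {"strain_a": "ACGT", "strain_b": "CGTA"}  # toy data
    print(ngram_counts("ACGTAC", 2))
    print(genome_specific_ngrams(toy_genomes, 2))
```

With real genomes, n would be far larger than 2, and both DNA strands would usually need to be considered. The sketch ignores both points for brevity.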

Keywords: DSEM; StrainIQ; metagenomics; microbiota; n-grams; site-specific; strain-level.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Graphical summary of the StrainIQ algorithm. (A) n-gram quantification for DSEM building based on reference genomes. (B) Taxa identification and (C) relative abundance estimation of taxa from metagenomic data using the DSEM. The longer red lines indicate the linear genomes of the microbes from the reference genome, and the shorter red lines denote the extracted n-grams.
Figure 2
Determining the n-gram score cutoff for the GI tract DSEM. The intersection point between the positive and negative datasets is the optimal cutoff where there will be maximum true positives with minimum false positives.
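One simple way to pick such a cutoff from two sets of scores can be sketched as follows. The caption describes the intersection of the positive and negative score distributions. The sketch substitutes a related but different criterion, the threshold that maximizes true-positive rate minus false-positive rate (Youden's J), because the paper's exact procedure is not described here. The function name `pick_cutoff` and the example scores are hypothetical.

```python
def pick_cutoff(positive_scores, negative_scores):
    """Return the score threshold maximizing TPR - FPR (Youden's J).

    A score is called positive when it is >= the threshold. This is a
    stand-in for the intersection-based cutoff in the figure, not the
    authors' procedure. Ties are resolved in favor of the lowest threshold.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")

    best_cutoff, best_j = None, float("-inf")
    # Only observed scores need to be tried as candidate thresholds.
    for cutoff in sorted(set(positive_scores) | set(negative_scores)):
        tpr = sum(s >= cutoff for s in positive_scores) / len(positive_scores)
        fpr = sum(s >= cutoff for s in negative_scores) / len(negative_scores)
        if tpr - fpr > best_j:
            best_cutoff, best_j = cutoff, tpr - fpr
    return best_cutoff


if __name__ == "__main__":
    # Toy scores: positives are well separated from negatives.
    print(pick_cutoff([5, 6, 7], [1, 2, 3]))
```

When the two distributions overlap, no threshold separates them perfectly, and the chosen cutoff trades some false positives against some missed true positives.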
Figure 3
StrainIQ results for simulated and experimental samples. Sensitivity and specificity plots (values on the y-axis) (A) using ten simulated datasets; (B) using even and staggered experimental samples across different reference qualities (on the x-axis); and (C) using different coverages (on the x-axis). (D) Comparison of uniqueness of n-grams across different coverages. The y-axis is the ratio of the number of common n-grams in the group to the number of unique n-grams. The # mark in the figure denotes ‘number of’.
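For reference, sensitivity and specificity for a set of predicted strains can be computed from the standard definitions, as in the sketch below. The function name `sensitivity_specificity` and the notion of a `universe` of candidate strains (for example, all strains in the reference model) are assumptions. How the paper defines true negatives is not stated in this summary.

```python
def sensitivity_specificity(predicted: set, truth: set, universe: set):
    """Compute (sensitivity, specificity) for a set-valued prediction.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    `universe` is the assumed set of all candidate strains. Returns NaN
    for a measure whose denominator is zero.
    """
    tp = len(predicted & truth)            # present and reported
    fn = len(truth - predicted)            # present but missed
    fp = len(predicted - truth)            # reported but absent
    tn = len(universe - predicted - truth) # absent and not reported

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    universe = {"A", "B", "C", "D", "E"}  # toy candidate strains
    truth = {"A", "B"}
    predicted = {"A", "C"}
    print(sensitivity_specificity(predicted, truth, universe))
```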
Figure 4
Comparison of sensitivity and specificity between StrainIQ and KrakenUniq in strain-level identification using (A) complete and incomplete reference genome models and (B) metagenomic sequencing datasets of different coverage. The yellow lines in Figure 4A,B represent the sensitivity of both StrainIQ and KrakenUniq, whose values are identical at 1.
Figure 5
Comparison of relative abundance estimates of sequenced mock communities (with known relative abundances) using StrainIQ and KrakenUniq, in proportion to the number of unique n-grams in (A) the even community and (B) the staggered community. The line added parallel to the y-axis (relative abundance) on the right of the graph represents the number of unique n-grams in each genome in the even (green line) and staggered (orange line) mix samples. The # symbol in the figure denotes ‘number of’.
