StrainIQ: A Novel n-Gram-Based Method for Taxonomic Profiling of Human Microbiota at the Strain Level
- PMID: 37628698
- PMCID: PMC10454763
- DOI: 10.3390/genes14081647
Abstract
The emergence of next-generation sequencing (NGS) technology has greatly influenced microbiome research and led to the development of novel bioinformatics tools for in-depth analysis of metagenomics datasets. Identifying strain-level variations in microbial communities is important for understanding the onset and progression of diseases, host-pathogen interrelationships, and drug resistance, as well as for designing new therapeutic regimens. In this study, we developed a novel tool called StrainIQ (strain identification and quantification), based on a new n-gram-based algorithm (an n-gram being a series of n adjacent nucleotides in a DNA sequence), for predicting and quantifying strain-level taxa from whole-genome metagenomic sequencing data. We thoroughly evaluated our method using simulated and mock metagenomic datasets and compared its performance with that of existing methods. On average, it showed 85.8% sensitivity and 78.2% specificity on simulated datasets. It also showed higher specificity and sensitivity using n-gram models built from reduced reference genomes and on models with lower-coverage sequencing data. It outperforms alternative approaches in genus- and strain-level prediction and in strain abundance estimation. Overall, the results show that StrainIQ achieves high accuracy by implementing customized model-building and is an efficient tool for site-specific microbial community profiling.
Keywords: DSEM; StrainIQ; metagenomics; microbiota; n-grams; site-specific; strain-level.
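To make the abstract's definition of an n-gram concrete, the following is a minimal sketch that counts overlapping n-grams in a DNA string. It is not StrainIQ's algorithm or model-building procedure, which the abstract does not describe; the function name `ngram_counts`, the example sequence, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter

def ngram_counts(sequence: str, n: int) -> Counter:
    """Count every overlapping n-gram (n adjacent nucleotides) in a DNA sequence.

    Hypothetical helper for illustration only; not taken from StrainIQ.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if set(gram) <= set("ACGT"):
            counts[gram] += 1
    return counts

# Example: 3-grams of a short made-up sequence.
example = ngram_counts("ACGTACGT", 3)
print(example.most_common())
```

A sequence of length L yields at most L - n + 1 overlapping n-grams, so the n-gram vocabulary grows quickly with n; how StrainIQ chooses n and selects informative n-grams is covered in the full paper rather than in this abstract.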
Conflict of interest statement
The authors declare no conflict of interest.