Bioinformatics. 2023 Jan 1;39(1):btac713. doi: 10.1093/bioinformatics/btac713.

MIDAS2: Metagenomic Intra-species Diversity Analysis System


Chunyu Zhao et al. Bioinformatics. 2023.

Abstract

Summary: The Metagenomic Intra-Species Diversity Analysis System (MIDAS) is a scalable metagenomic pipeline that identifies single nucleotide variants (SNVs) and gene copy number variants in microbial populations. Here, we present MIDAS2, which addresses the computational challenges presented by increasingly large reference genome databases, while adding functionality for building custom databases and leveraging paired-end reads to improve SNV accuracy. This fast and scalable reengineering of the MIDAS pipeline enables thousands of metagenomic samples to be efficiently genotyped.

Availability and implementation: The source code is available at https://github.com/czbiohub/MIDAS2.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Speed, accuracy and application of MIDAS2. (A) The SNV module of MIDAS2 was re-engineered to parallelize within species, making it increasingly faster than MIDAS as we deploy more CPUs. This analysis was performed with 211 metagenomic samples (NCBI accession: PRJNA400072). (B) Metagenotype accuracy was benchmarked using identical aliquots of a standardized microbial community, for which all consensus SNVs are false positives. More errors are made with a large reference genome database compared to one with only the species in the community (MIDASDB v1.2 versus Zymo Genome). Post-alignment filters, including how paired-end reads are handled, differ between tools (run with default filters) and affect false positive rates. Despite a large database (Pangenomes2), metaSNV v2 has a low false positive rate due to using only uniquely aligned reads, but this comes with the cost of lower sensitivity. Supplementary Figure S6 shows how database and post-alignment filters affect errors in population SNVs; MIDAS2 and inStrain have similar error rates with Zymo Genomes. (C) Distribution of samples with evidence of a strain mixture versus one dominant strain for 44 species metagenotyped by MIDAS2 in 1097 samples from the PREDICT cohort (NCBI accession: PRJEB39223)
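
The benchmark logic in panel B can be sketched in a few lines. This is a minimal illustration, not the MIDAS2 implementation: the function names, the per-site allele-count dictionaries and the `min_depth` threshold are all assumptions made for this example. The idea is that when the sequenced community consists of the reference genomes themselves, any site whose consensus (majority) allele differs from the reference base must be an error.

```python
from typing import Dict, List, Optional

BASES = "ACGT"


def consensus_allele(counts: Dict[str, int], min_depth: int = 5) -> Optional[str]:
    """Return the majority allele at one site, or None if coverage is too low.

    `counts` maps each base to the number of reads supporting it.
    `min_depth` is a hypothetical threshold, not a MIDAS2 default.
    """
    depth = sum(counts.get(base, 0) for base in BASES)
    if depth < min_depth:
        return None
    # Ties are broken by the order of BASES (an arbitrary choice for this sketch).
    return max(BASES, key=lambda base: counts.get(base, 0))


def false_positive_consensus_snvs(
    reference: str,
    pileup: List[Dict[str, int]],
    min_depth: int = 5,
) -> List[int]:
    """List 0-based positions where the consensus allele differs from the reference.

    For a mock community whose genomes are the reference, every such
    position is counted as a false positive.
    """
    false_positives = []
    for position, (ref_base, counts) in enumerate(zip(reference, pileup)):
        call = consensus_allele(counts, min_depth)
        if call is not None and call != ref_base:
            false_positives.append(position)
    return false_positives


# Toy example: four reference sites with made-up read counts.
reference = "ACGT"
pileup = [
    {"A": 10},          # matches the reference
    {"C": 2, "T": 8},   # majority allele T differs from reference C
    {"G": 3},           # below min_depth, so no call is made
    {"T": 6, "A": 1},   # matches the reference
]
fp_sites = false_positive_consensus_snvs(reference, pileup)
print(fp_sites)
```

Dividing the number of such sites by the number of sites with a call gives a false positive rate that can be compared across databases and post-alignment filters, which is the kind of comparison the caption describes.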

