Genomics Proteomics Bioinformatics. 2019 Feb;17(1):106-117.
doi: 10.1016/j.gpb.2018.12.005. Epub 2019 Apr 23.

GPA: A Microbial Genetic Polymorphisms Assignments Tool in Metagenomic Analysis by Bayesian Estimation

Jiarui Li et al. Genomics Proteomics Bioinformatics. 2019 Feb.

Abstract

Identifying antimicrobial-resistant (AMR) bacteria in metagenomic samples is essential for public health and food safety. Next-generation sequencing (NGS) technology provides a powerful tool for identifying genetic variation and establishing correlations between genotype and phenotype in humans and other species. However, for complex bacterial samples, a powerful bioinformatic tool for identifying genetic polymorphisms or copy number variations (CNVs) in given genes is lacking. Here we provide a Bayesian framework for genotype estimation in mixtures of multiple bacteria, named Genetic Polymorphisms Assignments (GPA). Simulation results showed that GPA reduced the false discovery rate (FDR) and mean absolute error (MAE) in CNV and single nucleotide variant (SNV) identification. The framework was validated with whole-genome sequencing and Pool-seq data from Klebsiella pneumoniae using multiple bacterial mixture models, and showed high accuracy in detecting the allele fractions of CNVs and SNVs in AMR genes between two populations. A quantitative study of the changes in AMR gene fractions between two samples showed good consistency with the AMR pattern observed in the individual strains. In addition, the framework, together with genome annotation and population comparison tools, has been integrated into an application that could provide a complete solution for AMR gene identification and quantification in unculturable clinical samples. The GPA package is available at https://github.com/IID-DTH/GPA-package.

Keywords: Bayesian model; Genetic polymorphisms; Metagenomics; Next-generation sequencing; Pool-seq.

Figures

Figure 1
Schematic overview of the GPA package. A. The process of data mapping. B. The CNV calling model: the depth at each position is analyzed (upper panel) and its ploidy number is predicted with a Bayesian model (lower panel). The pink and blue lines in the circle indicate duplication and deletion regions, respectively. C. The SNV calling and correction model: the ploidy analysis result and the processed BAM file are used as input for SNV calling and identification with GATK UnifiedGenotyper. The allele fraction is corrected using the CNV identification result. The fractions of the alternate and reference alleles are indicated in red and yellow, respectively. The gray bars indicate deletion regions. D. The CNV and SNV calling results are annotated using the reference genes for functional prediction and further population analysis. The orange arrow represents a transcript in the genome. The mutants in the population are drawn as yellow or pink circles. CNV, copy number variation; SNV, single nucleotide variation.
Figure 2
CNV model in the GPA package. A. Diagram of how the ploidy allele fraction is calculated, using an example ploidy (n = 3). Blue indicates deletion, and red indicates duplication. “1” is the normal state with a mixture of 3 bacteria. B. The negative binomial distribution fitted to the main distribution curve of coverage depth for each nucleotide site. The red line represents the real depth distribution in a simulation using the Pool-seq data from 3 genomes. The black line represents the fitted negative binomial distribution, and the shaded region represents the main peak. C. Read depth modeled by a negative binomial distribution with different allele fractions. See panel A for color codes. D. Prior estimation using iterations. E. An example of GPA used to identify Kpn deletion regions in Pool-seq data. The individual genome shotgun sequencing data for Kpn12 and Kpn14 have a deletion in the genome, indicated by blue rectangles. The size of this deletion in the bottom panel represents the estimated posterior for different allele types. F. MAE calculated with different ploidy numbers using GPA (green) and traditional DBA (red). The color gradient of the lines (from light to dark) represents the coverage depth in the model: 10×, 20×, 50×, and 100×, respectively. G. Evaluation of GPA performance at low sequencing depth. MAE was calculated with different coverage depths using GPA (green) and traditional DBA (red). The color gradient of the lines (from light to dark) represents the number of ploidies in the model: 3, 5, 10, and 20, respectively. MAE, mean absolute error.
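The idea in panels B and C, read depth following a negative binomial distribution whose mean scales with the fraction of pooled genomes carrying a region, can be illustrated with a minimal sketch. This is not the GPA implementation: the function names, the flat prior, the fixed dispersion `r`, and the small floor `eps` on the expected depth of a fully deleted region are all assumptions made for illustration.

```python
import math

def nb_logpmf(x, mu, r):
    """Log-probability of count x under a negative binomial with mean mu
    and dispersion r (variance = mu + mu**2 / r)."""
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + mu)) + x * math.log(mu / (r + mu)))

def copy_state_posterior(depths, mean_depth, n, r=20.0, prior=None, eps=0.5):
    """Posterior over k, the total number of copies of a region across n
    pooled genomes (k = n is the normal state, k < n deletion, k > n
    duplication). All parameters here are hypothetical, not GPA's own."""
    states = list(range(0, 2 * n + 1))
    if prior is None:
        prior = [1.0 / len(states)] * len(states)  # assumed flat prior
    log_post = []
    for k, p in zip(states, prior):
        # Expected depth scales with k / n; eps allows a few stray reads at k = 0.
        mu = max(mean_depth * k / n, eps)
        log_lik = sum(nb_logpmf(d, mu, r) for d in depths)
        log_post.append(math.log(p) + log_lik)
    # Normalize in log space for numerical stability.
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Three pooled genomes at a genome-wide mean depth of 30x; a window whose
# depth sits near 20x is most consistent with one genome carrying a deletion.
posterior = copy_state_posterior([19, 21, 20, 22, 18, 20], mean_depth=30, n=3)
best_k = posterior.index(max(posterior))
```

In this toy case `best_k` is 2, that is, two of the three genomes retain the region. An iterative prior estimate such as the one mentioned in panel D could be passed through the `prior` argument, but how GPA performs that step is not described in this caption.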
Figure 3
SNV correction model in the GPA package. Genomic feature of 10 sequencing reads at a given genomic site. The yellow circles represent sites on this genome that have the same allele as the reference genome “T”, the red circles represent alternative alleles, for example “A”, and the blue circles represent deletion sites. Ref, reference allele; Alt, alternative allele; Del, deletion allele.
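One plausible reading of this correction is sketched below: genomes carrying a deletion contribute no reads at the site, so the allele fraction seen among the covering reads is rescaled by the fraction of genomes that still contain the site. The exact formula used by GPA is not given here, so the rescaling and the function name `correct_allele_fractions` are assumptions.

```python
def correct_allele_fractions(ref_reads, alt_reads, deletion_fraction):
    """Rescale read-level allele fractions to population-level fractions,
    given the estimated fraction of genomes in which the site is deleted.
    Hypothetical helper; not part of the GPA package."""
    covered = ref_reads + alt_reads
    if covered == 0:
        raise ValueError("no reads cover this site")
    if not 0.0 <= deletion_fraction <= 1.0:
        raise ValueError("deletion_fraction must lie in [0, 1]")
    present = 1.0 - deletion_fraction  # fraction of genomes containing the site
    return {
        "ref": present * ref_reads / covered,
        "alt": present * alt_reads / covered,
        "del": deletion_fraction,
    }

# 4 reference and 4 alternate reads, with 20% of genomes carrying a deletion:
# the uncorrected Alt fraction would be 0.5, the corrected one is about 0.4.
fractions = correct_allele_fractions(ref_reads=4, alt_reads=4, deletion_fraction=0.2)
```

The three returned fractions sum to 1, which keeps Ref, Alt, and Del comparable as shares of the same population.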
Figure 4
Evaluation of the GPA package and comparison with other approaches. A. The correlation between MAE and deletion fraction in 10 ploidy states with 50× coverage using different methods. MAE was calculated under different deletion conditions. Red represents the traditional DBE method, green represents the GATK method, and blue represents the GPA method. B. The correlation between accuracy and deletion fraction in 10 ploidy states with 50× coverage using different methods. C. The correlation between MAE and sequencing depth. The color gradient of the lines (from light to dark) represents ploidy numbers of 3, 5, 10, and 20 in the model. D. The correlation between MAE and ploidy number. The color gradient of the lines (from light to dark) represents coverage depths in the model of 10×, 20×, 50×, and 100×.
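For reference, the MAE used throughout these comparisons is the average absolute difference between estimated and true values. The sketch below applies it to allele fractions; the example numbers are invented for illustration and are not results from the paper.

```python
def mean_absolute_error(estimated, true):
    """Average of |estimated - true| over paired values."""
    if len(estimated) != len(true) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - t) for e, t in zip(estimated, true)) / len(true)

# Illustrative (made-up) allele fractions at three sites.
mae = mean_absolute_error([0.3, 0.5, 0.1], [0.2, 0.5, 0.3])
```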
Figure 5
Comparison of Pool-seq data analyzed by GPA with individually analyzed data from the 2009 and 2013 datasets. A. Identification of CNVs in Pool-seq data of K. pneumoniae in the 2009 and 2013 datasets by GPA. The red block represents CNVs found in the 2009 data and the cyan block represents CNVs in the 2013 data. B. Detection of deletions in the antimicrobial resistance gene tolC in K. pneumoniae populations in the 2009 and 2013 datasets. The gray line represents the results of DBE analysis; the red and cyan lines represent the results of GPA in the 2009 and 2013 populations, respectively.