Front Microbiol. 2018 Mar 28:9:570.
doi: 10.3389/fmicb.2018.00570. eCollection 2018.

SIPSim: A Modeling Toolkit to Predict Accuracy and Aid Design of DNA-SIP Experiments

Nicholas D Youngblut et al. Front Microbiol. 2018.

Abstract

DNA stable isotope probing (DNA-SIP) is a powerful method that links identity to function within microbial communities. The combination of DNA-SIP with multiplexed high-throughput DNA sequencing enables simultaneous mapping of in situ assimilation dynamics for thousands of microbial taxonomic units. Hence, SIP enabled by high-throughput sequencing has enormous potential to reveal patterns of carbon and nitrogen exchange within microbial food webs. There are several different methods for analyzing DNA-SIP data, and despite the power of SIP experiments, it remains difficult to comprehensively evaluate method accuracy across a wide range of experimental parameters. We have developed a toolset (SIPSim) that simulates DNA-SIP data, and we use this toolset to systematically evaluate different methods for analyzing DNA-SIP data. Specifically, we employ SIPSim to evaluate the effects that key experimental parameters (e.g., level of isotopic enrichment, number of labeled taxa, relative abundance of labeled taxa, community richness, community evenness, and beta-diversity) have on the specificity, sensitivity, and balanced accuracy (defined as the product of specificity and sensitivity) of DNA-SIP analyses. Furthermore, SIPSim can predict analytical accuracy and power as a function of experimental design and community characteristics, and thus should be of great use in the design and interpretation of DNA-SIP experiments.

Keywords: DNA-SIP; SIP; SIPSim; community; function; method; microbial.

Figures

Figure 1
The SIPSim simulation workflow involves three major stages, each broken down into multiple steps. Stage 1 generates a buoyant density (BD) distribution of gDNA fragments for each genome. Stage 2 simulates the isopycnic gradients for a particular experimental design. Stage 3 generates a DNA-SIP dataset from the fragment BD distributions simulated in Stage 1 and the isopycnic gradient data generated in Stage 2. The output is a table (“DNA-SIP dataset”) of taxon relative abundances in each gradient fraction in each gradient. See the Methods section for a more detailed description of the simulation workflow.
Figure 2
SIPSim output provides data that approximate results obtained from DNA-SIP experiments. The CsCl gradient BD distributions of diverse amplicon fragments (n = 1,147 taxa) are depicted such that the distribution of each taxon is represented by a different color. All taxa in the control had 0% atom excess 13C, while 10% of taxa in the treatment were randomly assigned 100% atom excess 13C. Most unlabeled amplicon fragments occur within the range of 1.69–1.72 g ml−1, while 13C-labeled taxa are shifted into higher BD fractions (pre-sequencing, top panels). During high-throughput DNA sequencing, amplicon fragments are randomly sampled from each fraction, and this sampling effect alters the shape of the fragment distributions observed in DNA-SIP experiments (post-sequencing, middle panel) relative to the actual distribution of DNA in the gradient (top panels). Typically, data from DNA-SIP experiments are transformed into relative abundance values (post-sequencing, bottom panel) prior to analysis. Identifying taxa that have incorporated isotope requires comparing amplicon fragment relative abundance distributions in treatment gradients with those in control gradients. The dashed vertical line is provided as a point of reference and designates the theoretical buoyant density of an unlabeled DNA fragment with 50% G + C (as modeled in Equation 1).
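The position of the dashed reference line can be approximated with a short calculation. The sketch below is illustrative only: it assumes the widely used linear relation between G + C content and buoyant density in CsCl (1.66 g ml−1 plus 0.098 g ml−1 per unit G + C fraction) and a maximum shift of roughly 0.036 g ml−1 for fully 13C-labeled DNA. Neither constant is stated in the caption, and the function name is hypothetical.

```python
# Assumed constants (not given in the caption above):
GC_INTERCEPT = 1.66    # g/ml, buoyant density at 0% G+C
GC_SLOPE = 0.098       # g/ml per unit G+C fraction
MAX_13C_SHIFT = 0.036  # g/ml, assumed shift at 100% atom excess 13C


def buoyant_density(gc_fraction, atom_excess_13c=0.0):
    """Approximate CsCl buoyant density (g/ml) of a DNA fragment.

    gc_fraction     -- G+C content as a fraction between 0 and 1
    atom_excess_13c -- 13C atom excess as a fraction between 0 and 1
    """
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    if not 0.0 <= atom_excess_13c <= 1.0:
        raise ValueError("atom_excess_13c must be between 0 and 1")
    unlabeled = GC_INTERCEPT + GC_SLOPE * gc_fraction
    return unlabeled + MAX_13C_SHIFT * atom_excess_13c


if __name__ == "__main__":
    # Unlabeled fragment with 50% G+C, then the same fragment fully labeled.
    print(round(buoyant_density(0.5), 4))
    print(round(buoyant_density(0.5, 1.0), 4))
```

Under these assumptions, an unlabeled fragment with 50% G + C falls inside the 1.69–1.72 g ml−1 range quoted in the caption, and full labeling moves it above that range.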
Figure 3
Empirical DNA-SIP data show that unlabeled DNA is found widely within the gradient and that changes in beta-diversity can alter the composition of “heavy” fractions in the absence of isotopically labeled substrates. The DNA is from soil communities incubated for 1, 3, 6, 14, 30, or 48 days following the addition of an unlabeled nutrient mixture. SSU rRNA genes were amplified and sequenced from approximately 24 fractions from each gradient; these amplicons were used to identify the BD variance of amplicon fragments derived from discrete OTUs. The DNA concentration of each gradient fraction was measured using a PicoGreen assay (A). These values are normalized to the maximum concentration within each gradient. The amplicon diversity within each gradient fraction was measured using the Shannon Index, showing that the diversity of heavy fractions differs between samples even in the absence of isotopic labeling (B). The correlograms (C) reveal autocorrelation (measured with Mantel tests) between taxonomic similarity and fraction BD within each gradient. The variance in OTU BD is positively correlated with OTU pre-fractionation relative abundance, with highly abundant OTUs found throughout the CsCl gradient (D). To improve clarity, single OTUs in (D) were binned into hexagons, with darker shading indicating more OTUs.
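For readers unfamiliar with the diversity measure in panel (B), the following minimal sketch computes the Shannon Index from OTU counts in a single gradient fraction. It assumes the natural logarithm and raw read counts as input; the caption does not say which log base or normalization was used, and the function name is hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon Index H = -sum(p_i * ln(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # OTUs with zero reads contribute nothing to the sum, so skip them.
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


if __name__ == "__main__":
    even_fraction = [25, 25, 25, 25]   # four equally abundant OTUs
    skewed_fraction = [97, 1, 1, 1]    # one dominant OTU
    print(shannon_index(even_fraction))
    print(shannon_index(skewed_fraction))
```

An even fraction gives the maximum value for its richness (ln 4 for four OTUs), while a fraction dominated by one OTU scores much lower.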
Figure 4
SIPSim predicts that DNA-SIP methods vary in accuracy depending on the 13C atom % excess of DNA and the number of taxa that incorporate isotope. Points and bars represent means and standard deviations, respectively (n = 10 simulations). Specificity indicates the fraction of true negatives that are identified correctly. Sensitivity indicates the fraction of labeled taxa (true positives) identified correctly. Balanced accuracy is the product of specificity and sensitivity. The x-axis indicates the amount of 13C isotope present in taxa that are labeled, and different colors are used to indicate the percentage of taxa that have incorporated 13C as indicated by the legend. Heavy-SIP identifies incorporators solely based on OTU presence in “heavy” gradient fractions, and this approach is shown because it has better balanced accuracy than any of the other Heavy-SIP approaches that were analyzed (see Figure S6).
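The three metrics can be written out explicitly. The sketch below is a hedged illustration rather than SIPSim code: taxa are represented as sets of hypothetical identifiers, and balanced accuracy follows the caption's definition (the product of specificity and sensitivity). Because many references instead define balanced accuracy as the arithmetic mean of the two, the sketch also reports that value for comparison.

```python
def sip_accuracy(all_taxa, true_incorporators, predicted_incorporators):
    """Return sensitivity, specificity, and balanced accuracy for one analysis."""
    taxa = set(all_taxa)
    truth = set(true_incorporators) & taxa
    predicted = set(predicted_incorporators) & taxa

    tp = len(truth & predicted)          # labeled taxa correctly identified
    fn = len(truth - predicted)          # labeled taxa that were missed
    fp = len(predicted - truth)          # unlabeled taxa called as labeled
    tn = len(taxa - truth - predicted)   # unlabeled taxa correctly left alone

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Definition used in the caption above:
        "balanced_accuracy_product": sensitivity * specificity,
        # Common alternative definition, shown only for comparison:
        "balanced_accuracy_mean": (sensitivity + specificity) / 2,
    }


if __name__ == "__main__":
    taxa = [f"OTU_{i}" for i in range(10)]
    result = sip_accuracy(taxa, {"OTU_0", "OTU_1"}, {"OTU_0", "OTU_2"})
    for name, value in result.items():
        print(name, value)
```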
Figure 5
SIPSim predicts that DNA-SIP methods vary in accuracy depending on the number of sequences analyzed per gradient fraction. Points and bars represent means and standard deviations, respectively (n = 10 simulations). Specificity indicates the fraction of true negatives that are identified correctly. Sensitivity indicates the fraction of labeled taxa (true positives) identified correctly. Balanced accuracy is the product of specificity and sensitivity. The x-axis indicates the amount of 13C isotope present in taxa that are labeled, and different colors are used to indicate the average number of sequences determined per gradient fraction as described by the legend.
Figure 6
SIPSim predicts that the sensitivity of detecting OTUs that have low relative abundance and low atom % 13C enrichment increases with added sequencing effort per gradient fraction. Each panel provides results from simulations conducted at a different level of atom % 13C enrichment, as indicated (0, 15, 25, 50, 75, and 100 atom % 13C). Incorporators were identified using MW-HR-SIP, and sensitivity was assessed by binning OTUs into 10 different abundance classes. Sensitivity indicates the fraction of labeled taxa (true positives) identified correctly. The x-axis indicates the mean relative abundance of the taxa being evaluated, and different colors are used to indicate the average number of sequences determined per gradient fraction as described by the legend. Points and bars represent means and standard deviations, respectively (n = 10 simulations).
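A simple probability argument gives some intuition for why sequencing depth matters most for rare OTUs. The sketch below is not the SIPSim simulation or MW-HR-SIP; it assumes that each read in a fraction is an independent draw from the community, so an OTU with relative abundance p is observed at least once among N reads with probability 1 − (1 − p)^N. The function name and the example values are hypothetical.

```python
def detection_probability(relative_abundance, reads_per_fraction):
    """Probability that an OTU is observed at least once in a fraction.

    Assumes independent random sampling of reads, ignoring PCR and
    sequencing biases.
    """
    if not 0.0 <= relative_abundance <= 1.0:
        raise ValueError("relative_abundance must be between 0 and 1")
    if reads_per_fraction < 0:
        raise ValueError("reads_per_fraction must be non-negative")
    return 1.0 - (1.0 - relative_abundance) ** reads_per_fraction


if __name__ == "__main__":
    for depth in (100, 1_000, 10_000):
        for abundance in (1e-4, 1e-3, 1e-2):
            prob = detection_probability(abundance, depth)
            print(f"depth={depth:>6} abundance={abundance:.0e} P(detect)={prob:.3f}")
```

An OTU that is never sampled in the heavy fractions cannot be identified as an incorporator, so this probability acts as a rough upper bound on per-fraction detection under the stated assumption.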
Figure 7
SIPSim predicts that DNA-SIP methods differ in their sensitivity to community dissimilarity between replicate samples. Beta diversity, expressed as Bray-Curtis dissimilarity, was varied between simulated replicates (3 replicates each for 12C-control and 13C-treatment gradients) to determine the effect that community dissimilarity between replicates has on method accuracy. Variation in beta diversity was simulated by systematically varying two parameters: the percent of taxa shared between replicate samples (80, 85, 90, 95, or 100%) and the percent of taxa whose rank abundances were permuted (0, 5, 10, 15, or 20%), with 10 simulation replicates for each parameter set. The blue lines are LOESS curves fit to accuracy values for all simulations (n = 250), and the gray regions represent 99% confidence intervals. For all simulations, 10% of the community were incorporators (50% atom excess 13C).
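Bray-Curtis dissimilarity between two replicate communities can be computed directly from their abundance vectors. The sketch below is a minimal, dependency-free illustration; it assumes both vectors list the same taxa in the same order, and the function name and example counts are hypothetical.

```python
def bray_curtis(abundances_a, abundances_b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i).

    Returns 0 for identical communities and 1 for communities that
    share no taxa.
    """
    if len(abundances_a) != len(abundances_b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(abundances_a) + sum(abundances_b)
    if total == 0:
        raise ValueError("at least one community must be non-empty")
    difference = sum(abs(a - b) for a, b in zip(abundances_a, abundances_b))
    return difference / total


if __name__ == "__main__":
    replicate_1 = [10, 0, 5]
    replicate_2 = [5, 5, 5]
    print(bray_curtis(replicate_1, replicate_2))
```

Reducing the share of taxa common to two replicates, or permuting rank abundances, increases the numerator and therefore the dissimilarity, which is how the two simulation parameters in the caption translate into beta diversity.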
Figure 8
SIPSim predicts that ΔBD and qSIP vary in their accuracy at estimating 13C atom % excess of labeled DNA fragments. The accuracy of both methods declines as the amount of 13C in DNA increases (A), but accuracy is not affected by the percent of taxa that are labeled; values indicate the mean and standard deviation (n = 10 simulations). Probability density plots indicate that estimates of 13C atom % excess made using ΔBD have greater variance than those made using qSIP, but both methods systematically underestimate levels of isotope incorporation (B). Each vertical pair of panels indicates the probability density for estimates made across different levels of isotope incorporation (15, 25, 50, and 100 atom % excess), and the dashed line indicates the actual level of isotopic enrichment. For the calculation of probability density, 10% of taxa were labeled using the level of enrichment indicated in each panel.
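To make the idea of a BD-shift estimate concrete, the sketch below converts a shift in abundance-weighted mean BD between control and treatment gradients into an atom % excess value. It is an illustrative simplification, not the published ΔBD or qSIP procedure: it assumes a maximum shift of 0.036 g ml−1 for fully 13C-labeled DNA (a value not stated in the caption), and all function names and example data are hypothetical.

```python
MAX_13C_SHIFT = 0.036  # g/ml, assumed BD shift at 100% atom excess 13C


def weighted_mean_bd(fraction_bds, abundances):
    """Abundance-weighted mean buoyant density of one taxon in one gradient."""
    if len(fraction_bds) != len(abundances):
        raise ValueError("fraction_bds and abundances must have the same length")
    total = sum(abundances)
    if total <= 0:
        raise ValueError("taxon has no abundance in this gradient")
    return sum(bd * ab for bd, ab in zip(fraction_bds, abundances)) / total


def atom_percent_excess(control_bds, control_abund, treatment_bds, treatment_abund):
    """Estimate 13C atom % excess from the BD shift between two gradients."""
    shift = (weighted_mean_bd(treatment_bds, treatment_abund)
             - weighted_mean_bd(control_bds, control_abund))
    return 100.0 * shift / MAX_13C_SHIFT


if __name__ == "__main__":
    control_bds = [1.700, 1.710, 1.720]
    treatment_bds = [1.718, 1.728, 1.738]
    abundances = [1, 2, 1]  # same profile shape in both gradients
    print(atom_percent_excess(control_bds, abundances, treatment_bds, abundances))
```

In this toy example the taxon's distribution moves by half of the assumed maximum shift, so the estimate is about 50 atom % excess. Any unlabeled DNA of the same taxon that remains in lighter fractions would pull the treatment mean down, which is one plausible reason estimates of this kind run low.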

