Stat Biosci. 2018 Apr;10(1):41-58.
doi: 10.1007/s12561-017-9187-y. Epub 2017 Feb 10.

A two-stage hidden Markov model design for biomarker detection, with application to microbiome research

Yi-Hui Zhou et al. Stat Biosci. 2018 Apr.

Abstract

It has been recognized that for appropriately ordered data, hidden Markov models (HMMs) with local false discovery rate (FDR) control can increase the power to detect significant associations. For many high-throughput technologies, cost still limits their application. Two-stage designs are attractive in this setting: a set of interesting features or biomarkers is identified in a first stage and then followed up in a second stage. However, to our knowledge, no two-stage FDR control procedure based on HMMs has been developed. In this paper, we study an efficient HMM-FDR-based two-stage design that uses a simple integrated analysis procedure across the stages. Numerical studies show its excellent performance compared with available methods. A power analysis method is also proposed. We use examples from microbiome data to illustrate the methods.
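The abstract builds on HMM-based local FDR control. As a minimal sketch of that one-stage building block only (not the authors' two-stage mHMM procedure), the code below follows the local-index-of-significance (LIS) idea of Sun and Cai (2009): compute each feature's posterior probability of being null under a two-state HMM with the forward-backward algorithm, then reject the features with the smallest posterior null probabilities for as long as their running average stays at or below α. It assumes the HMM parameters are known (initial probabilities, transition matrix, a N(0, 1) null, and a single-Gaussian non-null N(mu1, sd1²)); in practice they would be estimated, for example by EM. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def normal_pdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Density of N(mean, sd^2) at x."""
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def lis_statistics(z: Sequence[float], init: Tuple[float, float], trans: Matrix2,
                   mu1: float, sd1: float = 1.0) -> List[float]:
    """Posterior null probabilities P(theta_i = 0 | z_1..z_m) for a 2-state HMM.

    State 0 (null) emits N(0, 1); state 1 (non-null) emits N(mu1, sd1^2).
    trans[r][s] = P(theta_{i+1} = s | theta_i = r). Parameters are assumed known.
    """
    m = len(z)
    emis = [(normal_pdf(x), normal_pdf(x, mu1, sd1)) for x in z]

    # Forward pass, rescaled at every position to avoid numerical underflow.
    fwd: List[List[float]] = []
    cur = [init[s] * emis[0][s] for s in range(2)]
    for i in range(m):
        if i > 0:
            prev = fwd[-1]
            cur = [sum(prev[r] * trans[r][s] for r in range(2)) * emis[i][s]
                   for s in range(2)]
        total = sum(cur)
        fwd.append([v / total for v in cur])

    # Backward pass, also rescaled; the last position starts at (1, 1).
    bwd: List[List[float]] = [[1.0, 1.0] for _ in range(m)]
    for i in range(m - 2, -1, -1):
        cur = [sum(trans[r][s] * emis[i + 1][s] * bwd[i + 1][s] for s in range(2))
               for r in range(2)]
        total = sum(cur)
        bwd[i] = [v / total for v in cur]

    # The per-position rescaling constants cancel in the normalized product.
    lis = []
    for f, b in zip(fwd, bwd):
        w0, w1 = f[0] * b[0], f[1] * b[1]
        lis.append(w0 / (w0 + w1))
    return lis


def lis_reject(lis: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Reject the k smallest LIS values, k = largest j whose running mean <= alpha."""
    order = sorted(range(len(lis)), key=lambda i: lis[i])
    k, running = 0, 0.0
    for j, idx in enumerate(order, start=1):
        running += lis[idx]
        if running / j <= alpha:
            k = j
    return sorted(order[:k])


if __name__ == "__main__":
    z = [0.1, -0.3, 0.2, 3.5, 4.0, 3.8, 0.0, -0.5]
    lis = lis_statistics(z, init=(0.8, 0.2),
                         trans=((0.95, 0.05), (0.2, 0.8)), mu1=3.0)
    print(lis_reject(lis, alpha=0.05))  # expected: [3, 4, 5]
```

Because neighbouring features share information through the transition matrix, a moderate z-score next to strong signals receives a smaller posterior null probability than it would in isolation; this is the source of the power gain for ordered data.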

Keywords: Biomarker; False discovery rates; Hidden Markov model; Metagenomics; Metatranscriptomics; PCR.


Figures

Figure 1 (A)
Heatmap of the HMP data, consisting of tag counts for 748 Operational Taxonomic Units (OTUs), transformed as ln(count + 0.5), for 103 males and 88 females.
Figure 1 (B)
Heatmap of sample correlations of ln(count + 0.5) between OTUs. Data are from a metagenomic analysis of the NIH Human Microbiome Project with 103 males and 88 females, with OTUs ordered by phylogenetic relationship. The correlations are primarily block-structured, with negative correlations less extreme than positive ones. Solid lines indicate family-level boundaries in the ordered taxa.
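Figure 1 describes a ln(count + 0.5) transform followed by OTU-by-OTU sample correlations. A minimal sketch of that preprocessing, assuming a samples-by-OTUs count matrix whose columns are already in phylogenetic order, and assuming Pearson correlation (the caption says only "sample correlations"):

```python
import math
from typing import List, Sequence


def log_transform(counts: Sequence[Sequence[float]], pseudo: float = 0.5) -> List[List[float]]:
    """Apply ln(count + pseudo) elementwise; the pseudocount keeps zero counts finite."""
    return [[math.log(c + pseudo) for c in row] for row in counts]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation (assumes neither vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def otu_correlation(logged: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rows are samples, columns are OTUs; returns the OTU x OTU correlation matrix."""
    cols = list(zip(*logged))
    return [[pearson(ci, cj) for cj in cols] for ci in cols]


if __name__ == "__main__":
    counts = [[0, 5, 12], [3, 8, 20], [1, 2, 9], [7, 15, 30]]  # toy data: 4 samples x 3 OTUs
    corr = otu_correlation(log_transform(counts))
    for row in corr:
        print(" ".join(f"{r:+.2f}" for r in row))
```

An OTU with identical counts in every sample would make `pearson` divide by zero, so such columns would need to be filtered before computing the matrix.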
Figure 2
Average empirical FDR, FNR, and Average Total Positives (ATP) for various a₁₁ at fixed δ = 1.5 (Row 1) and as a function of effect size δ (Row 2) for m = 500 at a nominal FDR of 0.05. Column 1 compares the empirical FDR. Column 2 compares the empirical FNR. Column 3 compares the ATP. Methods include mHMM (○), Z approach (Δ), Fisher’s combination (×), full data HMM (◇), and one-stage Benjamini-Hochberg procedure (+).
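The simulation figures report empirical FDR, FNR, and ATP, and include the one-stage Benjamini-Hochberg procedure as a comparator. The sketch below assumes the usual definitions: for one replicate with R rejections, V false rejections, and T missed non-nulls among m features, FDP = V / max(R, 1) and FNP = T / max(m − R, 1); averaging FDP, FNP, and R over replicates gives the empirical FDR, FNR, and ATP. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(pvals: Sequence[float], q: float = 0.05) -> List[int]:
    """One-stage BH step-up: reject the k smallest p-values, k = max{j : p_(j) <= j q / m}."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            k = rank
    return sorted(order[:k])


def error_proportions(rejected: Sequence[int], truth: Sequence[int]) -> Tuple[float, float, int]:
    """Return (FDP, FNP, number of rejections) for one replicate; truth[i] = 1 means non-null."""
    m = len(truth)
    rej = set(rejected)
    r = len(rej)
    false_pos = sum(1 for i in rej if truth[i] == 0)
    false_neg = sum(1 for i in range(m) if truth[i] == 1 and i not in rej)
    return false_pos / max(r, 1), false_neg / max(m - r, 1), r


def summarize(replicates: Sequence[Tuple[float, float, int]]) -> Tuple[float, float, float]:
    """Average FDP, FNP, and rejections over replicates: empirical FDR, FNR, and ATP."""
    n = len(replicates)
    return (sum(x[0] for x in replicates) / n,
            sum(x[1] for x in replicates) / n,
            sum(x[2] for x in replicates) / n)


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    truth = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    rej = benjamini_hochberg(p, q=0.05)
    print(rej, error_proportions(rej, truth))  # expected: [0, 1] (0.0, 0.125, 2)
```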
Figure 3
Average empirical FDR, FNR, and Average Total Positives for various a₁₁ at fixed δ = 1.5 (Row 1) and as a function of effect size δ (Row 2) for m = 1000 at a nominal FDR of 0.05. Column 1 compares the empirical FDR. Column 2 compares the empirical FNR. Column 3 compares the Average Total Positives. Methods include mHMM (○), Z approach (Δ), Fisher’s combination (×), full data HMM (◇), and one-stage Benjamini-Hochberg procedure (+).
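Two of the comparison methods, the Z approach and Fisher's combination, pool evidence across the two stages. Their exact forms in the paper are not stated here, so the following is a hedged sketch of standard versions: Fisher's method for two independent p-values, X = −2(ln p₁ + ln p₂), which is χ² with 4 degrees of freedom under the null; and a weighted inverse-normal Z with weights √(n_k / n), which is an assumption about how the stages are weighted. `screen_stage1` and its threshold `gamma` are hypothetical stand-ins for the first-stage selection, and the sketch does not adjust the combined p-values for that selection or for multiplicity.

```python
import math
from statistics import NormalDist
from typing import Dict, Sequence

STD_NORMAL = NormalDist()


def fisher_combined_p(p1: float, p2: float) -> float:
    """Fisher's method for two independent stages.

    X = -2 (ln p1 + ln p2) is chi-square with 4 df under H0, and the chi-square(4)
    survival function is exp(-x/2) (1 + x/2), so no special functions are needed.
    """
    x = -2.0 * (math.log(p1) + math.log(p2))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def weighted_z_combined_p(p1: float, p2: float, n1: int, n2: int) -> float:
    """Inverse-normal combination of one-sided p-values with weights sqrt(n_k / n).

    The squared weights sum to 1, so the combined Z is N(0, 1) under H0.
    Requires 0 < p1, p2 < 1.
    """
    z1 = STD_NORMAL.inv_cdf(1.0 - p1)
    z2 = STD_NORMAL.inv_cdf(1.0 - p2)
    n = n1 + n2
    z = math.sqrt(n1 / n) * z1 + math.sqrt(n2 / n) * z2
    return 1.0 - STD_NORMAL.cdf(z)


def screen_stage1(p_stage1: Sequence[float], gamma: float) -> Dict[int, float]:
    """Hypothetical screening rule: carry forward features with stage-1 p <= gamma."""
    return {i: p for i, p in enumerate(p_stage1) if p <= gamma}


if __name__ == "__main__":
    stage1 = [0.004, 0.30, 0.02, 0.75]
    stage2 = {0: 0.01, 2: 0.40}  # follow-up p-values exist only for selected features
    for i, p1 in screen_stage1(stage1, gamma=0.05).items():
        print(i, round(fisher_combined_p(p1, stage2[i]), 5),
              round(weighted_z_combined_p(p1, stage2[i], n1=50, n2=100), 5))
```

The closed-form χ²₄ tail keeps the sketch dependency-free; for other degrees of freedom a library survival function would be the natural choice.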
Figure 4
Average empirical FDR, FNR, and Average Total Positives when the number of components of the non-null distribution is misspecified, for various a₁₁ at fixed δ = 1.5 (Row 1) and as a function of effect size δ (Row 2), with m = 500 and a nominal FDR of 0.05. Column 1 compares the empirical FDR. Column 2 compares the empirical FNR. Column 3 compares the ATP. Methods include mHMM (○), Z approach (Δ), Fisher’s combination (×), full data HMM (◇), and one-stage Benjamini-Hochberg procedure (+).
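Figures 4 and 5 examine misspecifying the number of components in the non-null distribution. To illustrate why this can matter, the sketch below drops the HMM dependence (a deliberate simplification to a plain two-group model), uses a N(0, 1) null and a hypothetical two-component Gaussian non-null, and compares the local fdr, π₀f₀(x) / [π₀f₀(x) + (1 − π₀)f₁(x)], with what a single moment-matched Gaussian non-null would give. The mixture parameters and π₀ = 0.8 are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Component = Tuple[float, float, float]  # (weight, mean, sd)


def normal_pdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Density of N(mean, sd^2) at x."""
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(x: float, pi0: float, nonnull: Sequence[Component]) -> float:
    """Two-group local fdr with f0 = N(0, 1) and f1 a Gaussian mixture."""
    f0 = pi0 * normal_pdf(x)
    f1 = (1.0 - pi0) * sum(w * normal_pdf(x, mu, sd) for w, mu, sd in nonnull)
    return f0 / (f0 + f1)


def moment_matched_single(nonnull: Sequence[Component]) -> List[Component]:
    """Collapse a Gaussian mixture into one Gaussian with the same mean and variance."""
    mean = sum(w * mu for w, mu, _ in nonnull)
    second = sum(w * (sd * sd + mu * mu) for w, mu, sd in nonnull)
    return [(1.0, mean, math.sqrt(second - mean * mean))]


if __name__ == "__main__":
    true_f1 = [(0.5, -2.0, 1.0), (0.5, 2.5, 1.0)]  # hypothetical two-component non-null
    approx_f1 = moment_matched_single(true_f1)
    for x in (-2.0, 0.0, 2.5):
        print(x, round(local_fdr(x, 0.8, true_f1), 3), round(local_fdr(x, 0.8, approx_f1), 3))
```

With a bimodal non-null, a single wide Gaussian spreads its mass toward zero, so signals near either mode look less convincing than under the correctly specified mixture.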
Figure 5
Average empirical FDR, FNR, and Average Total Positives when the number of components of the non-null distribution is misspecified, for various a₁₁ at fixed δ = 1.5 (Row 1) and as a function of effect size δ (Row 2), with m = 1000 and a nominal FDR of 0.05. Column 1 compares the empirical FDR. Column 2 compares the empirical FNR. Column 3 compares the Average Total Positives. Methods include mHMM (○), Z approach (Δ), Fisher’s combination (×), full data HMM (◇), and one-stage Benjamini-Hochberg procedure (+).
Figure 6
Phylum, class, order, and family for the 15 significant genera identified using the two-stage sampling (genera averaged within each family).
Using the full dataset, ratios of (female mean count)/(male mean count) are shown, corrected for a slight (1.1) male:female bias among the 748 genera/taxa used. Boldface indicates microbiome orders/families that appeared among the significant sex-associated genera in Markle et al. [23].
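The Figure 6 ratios reduce to simple arithmetic. The sketch assumes that the 1.1 correction rescales each raw (female mean)/(male mean) ratio by 1.1, so that a value of 1.0 means no sex difference after accounting for the overall male:female count imbalance, and that family values are plain averages of genus-level ratios. Both are interpretations of the caption rather than the authors' stated formulas, and the genus and family names are placeholders.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def sex_ratio(female_counts: Sequence[float], male_counts: Sequence[float],
              bias: float = 1.1) -> float:
    """(female mean count) / (male mean count), rescaled by an assumed overall
    male:female count factor so that 1.0 means no sex difference."""
    return bias * mean(female_counts) / mean(male_counts)


def family_average(ratio_by_genus: Dict[str, float],
                   family_of: Dict[str, str]) -> Dict[str, float]:
    """Average genus-level ratios within each family."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for genus, r in ratio_by_genus.items():
        groups[family_of[genus]].append(r)
    return {fam: mean(rs) for fam, rs in groups.items()}


if __name__ == "__main__":
    # Hypothetical genera and families with toy counts.
    ratios = {
        "genus_a": sex_ratio([4, 6, 5], [10, 10, 10]),
        "genus_b": sex_ratio([12, 8], [9, 11]),
    }
    print(family_average(ratios, {"genus_a": "family_x", "genus_b": "family_x"}))
```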
