A two-stage hidden Markov model design for biomarker detection, with application to microbiome research

Yi-Hui Zhou^#¹, Paul Brooks², Xiaoshan Wang^#³

Affiliations

¹ Department of Biological Sciences, Bioinformatics Research Center, North Carolina State University, North Carolina, United States of America.
² Department of Statistical Sciences and Operations Research and Department of Supply Chain Management and Analytics, Virginia Commonwealth University, Virginia, United States of America.
³ IMEDACS, LLC, United States of America, xsswang@gmail.com.

^# Contributed equally.

PMID: 30174757
PMCID: PMC6116560
DOI: 10.1007/s12561-017-9187-y

Yi-Hui Zhou et al. Stat Biosci. 2018 Apr.

Stat Biosci. 2018 Apr;10(1):41-58. doi: 10.1007/s12561-017-9187-y. Epub 2017 Feb 10.

Abstract

It has been recognized that, for appropriately ordered data, hidden Markov models (HMMs) with local false discovery rate (FDR) control can increase the power to detect significant associations. For many high-throughput technologies, cost still limits their application. Two-stage designs are attractive: a set of interesting features or biomarkers is identified in a first stage and then followed up in a second stage. However, to our knowledge, no two-stage FDR control with HMMs has been developed. In this paper, we study an efficient HMM-FDR based two-stage design, using a simple integrated analysis procedure across the stages. Numerical studies show its excellent performance compared with available methods. A power analysis method is also proposed. We use examples from microbiome data to illustrate the methods.

Keywords: Biomarker; False discovery rates; Hidden Markov model; Metagenomics; Metatranscriptomics; PCR.
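
As a rough illustration of what "HMM with local FDR control" can mean for ordered test statistics, the following is a minimal single-stage sketch, not the two-stage procedure studied in the paper (which the abstract does not spell out). It assumes a two-state chain (null / non-null), Gaussian emissions N(0, 1) and N(mu1, 1), and known parameters; the function names and the parameters `a00`, `mu1`, and `pi0` are hypothetical. Posterior null probabilities come from a scaled forward-backward pass, and hypotheses are rejected in increasing order of that probability while the running mean stays at or below the nominal level.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))

def posterior_null_probs(z, a00, a11, mu1, pi0=0.5):
    """P(state i is null | all z) for a 2-state HMM (assumed known parameters)."""
    A = [[a00, 1.0 - a00], [1.0 - a11, a11]]  # transition matrix
    n = len(z)
    emit = [(normal_pdf(x), normal_pdf(x, mu1)) for x in z]
    # Forward pass, normalised at each step to avoid underflow.
    prev = [pi0 * emit[0][0], (1.0 - pi0) * emit[0][1]]
    s = sum(prev)
    prev = [v / s for v in prev]
    alpha = [prev]
    for i in range(1, n):
        cur = [emit[i][j] * (prev[0] * A[0][j] + prev[1] * A[1][j]) for j in (0, 1)]
        s = sum(cur)
        prev = [v / s for v in cur]
        alpha.append(prev)
    # Backward pass, also normalised at each step.
    beta = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        b = [sum(A[j][k] * emit[i + 1][k] * beta[i + 1][k] for k in (0, 1)) for j in (0, 1)]
        s = sum(b)
        beta[i] = [v / s for v in b]
    # Combine; per-position normalisation removes the arbitrary scaling.
    post = []
    for i in range(n):
        w0 = alpha[i][0] * beta[i][0]
        w1 = alpha[i][1] * beta[i][1]
        post.append(w0 / (w0 + w1))
    return post

def local_fdr_reject(post_null, level=0.05):
    """Reject the k smallest posterior null probabilities, with k the largest
    count whose running mean is <= level."""
    order = sorted(range(len(post_null)), key=lambda i: post_null[i])
    running, k = 0.0, 0
    for rank, i in enumerate(order, start=1):
        running += post_null[i]
        if running / rank <= level:
            k = rank
    reject = [False] * len(post_null)
    for i in order[:k]:
        reject[i] = True
    return reject

# Toy ordered z-scores with a short non-null block in the middle.
z = [0.1, -0.3, 4.0, 4.5, 3.8, 0.2, -0.1]
post = posterior_null_probs(z, a00=0.95, a11=0.8, mu1=3.0)
decisions = local_fdr_reject(post, level=0.05)
```

In practice the HMM parameters would be estimated from the data (for example by EM) rather than supplied, and the non-null emission distribution may be a mixture; both are outside this sketch.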

Figures

**Figure 1 (A)**
Heatmap of the HMP data, consisting of tag counts for 748 Operational Taxonomic Units (OTUs), transformed as log_e(count+0.5), with 103 males and 88 females.

**Figure 1 (B)**
Heatmap of sample correlations of log_e(count+0.5) between OTUs. Data are from a metagenomic analysis of the NIH Human Microbiome Project (HMP), with 103 males and 88 females and with OTUs ordered by phylogenetic relationships. The correlations are primarily block structured, with less extreme negative correlations than positive correlations. Solid lines indicate family-level boundaries in the ordered taxa.

**Figure 2**
Average empirical FDR, FNR, and Average Total Positives (ATP) for various a₁₁ at fixed δ = 1.5 (Row 1) and as a function of effect size δ (Row 2) for m = 500 at nominal FDR of 0.05. Column 1 compares the empirical FDR. Column 2 compares the empirical FNR. Column 3 compares the ATP. Methods include mHMM (○), Z approach (Δ), Fisher’s combination (×), full data HMM (◇), and one-stage Benjamini-Hochberg procedure (+).

**Figure 3**
Average empirical FDR, FNR, and Average Total Positives for various a₁₁ at fixed δ = 1.5 (Row 1) and as a function of effect size δ (Row 2) for m = 1000 at nominal FDR of 0.05. Column 1 compares the empirical FDR. Column 2 compares the empirical FNR. Column 3 compares the average total positives. Methods include mHMM (○), Z approach (Δ), Fisher’s combination (×), full data HMM (◇), and one-stage Benjamini-Hochberg procedure (+).

**Figure 4**
Average empirical FDR, FNR, and Average Total Positives when the number of components for nonnull is misspecified, for various a₁₁ at fixed δ = 1.5 (Row 1) and as a function of effect size δ (Row 2), with m = 500 and the nominal FDR of 0.05. Column 1 compares the empirical FDR. Column 2 compares the empirical FNR. Column 3 compares the ATP. Methods include mHMM (○), Z approach (Δ), Fisher’s combination (×), full data HMM (◇), and one-stage Benjamini-Hochberg procedure (+).

**Figure 5**
Average empirical FDR, FNR, and Average Total Positives when the number of components for nonnull is misspecified, for various a₁₁ at fixed δ = 1.5 (Row 1) and as a function of effect size δ (Row 2), with m = 1000 and the nominal FDR of 0.05. Column 1 compares the empirical FDR. Column 2 compares the empirical FNR. Column 3 compares the Average Total Positives. Methods include mHMM (○), Z approach (Δ), Fisher’s combination (×), full data HMM (◇), and one-stage Benjamini-Hochberg procedure (+).
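
Two of the comparison methods named in the simulation captions above, Fisher's combination and the Benjamini-Hochberg procedure, are standard and can be sketched in a few lines, together with per-run versions of the reported error metrics. This is a minimal illustration only: the function names are hypothetical, FNR is read here as the false non-discovery rate (an assumption, since the captions do not expand it), and the paper's mHMM and Z approach are not reproduced.

```python
import math

def fisher_combine_two(p1, p2):
    """Fisher's combination of two independent p-values.

    -2*(ln p1 + ln p2) is chi-square with 4 df under the null, and that
    survival function has the closed form exp(-x/2) * (1 + x/2).
    """
    prod = p1 * p2
    if prod == 0.0:
        return 0.0
    return prod * (1.0 - math.log(prod))

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up BH: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= alpha * k / m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

def empirical_fdp_fnp(reject, is_nonnull):
    """False discovery and false non-discovery proportions for one run.

    Averaging these over simulation replicates gives empirical FDR and FNR;
    averaging sum(reject) gives the average total positives.
    """
    false_pos = sum(1 for r, t in zip(reject, is_nonnull) if r and not t)
    false_neg = sum(1 for r, t in zip(reject, is_nonnull) if (not r) and t)
    n_rej = sum(reject)
    n_acc = len(reject) - n_rej
    fdp = false_pos / n_rej if n_rej else 0.0
    fnp = false_neg / n_acc if n_acc else 0.0
    return fdp, fnp

# Toy usage: combine stage-1 and stage-2 p-values per feature, then apply BH.
stage1 = [0.01, 0.20, 0.03, 0.60]
stage2 = [0.02, 0.50, 0.04, 0.70]
combined = [fisher_combine_two(a, b) for a, b in zip(stage1, stage2)]
decisions = benjamini_hochberg(combined, alpha=0.05)
```

Note that naive Fisher combination after first-stage selection ignores the selection step; a real two-stage analysis has to account for it.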

**Figure 6. Phylum, class, order, and family for the 15 significant genera identified using the two-stage sampling (genera averaged within each family)**
Using the full dataset, ratios of (female mean count)/(male mean count) are shown, corrected for a slight (1.1) male:female bias among the utilized 748 genera/taxa. Bold-face indicates microbiome orders/families that appeared among significant sex-based genera in Markle et al. [23].

References

1. Zehetmayer S, Bauer P, Posch M. Two-stage designs for experiments with a large number of hypotheses. Bioinformatics. 2005;21:3771–3777.
2. Tickle TL, Segata N, Waldron L, Weingart U, Huttenhower C. Two-stage microbial community experimental design. ISME J. 2013;7:2330–9.
3. Breslow NE, Cain KC. Logistic regression for two-stage case-control data. Biometrika. 1988;71:11–20.
4. Haneuse S, Schildcrout J, Gillen D. A two-stage strategy to accommodate general patterns of confounding in the design of observational studies. Biostatistics. 2012;13:274–88.
5. Goll A, Bauer P. Two-stage designs applying methods differing in costs. Bioinformatics. 2007;23:1519–26.

Grants and funding

R21 HG007840/HG/NHGRI NIH HHS/United States
