PLoS One. 2022 Jan 31;17(1):e0263070.

doi: 10.1371/journal.pone.0263070. eCollection 2022.

A robust biostatistical method leverages informative but uncertainly determined qPCR data for biomarker detection, early diagnosis, and treatment

Wei Zhuang¹, Luísa Camacho², Camila S Silva², Michael Thomson³, Kevin Snyder³

Affiliations

¹ Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America.
² Division of Biochemical Toxicology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America.
³ Office of New Drugs, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, United States of America.

PMID: 35100319
PMCID: PMC8803186
DOI: 10.1371/journal.pone.0263070

Wei Zhuang et al. PLoS One. 2022.

Abstract

As a common medium-throughput technique, qPCR (quantitative real-time polymerase chain reaction) is widely used to measure levels of nucleic acids. In addition to accurate and complete data, experimenters unavoidably observe some incomplete and uncertainly determined qPCR data because of intrinsically low overall amounts of biological materials, such as nucleic acids present in biofluids. When some samples have uncertainly determined qPCR data, some investigators apply the statistical complete-case method (CO), excluding those samples from analysis, while others simply choose not to analyze (CNA) these datasets at all. To include as many observations as possible when analyzing differential changes of interest between groups, some investigators set incomplete observations equal to the maximum quality qPCR cycle (MC), such as 32 or 40. Although straightforward, these methods may decrease the sample size, skew the data distribution, and compromise statistical power and research reproducibility across replicate qPCR studies. To overcome the shortcomings of the existing, commonly used qPCR data analysis methods and to contribute to efforts to advance statistical analysis in rigorous preclinical research, we propose a robust nonparametric statistical cycle-to-threshold method (CTOT) to analyze incomplete qPCR data for two-group comparisons. CTOT incorporates important characteristics of qPCR data and time-to-event statistical methodology, resulting in a novel analytical method for qPCR data that is built around good quality data from all subjects, certainly determined or not. Using the benchmark full data (BFD) as a reference, we compared the abilities of the CTOT, CO, MC, and CNA statistical methods to detect differential changes of interest between groups with informative but uncertainly determined qPCR data.
Our simulations and applications show that CTOT improves the power of detecting and confirming differential changes in many situations over the three commonly used methods without excess type I errors. The robust nonparametric statistical method of CTOT helps leverage qPCR technology and increase the power to detect differential changes that may assist decision making with respect to biomarker detection and early diagnosis, with the goal of improving the management of patient healthcare.
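The contrast between CO, MC, and a censoring-aware analysis can be sketched in Python. This is only an illustration under stated assumptions, not the authors' CTOT implementation (the flowchart in Fig 2 uses the R coin package). It assumes that an undetermined reading is stored as infinity, that a Cq at or beyond the cutoff `c1` is treated as right-censored at `c1`, and that a two-group log-rank statistic with a permutation p-value is an acceptable stand-in for the time-to-event test. All function names and example values are hypothetical.

```python
import random

def complete_observations(cq, groups, c1):
    """CO-style handling: drop every sample whose Cq is not below the cutoff."""
    kept = [(x, g) for x, g in zip(cq, groups) if x < c1]
    return [x for x, _ in kept], [g for _, g in kept]

def max_cycle(cq, c1):
    """MC-style handling: replace uncertain readings with the cutoff itself."""
    return [min(x, c1) for x in cq]

def censor(cq, c1):
    """Censoring-aware handling: (time, event) pairs, event=0 means censored at c1."""
    return [(min(x, c1), 1 if x < c1 else 0) for x in cq]

def logrank_stat(obs, groups):
    """Observed-minus-expected events in group 1 and the hypergeometric variance."""
    event_times = sorted({t for t, e in obs if e == 1})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        at_risk = [g for (ti, _), g in zip(obs, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == 1)
        d = sum(1 for ti, e in obs if e == 1 and ti == t)
        d1 = sum(1 for (ti, e), g in zip(obs, groups)
                 if e == 1 and ti == t and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e, var

def permutation_logrank_p(obs, groups, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the standardized log-rank statistic."""
    rng = random.Random(seed)

    def z_squared(labels):
        u, v = logrank_stat(obs, labels)
        return u * u / v if v > 0 else 0.0

    observed = z_squared(groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if z_squared(labels) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: group 2 amplifies later, and two wells never reach threshold.
cq = [28.1, 29.4, 30.2, 31.0, 29.9, 33.5, 35.2, float("inf"), float("inf"), 36.8]
groups = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
c1 = 40.0

co_values, co_groups = complete_observations(cq, groups, c1)  # group 2 shrinks to 3
mc_values = max_cycle(cq, c1)                                  # two values pinned at 40
p_value = permutation_logrank_p(censor(cq, c1), groups)
```

In this sketch, CO discards the two undetermined wells and MC treats them as exact measurements of 40, whereas the censored representation records only that they did not reach the threshold by cycle 40.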

Conflict of interest statement

This work was conducted with internal funding (NCTR protocol E0772101) from the U.S. Food and Drug Administration, a U.S. government agency. The authors have declared that no competing interests exist. The views presented in this article do not necessarily reflect those of the U.S. Food and Drug Administration. Any mention of commercial products is for clarification and is not intended as an endorsement. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Figures

**Fig 1. Amplification curves of qPCR reactions.**
In the example, five molecular targets reached the threshold of 0.5 before the 32nd cycle, i.e., Cq < 32, while two reached the threshold between the 32nd and the 40th cycle. The eighth molecular target did not reach the threshold by the 40th cycle, i.e., Cq > 40. QuantStudio Real-Time PCR software version 1.3 (Applied Biosystems by Thermo Fisher Scientific) was used to create the figure.

**Fig 2. The flowchart to perform CTOT with the R coin package.**
Yᵢⱼₖ denotes the Cq value reported by a qPCR assay for molecular target j (j = 1, 2, …, or g) of sample i (i = 1, 2, …, or n) in group k (k = 1 or 2). ΔYᵢⱼₖ denotes the normalized Cq for target j of sample i in group k.

**Fig 3. Boxplots of simulated Cq data.**
The points above the solid line would be uncertainly measured by qPCR if 40 were the cutoff for data quality control or for biological, clinical, or technical concerns in practice. The points above the dashed line would be uncertainly measured by qPCR if 32 were the cutoff for quality control or for biological, clinical, or technical concerns in practice.

**Fig 4. The empirical power of the CTOT, MC, and CO methods compared with that of BFD.**
BFD stands for the benchmark with full data analyzed with the current standard method, which includes t-tests for two-group comparisons. CTOT stands for the cycle-to-threshold method, while CO denotes the complete-observation method and MC denotes the method that sets uncertain and incomplete observations equal to the assay-specific maximum cycle threshold C₁. Uncertain qPCR data may occur in one or both groups under comparison. % denotes the percentage of uncertainty that is observed in only one group among the replicates. n_rep denotes the number of replicates with at least one uncertain observation. β₀ and β₁ are parameters of the underlying models. |β₁| is the absolute value of the effect size. Panels A, B, and C show the empirical power for the log-normal, Weibull, and log-logistic simulation types, respectively.

**Fig 5. An Example to Illustrate the Issue of Potential False Negatives of MC and CO.**
(A) The original Cq data simulated with a normal distribution (corresponding to the log-normal simulation type in Table 2, β₀ = 13.35 and β₁ = 2.06; the corresponding empirical power of BFD is 0.80). (B) The normalized Cq data with the BFD, CTOT, MC, or CO methods applied. The filled diamonds denote the Cq data with BFD. BFD stands for the benchmark with full data analyzed with the current standard method, which includes t-tests for two-group comparisons. The filled triangles denote the Cq data with CTOT, the cycle-to-threshold method. The vertical green arrows indicate the ranges that uncertain observations belong to, e.g., being greater than or equal to the assay-specific maximum cycle threshold C₁. The filled squares denote the Cq data with MC, the method that sets uncertain and incomplete observations equal to C₁. The maximum quality cycle threshold C₁ = 40 is highlighted with a horizontal solid line. The open circles denote the Cq data with CO, the complete-observation method. The first five simulated samples belong to Group 1. The second five simulated samples belong to Group 2. The vertical dashed line separates Groups 1 and 2.

**Fig 6. An example to illustrate differences of the MC, CO, and CTOT methods.**
(A) The original Cq data simulated with a normal distribution (corresponding to the log-normal simulation type in Table 5, β₀ = 8.47 and β₁ = 4.65; the corresponding empirical power of BFD is 0.91). (B) The normalized Cq data with the BFD, CTOT, MC, or CO methods applied. The filled diamonds denote the Cq data with BFD. BFD stands for the benchmark with full data analyzed with the current standard method, which includes t-tests for two-group comparisons. The filled triangles denote the Cq data with CTOT, the cycle-to-threshold method. The vertical green arrows indicate the ranges that uncertain observations belong to, e.g., being greater than or equal to the assay-specific maximum cycle threshold C₁. The filled squares denote the Cq data with MC, the method that sets uncertain and incomplete observations equal to the assay-specific maximum cycle threshold C₁. The maximum quality cycle threshold C₁ = 40 is highlighted with a horizontal solid line. The open circles denote the Cq data with CO, the complete-observation method. The first five simulated samples belong to Group 1. The second five simulated samples belong to Group 2. The vertical dashed line separates Groups 1 and 2.

**Fig 7. Empirical type I error rates of CTOT, BFD, CO, and MC methods.**
CTOT stands for the cycle-to-threshold method. BFD stands for the benchmark with full data analyzed with the current standard method, which includes t-tests for two-group comparisons. CO denotes the complete-observation method and MC denotes the method that sets uncertain observations equal to the assay-specific maximum cycle threshold C₁. In the simulation, C₁ is set to 40. ΔCq followed normal distributions and e^ΔCq followed log-normal distributions. Parameter Set 1: β₀ = 5, σ = 1; Parameter Set 2: β₀ = 10, σ = 1; Parameter Set 3: β₀ = 5, σ = 2; and Parameter Set 4: β₀ = 10, σ = 2, with the parameterization listed for the log-normal distribution in Table 2.
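An empirical type I error rate of this kind is estimated by simulating both groups from the same distribution many times and counting how often a test rejects at the chosen level. The Python sketch below illustrates the idea only; it does not reproduce the paper's simulation. It assumes raw Cq values drawn from a normal distribution with a hypothetical mean of 38 and standard deviation of 2, MC-style substitution at a cutoff of 40, and a permutation test on the difference of group means in place of the t-test or CTOT. The function names, group size, and replicate counts are likewise hypothetical.

```python
import random
import statistics

def perm_test_mean_diff(x, y, n_perm, rng):
    """Two-sided permutation p-value for the difference of group means."""
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(x)]) -
                   statistics.fmean(pooled[len(x):]))
        if diff >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def empirical_type1_error(mean, sd, c1, n_per_group=5, n_rep=200,
                          n_perm=199, alpha=0.05, seed=1):
    """Fraction of null replicates (no true group difference) that are rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        # Both groups share one distribution, so every rejection is a false positive.
        g1 = [min(rng.gauss(mean, sd), c1) for _ in range(n_per_group)]
        g2 = [min(rng.gauss(mean, sd), c1) for _ in range(n_per_group)]
        if perm_test_mean_diff(g1, g2, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_rep

# Hypothetical settings, not the parameter sets listed in the caption.
rate = empirical_type1_error(mean=38.0, sd=2.0, c1=40.0)
```

A well-calibrated test should give a rate close to or below the nominal level `alpha`, up to Monte Carlo error from the finite number of replicates.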

**Fig 8. Comparison of the statistical significance between t-tests with C₁ = 40 and CTOT with C₁ = 32.**
The sensitivity analysis was performed on 17 two-group comparisons of rat serum microRNAs miR-210-3p and miR-128-3p in which at least one of the two groups being compared contained an uncertain observation [3]. The p-values based on t-tests and CTOT (cycle-to-threshold method) are plotted on a -log₁₀ scale on the x-axis and y-axis, respectively. C₁ denotes an assay-specific maximum cycle threshold for clinical, quality, or biological relevance, e.g., the cycle number that corresponds to the LLOQ (lower limit of quantification). The solid lines are set at p-value = 0.05 and the dashed lines are set at p-value = 0.005. The inset Venn diagram illustrates statistically significant differences in the levels of circulating microRNAs between control and treated groups, applying the CTOT, MC, or CO method and a maximum quality cycle threshold of C₁ = 32 to the data reported by Silva et al. [3].

Cited by

The MCTOT app: A publicly available tool for statistical cycle-to-threshold analysis and inference of informative but uncertainly determined qPCR data.
Zhuang W, Liu J. PLoS One. 2025 Sep 2;20(9):e0330729. doi: 10.1371/journal.pone.0330729. eCollection 2025. PMID: 40892763.

References

1. Harrington PR, Zeng W, Naeger LK. Clinical relevance of detectable but not quantifiable hepatitis C virus RNA during boceprevir or telaprevir treatment. Hepatology. 2012;55(4):1048–57. doi: 10.1002/hep.24791 PubMed PMID: WOS:000302069900008.
2. Sun Y, Liu YX, Cogdell D, Calin GA, Sun BC, Kopetz S, et al. Examining plasma microRNA markers for colorectal cancer at different stages. Oncotarget. 2016;7(10):11434–49. doi: 10.18632/oncotarget.7196 PubMed PMID: WOS:000375678300054.
3. Silva CS, Chang CW, Williams D, Porter-Gill P, da Costa GG, Camacho L. Effects of a 28-day dietary co-exposure to melamine and cyanuric acid on the levels of serum microRNAs in male and female Fisher 344 rats. Food Chem Toxicol. 2016;98:11–6. doi: 10.1016/j.fct.2016.09.013 PubMed PMID: WOS:000388054000003.
4. Anfossi S, Babayan A, Pantel K, Calin GA. Clinical utility of circulating non-coding RNAs—an update. Nature Reviews Clinical Oncology. 2018;15(9):541–63. doi: 10.1038/s41571-018-0035-x PubMed PMID: WOS:000442252300009.
5. De Rubis G, Krishnan SR, Bebawy M. Liquid biopsies in cancer diagnosis, monitoring, and prognosis. Trends Pharmacol Sci. 2019;40(3):172–86. doi: 10.1016/j.tips.2019.01.006 PubMed PMID: WOS:000459246400004.
