Building multi-marker algorithms for disease prediction-the role of correlations among markers

Paul F Pinsky¹, Claire S Zhu

PMID: 21918599
PMCID: PMC3169344
DOI: 10.4137/BMI.S7513

Paul F Pinsky et al. Biomark Insights. 2011;6:83-93. doi: 10.4137/BMI.S7513. Epub 2011 Aug 14.

Affiliation

¹ Early Detection Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, MD 20852, USA.

Abstract

A widely held view in the field of predictive biomarkers for disease is that no single marker can provide high enough discrimination and that a panel of markers, combined in some type of algorithm, will be needed. Motivated by a recent study in which 27 additional markers for ovarian cancer, many of which had good predictive value alone, failed to substantially increase the predictive ability of the primary marker, CA125, we explore the effect of additional markers on the area under the ROC curve (AUC). We develop a statistical model based on the multivariate normal distribution and linear algorithms and use it to show that the magnitude and direction of statistical correlation among the markers (in diseased and in non-diseased subjects) are critical in determining the added predictive value of additional markers. We show mathematically and empirically that if an additional marker is negatively correlated with the primary marker, then it will always increase the AUC when combined with the primary marker (compared with the primary marker alone), even if it has little predictive ability on its own. In contrast, if an additional marker is positively correlated with the primary marker, then it is unlikely to substantially increase the AUC when combined with the primary marker, even when it has good predictive ability on its own. Thus, univariate analyses alone may not be the best approach for choosing which markers to combine in a predictive panel; patterns of statistical correlation should also be considered when ranking top-performing biomarkers.
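
The role of correlation can be made concrete with a standard result for normally distributed markers. The sketch below is not taken from the paper's full text, and its notation is an assumption: δ is the vector of mean differences (cases minus controls), Σ_D and Σ_N are the marker covariance matrices in diseased and non-diseased subjects, a is the vector of weights of a linear combination, and Φ is the standard normal distribution function.

```latex
\begin{align}
\mathrm{AUC}(a) &= \Phi\!\left(\frac{a^{\top}\delta}{\sqrt{a^{\top}(\Sigma_D+\Sigma_N)\,a}}\right), \\
a^{*} &\propto (\Sigma_D+\Sigma_N)^{-1}\delta, \qquad
\mathrm{AUC}(a^{*}) = \Phi\!\left(\sqrt{\delta^{\top}(\Sigma_D+\Sigma_N)^{-1}\delta}\right), \\
\delta^{\top}(2\Sigma)^{-1}\delta &= \frac{\delta_1^{2}+\delta_2^{2}-2\rho\,\delta_1\delta_2}{2\,(1-\rho^{2})}
= \frac{\delta_1^{2}}{2} + \frac{(\delta_2-\rho\,\delta_1)^{2}}{2\,(1-\rho^{2})}.
\end{align}
```

The last line assumes two markers with unit variances and the same correlation ρ in both groups, so that Σ_D = Σ_N = Σ. The first term, δ₁²/2, is what marker 1 achieves alone, and the second term is the gain from adding marker 2. When both markers are regulated in the same direction (δ₁, δ₂ > 0), a negative ρ makes δ₂ − ρδ₁ strictly positive even for small δ₂, whereas a positive ρ shrinks it, and the gain vanishes when δ₂ = ρδ₁.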

Keywords: ROC AUC; biomarkers; correlation; linear algorithm; multivariate normal distribution.

Figures

**Figure 1**
Relationship between correlation and ΔAUC for 2-marker combinations. ΔAUC for 2-marker combination is plotted against AUC of marker 2 alone; each curve represents different values of the correlation of marker 1 with marker 2. Solid black line is 0 correlation in both cases and controls. Two red lines are correlations of −0.5 in cases and controls (solid line) and correlation of −0.5 in cases and 0 in controls (dotted line). Two blue lines are correlations of 0.5 in cases and controls (solid line) and correlation of 0.5 in cases and 0 in controls (dotted line). Direction of regulation is assumed the same for both markers (ie, both up-regulated in cases or both down-regulated in cases).
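
A rough sketch of how curves of this kind could be computed is given below. It assumes unit-variance, normally distributed markers and uses the standard formula for the AUC of the best linear combination, Φ(√(δᵀ(Σ_D+Σ_N)⁻¹δ)). The function name `delta_auc` and the marker 1 AUC of 0.75 are hypothetical choices for illustration; the caption does not state the value used in the figure.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def delta_auc(auc1, auc2, rho_cases, rho_controls):
    """Gain in AUC from adding marker 2 to marker 1 (best linear combination).

    Assumes both markers are normal with unit variance in cases and controls.
    """
    # Mean differences implied by the single-marker AUCs: AUC = Phi(d / sqrt(2)).
    d1 = sqrt(2.0) * PHI.inv_cdf(auc1)
    d2 = sqrt(2.0) * PHI.inv_cdf(auc2)
    off = rho_cases + rho_controls  # off-diagonal of Sigma_cases + Sigma_controls
    # Quadratic form d' S^{-1} d with S = [[2, off], [off, 2]].
    q = (2.0 * d1 * d1 - 2.0 * off * d1 * d2 + 2.0 * d2 * d2) / (4.0 - off * off)
    return PHI.cdf(sqrt(q)) - auc1

AUC1 = 0.75  # hypothetical AUC of marker 1
# (correlation in cases, correlation in controls), mirroring the five curves
CURVES = [(-0.5, -0.5), (-0.5, 0.0), (0.0, 0.0), (0.5, 0.0), (0.5, 0.5)]

for auc2 in (0.55, 0.65, 0.75):
    gains = [round(delta_auc(AUC1, auc2, rc, rn), 3) for rc, rn in CURVES]
    print(auc2, gains)
```

Under these assumptions the gain is largest for the negatively correlated curves and smallest for the positively correlated ones at each value of marker 2's AUC shown.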

**Figure 2**
Scatter plots of marker 1 by marker 2 values for cases (blue triangles) and controls (red dots). Individual AUCs of marker 1 and 2 are 0.760 and 0.714, respectively. Correlation (in both cases and controls) is 0.0 in (A), 0.7 in (B) and −0.7 in (C); in (D), correlation is 0 in controls and −0.7 in cases. AUCs for optimal linear combination are 0.817 in (A), 0.762 in (B), 0.950 in (C) and 0.869 in (D). Black line shows the optimal linear combination; the propensity score of each point is given by its perpendicular projection onto the line.
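
The AUCs quoted in this caption can be checked with a short calculation. The sketch below assumes unit-variance normal markers with mean differences (cases minus controls) of 1.0 and 0.8. These values are not stated in the caption; they were chosen because they give single-marker AUCs of about 0.760 and 0.714. The combined AUC uses the standard formula Φ(√(δᵀ(Σ_D+Σ_N)⁻¹δ)) for the best linear combination, and the function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

# Assumed mean differences (cases minus controls); not stated in the caption.
D1, D2 = 1.0, 0.8

def single_auc(d):
    """AUC of one marker with unit variance in cases and in controls."""
    return PHI.cdf(d / sqrt(2.0))

def combined_auc(d1, d2, rho_cases, rho_controls):
    """AUC of the best linear combination of two unit-variance markers."""
    diag = 2.0                      # diagonal of S = Sigma_cases + Sigma_controls
    off = rho_cases + rho_controls  # off-diagonal of S
    det = diag * diag - off * off
    # Quadratic form d' S^{-1} d via the closed-form inverse of a 2x2 matrix.
    q = (diag * d1 * d1 - 2.0 * off * d1 * d2 + diag * d2 * d2) / det
    return PHI.cdf(sqrt(q))

# Panel -> (correlation in cases, correlation in controls), as in the caption.
PANELS = {"A": (0.0, 0.0), "B": (0.7, 0.7), "C": (-0.7, -0.7), "D": (-0.7, 0.0)}

print("single:", round(single_auc(D1), 3), round(single_auc(D2), 3))
for name, (rho_cases, rho_controls) in PANELS.items():
    print(name, round(combined_auc(D1, D2, rho_cases, rho_controls), 3))
```

With these assumed inputs the four combined values agree with the caption's 0.817, 0.762, 0.950 and 0.869 to three decimal places, which suggests the figure is consistent with this simple model. It does not establish that the authors used these exact parameters.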

**Figure 3**
Histogram of AUCs. Upper panel is histogram of AUCs of 28 individual ovarian markers; lower panel is histogram of AUCs of optimal combination of CA125 and an additional marker.
