Biomark Insights. 2011;6:83-93.
doi: 10.4137/BMI.S7513. Epub 2011 Aug 14.

Building multi-marker algorithms for disease prediction: the role of correlations among markers


Paul F Pinsky et al. Biomark Insights. 2011.

Abstract

A widely held view in the field of predictive biomarkers for disease is that no single marker can provide high enough discrimination and that a panel of markers, combined in some type of algorithm, will be needed. Motivated by a recent study in which 27 additional markers for ovarian cancer, many of which had good predictive value alone, failed to substantially increase the predictive ability of the primary marker, CA125, we explore the effect of additional markers on the area under the ROC curve (AUC). We develop a statistical model based on the multivariate normal distribution and linear algorithms and use it to show that the magnitude and direction of statistical correlation among the markers (in diseased and in non-diseased subjects) are critical in determining the added predictive value of additional markers. We show mathematically and empirically that if an additional marker is negatively correlated with the primary marker, then combining it with the primary marker will always increase the AUC over that obtained with the primary marker alone, even if the additional marker has little predictive ability on its own. In contrast, if an additional marker is positively correlated with the primary marker, then it is unlikely to substantially increase the AUC when combined with the primary marker, even when it has good predictive ability on its own. Thus, univariate analyses alone may not be the best approach for choosing which markers to combine in a predictive panel; patterns of statistical correlation should also be considered when ranking top-performing biomarkers.
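
The role of correlation described in the abstract can be made concrete with the standard result for linear scores under a multivariate normal model. The following is a sketch under stated assumptions, not the paper's own notation: the symbols $\mu_1, \mu_0, \Sigma_1, \Sigma_0, \delta, w, \rho$ are introduced here for illustration, and the two-marker special case further assumes unit variances and a common correlation $\rho$ in cases and controls.

```latex
% Assumed model: marker vector X is multivariate normal in each group.
X \mid \text{case} \sim N(\mu_1, \Sigma_1), \qquad
X \mid \text{control} \sim N(\mu_0, \Sigma_0), \qquad
\delta = \mu_1 - \mu_0 .

% AUC of a linear score w^{\top} X, and its maximum over w:
\mathrm{AUC}(w) = \Phi\!\left( \frac{w^{\top}\delta}{\sqrt{w^{\top}(\Sigma_0 + \Sigma_1)\,w}} \right),
\qquad
\max_{w} \mathrm{AUC}(w) = \Phi\!\left( \sqrt{\delta^{\top}(\Sigma_0 + \Sigma_1)^{-1}\delta} \right),
\quad \text{attained at } w \propto (\Sigma_0 + \Sigma_1)^{-1}\delta .

% Two markers, unit variances, common correlation \rho in both groups:
\mathrm{AUC}_{1} = \Phi\!\left( \frac{\delta_1}{\sqrt{2}} \right),
\qquad
\mathrm{AUC}_{1+2} = \Phi\!\left( \sqrt{ \frac{\delta_1^{2} - 2\rho\,\delta_1\delta_2 + \delta_2^{2}}{2\,(1 - \rho^{2})} } \right).

% Gain from adding marker 2, inside the square root (up to the factor 1/2):
\frac{\delta_1^{2} - 2\rho\,\delta_1\delta_2 + \delta_2^{2}}{1 - \rho^{2}} - \delta_1^{2}
= \frac{(\delta_2 - \rho\,\delta_1)^{2}}{1 - \rho^{2}} \;\ge\; 0 .
```

Under these assumptions, when both markers are regulated in the same direction ($\delta_1 > 0$, $\delta_2 \ge 0$) and $\rho < 0$, the gain is at least $\rho^{2}\delta_1^{2}/(1-\rho^{2}) > 0$ even if $\delta_2 = 0$. When $\rho > 0$, the gain shrinks toward zero as $\delta_2$ approaches $\rho\,\delta_1$, which is consistent with a positively correlated marker adding little despite a good AUC of its own.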

Keywords: ROC AUC; biomarkers; correlation; linear algorithm; multivariate normal distribution.


Figures

Figure 1
Relationship between correlation and ΔAUC for 2-marker combinations. ΔAUC for the 2-marker combination is plotted against the AUC of marker 2 alone; each curve represents a different value of the correlation of marker 1 with marker 2. The solid black line is 0 correlation in both cases and controls. The two red lines are a correlation of −0.5 in both cases and controls (solid line) and a correlation of −0.5 in cases and 0 in controls (dotted line). The two blue lines are a correlation of 0.5 in both cases and controls (solid line) and a correlation of 0.5 in cases and 0 in controls (dotted line). Direction of regulation is assumed to be the same for both markers (i.e., both up-regulated in cases or both down-regulated in cases).
Figure 2
Scatter plots of marker 1 by marker 2 values for cases (blue triangles) and controls (red dots). Individual AUCs of marker 1 and 2 are 0.760 and 0.714, respectively. Correlation (in both cases and controls) is 0.0 in (A), 0.7 in (B) and −0.7 in (C); in (D), correlation is 0 in controls and −0.7 in cases. AUCs for optimal linear combination are 0.817 in (A), 0.762 in (B), 0.950 in (C) and 0.869 in (D). Black line gives propensity score of optimal linear combination by perpendicular projection of points onto line.
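
The AUCs quoted in this caption can be approximately recovered from the individual AUCs and the correlations alone. The sketch below assumes bivariate normal markers with unit variance in both cases and controls, and uses the standard closed form Φ(√(δ′(Σ_cases + Σ_controls)⁻¹δ)) for the AUC of the best linear combination, where δ is the vector of mean shifts. The function names `shift_from_auc` and `combined_auc` are hypothetical and written for illustration; this is not the authors' code.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def shift_from_auc(auc):
    """Case-control mean shift (in SD units) implied by a single-marker AUC.

    Assumes the marker is normal with unit variance in both groups,
    so that AUC = Phi(shift / sqrt(2)).
    """
    return sqrt(2.0) * STD_NORMAL.inv_cdf(auc)


def combined_auc(d1, d2, rho_cases, rho_controls):
    """AUC of the best linear combination of two unit-variance normal markers.

    d1, d2 are the mean shifts of markers 1 and 2; rho_cases and rho_controls
    are the marker correlations within each group.
    """
    # The summed covariance matrix is [[2, s], [s, 2]] with s as below.
    s = rho_cases + rho_controls
    det = 4.0 - s * s
    # Quadratic form delta' (Sigma_cases + Sigma_controls)^(-1) delta.
    q = 2.0 * (d1 * d1 - s * d1 * d2 + d2 * d2) / det
    return STD_NORMAL.cdf(sqrt(q))


# Individual AUCs and (cases, controls) correlations taken from the caption.
d1 = shift_from_auc(0.760)
d2 = shift_from_auc(0.714)
panels = {
    "A": (0.0, 0.0),
    "B": (0.7, 0.7),
    "C": (-0.7, -0.7),
    "D": (-0.7, 0.0),
}
for name, (rho_cases, rho_controls) in panels.items():
    print(name, round(combined_auc(d1, d2, rho_cases, rho_controls), 3))
```

Because the mean shifts are back-calculated from AUCs rounded to three decimals, the printed values should match the caption only to within about 0.001. The ordering across panels (negative correlation highest, positive correlation barely above marker 1 alone) does not depend on that rounding.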
Figure 3
Histogram of AUCs. Upper panel is histogram of AUCs of 28 individual ovarian markers; lower panel is histogram of AUCs of optimal combination of CA125 and an additional marker.

References

    1. Baker S. Identifying combinations of cancer markers for further study as triggers of early intervention. Biometrics. 2000;56:1082–7.
    2. Pepe MS, Thompson ML. Combining diagnostic tests results to increase accuracy. Biostatistics. 2000;1:123–40.
    3. Pepe MS, Cai T, Longton G. Combining predictors for classification using the area under the receiver operating characteristic curve. Biometrics. 2006;62:221–9.
    4. Zhu CS, Pinsky PF, Cramer DW. A framework for evaluating biomarkers for early detection: validation of biomarker panels for ovarian cancer. Cancer Prevention Research. 2011;4:375–83.
    5. Cramer DW, Bast RC, Berg CE, et al. Ovarian cancer biomarker performance in Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial specimens. Cancer Prevention Research. 2011;4:365–74.