Cancer Inform. 2009;7:29-40.
doi: 10.4137/cin.s911. Epub 2008 Dec 23.

Quantitative assessment of tissue biomarkers and construction of a model to predict outcome in breast cancer using multiple imputation

John W Emerson et al. Cancer Inform. 2009.

Abstract

Missing data pose one of the greatest challenges in the rigorous evaluation of biomarkers. The limited availability of specimens with complete clinical annotation and high-quality biomaterial often leads to underpowered studies. Tissue microarray studies, for example, may be further handicapped by the loss of data points due to unevaluable staining, core loss, or the absence of tumor in the histospot. This paper presents a novel approach to these common problems in the context of a tissue protein biomarker analysis in a cohort of patients with breast cancer. Our analysis develops techniques based on multiple imputation to address the missing-value problem. We first select markers using a training cohort, identifying a small subset of protein expression levels that are most useful in predicting patient survival. The best model includes both the four selected protein markers (COX6C, GATA3, NAT1, and ESR1) and lymph node status. Adding either lymph node status or the four protein expression levels to the baseline clinical model yields similar improvements in goodness-of-fit, and both are significantly better than the baseline clinical model alone. Using the same multiple imputation strategy, we then validate the results out of sample on a larger independent cohort. Our approach of integrating multiple imputation into each stage of the analysis serves as an example that may be replicated or adapted in future studies with missing values.
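The abstract does not state how estimates from the individual imputed datasets are combined. As a generic illustration only, the sketch below shows Rubin's rules, the standard way to pool a single coefficient (for example, a log hazard ratio for one marker) that has been fitted separately on each of m completed datasets. It is not the authors' implementation; the function name `pool_rubin` and the example numbers are hypothetical.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates of one scalar parameter with Rubin's rules.

    estimates: point estimates, one per imputed dataset (hypothetical input).
    variances: squared standard errors, aligned with `estimates`.
    Returns (pooled estimate, total variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2 or u.size != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = q.mean()               # pooled point estimate: average over imputations
    w = u.mean()                   # within-imputation variance
    b = q.var(ddof=1)              # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b    # total variance used for inference
    return float(q_bar), float(t)


# Hypothetical log hazard ratios and variances from three imputed datasets.
est, var = pool_rubin([0.50, 0.55, 0.45], [0.01, 0.01, 0.01])
print(round(est, 4), round(var, 6))  # 0.5 0.013333
```

The between-imputation term (1 + 1/m)B is what widens the standard error to reflect uncertainty about the missing values; analyzing a single imputed dataset as if it were complete would omit it.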

Keywords: biomarker; breast cancer; immunohistochemistry; multiple imputation; variable selection.

Figures

Figure 1. Marker selection.
Figure 2. Out-of-sample validation.
Figure 3. Validation of four-marker model. Each plot depicts the distribution of improvements in the goodness-of-fit statistics for three candidate models compared with the baseline model containing only the clinical factors: "Nodes" (lymph node status and clinical factors), "Markers" (four selected protein markers and clinical factors), and "Combined" (clinical factors, protein markers, and nodal status).
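The caption does not name the goodness-of-fit statistic or say what generates the distribution. Assuming that the baseline and each candidate model are fitted to every imputed dataset and their log-likelihoods recorded, one way to obtain such a distribution is to compute the likelihood-ratio improvement 2 * (ll_candidate - ll_baseline) separately for each imputation. This is a hedged sketch, not the paper's procedure; the function names `gof_improvements` and `summarize` and all log-likelihood values are hypothetical.

```python
import numpy as np


def gof_improvements(baseline_ll, candidate_ll):
    """Per-imputation likelihood-ratio improvement of each candidate over the baseline.

    baseline_ll: m log-likelihoods of the clinical-only model, one per imputed dataset.
    candidate_ll: dict mapping model name -> m log-likelihoods aligned by imputation.
    (Both inputs are assumptions for illustration.)
    """
    base = np.asarray(baseline_ll, dtype=float)
    out = {}
    for name, lls in candidate_ll.items():
        cand = np.asarray(lls, dtype=float)
        if cand.shape != base.shape:
            raise ValueError(f"{name}: expected {base.size} log-likelihoods")
        # Likelihood-ratio statistic for a candidate that nests the baseline model.
        out[name] = 2.0 * (cand - base)
    return out


def summarize(improvements):
    """5th, 50th, and 95th percentiles of each model's improvement distribution."""
    return {name: tuple(float(x) for x in np.percentile(v, [5, 50, 95]))
            for name, v in improvements.items()}


# Hypothetical log-likelihoods from three imputed datasets (illustrative only).
baseline = [-100.0, -101.0, -99.0]
candidates = {
    "Nodes": [-95.0, -96.0, -94.0],
    "Markers": [-94.0, -97.0, -95.0],
    "Combined": [-90.0, -92.0, -89.0],
}
for name, (p5, p50, p95) in summarize(gof_improvements(baseline, candidates)).items():
    print(f"{name}: median {p50:.1f} (5th-95th pct {p5:.1f} to {p95:.1f})")
```

Because all three candidate models contain the clinical factors and therefore nest the baseline, the likelihood-ratio statistic is a natural choice in this sketch; a criterion such as the difference in AIC could be substituted without changing the structure.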