BMC Cancer. 2009 May 28;9:164.
doi: 10.1186/1471-2407-9-164.

Do serum biomarkers really measure breast cancer?

Jonathan L Jesneck et al. BMC Cancer. 2009.

Abstract

Background: Because screening mammography for breast cancer is less effective for premenopausal women, we investigated the feasibility of a diagnostic blood test using serum proteins.

Methods: This study used a set of 98 serum proteins and chose diagnostically relevant subsets via various feature-selection techniques. Because of significant noise in the data set, we applied iterated Bayesian model averaging to account for model selection uncertainty and to improve generalization performance. We assessed generalization performance using leave-one-out cross-validation (LOOCV) and receiver operating characteristic (ROC) curve analysis.
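The evaluation loop described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy one-feature classifier, the synthetic data, and the names `auc` and `loocv_scores` are assumptions made for illustration only. It shows how each sample is scored by a model fitted without it, and how the held-out scores are summarized as an area under the ROC curve.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def loocv_scores(x, y):
    """Leave-one-out scores from a toy nearest-class-mean classifier.

    For each sample i, the class means are estimated from all other
    samples, and the held-out sample is scored by how much closer it
    lies to the class-1 mean than to the class-0 mean.
    """
    scores = []
    for i in range(len(x)):
        train = [(xj, yj) for j, (xj, yj) in enumerate(zip(x, y)) if j != i]
        ones = [v for v, c in train if c == 1]
        zeros = [v for v, c in train if c == 0]
        mean1 = sum(ones) / len(ones)
        mean0 = sum(zeros) / len(zeros)
        scores.append(abs(x[i] - mean0) - abs(x[i] - mean1))
    return scores


# Hypothetical data: one synthetic "protein level" per subject.
rng = random.Random(0)
y = [0] * 30 + [1] * 30
x = [rng.gauss(1.0 * c, 1.0) for c in y]
print("LOOCV AUC:", auc(loocv_scores(x, y), y))
```

Because every score comes from a model that never saw the corresponding sample, the resulting AUC estimates generalization rather than fit to the training data.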

Results: The classifiers were able to distinguish normal tissue from breast cancer with a classification performance of AUC = 0.82 ± 0.04 with the proteins MIF, MMP-9, and MPO. The classifiers distinguished normal tissue from benign lesions similarly at AUC = 0.80 ± 0.05. However, the serum proteins of benign and malignant lesions were indistinguishable (AUC = 0.55 ± 0.06). The classification tasks of normal vs. cancer and normal vs. benign selected the same top feature: MIF, which suggests that the biomarkers indicated inflammatory response rather than cancer.

Conclusion: Overall, the selected serum proteins showed moderate ability for detecting lesions. However, they are probably indicative of secondary effects such as inflammation rather than specific for malignancy.

Figures

Figure 1
ROC curves showing the classification performance of statistical models using the serum protein levels. The models were run with a 70% train and 30% test split of the data set (A-C) and also with leave-one-out cross-validation (LOOCV) (D-F). The classifiers performed similarly, with moderate classification results for normal vs. malignant or benign lesions (A, B, D, E) and poor classification results for malignant vs. benign lesions (C, F).
Figure 2
Posterior predictions of Bayesian model averaging (BMA) of probit models, run with a 70% train and 30% test split of the data set (A-C) and also with leave-one-out cross-validation (LOOCV) (D-F). The classifiers achieved moderate classification results for normal vs. malignant or benign lesions (A, B, D, E) and poor classification results for malignant vs. benign lesions (C, F).
Figure 3
Models selected by BMA of linear models. Features are plotted in decreasing posterior probability of being nonzero. Models are ordered by selection frequency, with the best, most frequently selected models on the left and the weakest, rarest chosen on the right. Coefficients with positive values are shown in red and negative values in blue. Strong, frequently selected features appear as solid horizontal stripes. A beige value indicates that the protein was not selected in a particular model.
Figure 4
Posterior distributions of the model coefficients for the proteins. The distributions are mixtures of a point mass at zero and a normal distribution. The height of the solid line at zero represents the posterior probability that the coefficient is zero. The nonzero part of the distribution is scaled so that the maximum height is equal to the probability that the coefficient is nonzero.
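The plotting convention in the Figure 4 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' plotting code: the function names `spike_height` and `slab_height` and the example parameter values are hypothetical. The point mass at zero is drawn with height P(coefficient = 0), and the normal component is rescaled so that its peak equals P(coefficient ≠ 0).

```python
import math


def spike_height(p_nonzero):
    """Height of the line at zero: posterior probability the coefficient is zero."""
    return 1.0 - p_nonzero


def slab_height(beta, p_nonzero, mean, sd):
    """Height of the rescaled normal component at `beta`.

    The normal density is divided by its own maximum, so the curve
    peaks at exactly `p_nonzero` when beta equals `mean`.
    """
    z = (beta - mean) / sd
    return p_nonzero * math.exp(-0.5 * z * z)


# Hypothetical posterior summary for one protein coefficient.
p_nonzero, mean, sd = 0.7, 1.2, 0.4
print("spike at zero:", spike_height(p_nonzero))
print("slab peak:", slab_height(mean, p_nonzero, mean, sd))
```

With this scaling the two plotted heights sum to one, so the relative size of the spike and the peak shows at a glance how strongly the data support including the protein.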
Figure 5
Heatmap of normalized frequencies of selected features, normal vs. cancer. The feature selection frequencies were averaged over all folds of the LOOCV. For comparison across techniques, the frequencies in each column were scaled to sum to one. Less-frequently selected features appear as cooler dark blue colors, whereas more frequently selected features appear as hotter, brighter colors. Models that used fewer features appear as dark columns with a few bright bands, whereas models that used more features appear as denser smears of darker bands.
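The normalization described in the Figure 5 caption can be sketched in a few lines. The function names, feature names, and fold data below are hypothetical and serve only to illustrate the two steps: averaging selection indicators over LOOCV folds, then scaling each technique's column to sum to one.

```python
def selection_frequency(selected_per_fold, features):
    """Fraction of folds in which each feature was selected."""
    n_folds = len(selected_per_fold)
    return [sum(f in fold for fold in selected_per_fold) / n_folds
            for f in features]


def normalize_columns(freq):
    """Scale each column of a features-by-techniques table to sum to one.

    A column whose total is zero is left as all zeros.
    """
    n_cols = len(freq[0])
    totals = [sum(row[j] for row in freq) for j in range(n_cols)]
    return [[row[j] / totals[j] if totals[j] else 0.0 for j in range(n_cols)]
            for row in freq]


# Hypothetical example: three features, four folds, one technique.
features = ["MIF", "MMP-9", "MPO"]
folds = [{"MIF"}, {"MIF", "MPO"}, {"MIF"}, {"MIF", "MMP-9"}]
column = selection_frequency(folds, features)
print(normalize_columns([[v] for v in column]))
```

Scaling per column makes techniques comparable, but it also means a technique that selects many features spreads its total over more rows, which is why such columns look darker in the heatmap.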
Figure 6
ROC and accuracy curves for linear models with four feature selection techniques: 1) Preselected features: using all the data to choose the best features, and then running the model with only those preselected features in LOOCV; 2) BMA: iterated Bayesian model averaging; 3) Stepwise feature selection; and 4) All features: using all the proteins in the model, with no feature selection.
