BMC Bioinformatics. 2010 Dec 10;11:594.
doi: 10.1186/1471-2105-11-594.

Addressing the challenge of defining valid proteomic biomarkers and classifiers


Mohammed Dakna et al. BMC Bioinformatics.

Abstract

Background: The purpose of this manuscript is to suggest, based on an extensive analysis of a proteomic data set, proper statistical procedures for discovering sets of clinically relevant biomarkers. As a tractable example, we define the measurable proteomic differences between apparently healthy adult males and females. We chose urine as the body fluid of interest and CE-MS (capillary electrophoresis coupled to mass spectrometry), a thoroughly validated platform technology that allows routine analysis of a large number of samples. Second-morning urine was collected from apparently healthy male and female volunteers (aged 21-40) during the routine medical check-up before recruitment at the Hannover Medical School.

Results: We found that the Wilcoxon test is best suited for the definition of potential biomarkers. Adjustment for multiple testing is necessary. Sample size can be estimated from a small number of observations by resampling pilot data. Machine learning algorithms appear ideally suited to generate classifiers. Assessing any result on an independent test set is essential.
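A minimal sketch of the first two steps named above: a two-sided Wilcoxon rank-sum test for one peptide, followed by Benjamini-Hochberg (BH) adjustment across peptides. The function names are hypothetical, and the p-value uses the normal approximation with a tie correction (relevant here because many intensities are exactly zero) but without a continuity correction, so for very small groups an exact test would be preferable.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_rank_sum_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided rank-sum p-value (normal approximation with tie correction)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    w = sum(average_ranks(pooled)[:n1])  # rank sum of group x
    mean_w = n1 * (n + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        return 1.0  # every value identical: no evidence of a difference
    z = (w - mean_w) / math.sqrt(var_w)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


def benjamini_hochberg(pvals: Sequence[float]) -> list[float]:
    """BH-adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value downwards
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Usage: one p-value per peptide, then adjust across all peptides.
raw = [wilcoxon_rank_sum_p([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]), 0.04, 0.5]
significant = [q < 0.05 for q in benjamini_hochberg(raw)]
```

Because the test works on ranks, the point mass at zero only produces ties; no distributional assumption is made about the continuous part.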

Conclusions: Valid proteomic biomarkers for diagnosis and prognosis can only be defined by applying proper statistical data mining procedures. In particular, a justification of the sample size should be part of the study design.


Figures

Figure 1
Study design. Usage of samples and flow of information. Samples from 67 males and 67 females were employed as a training set for the definition of biomarkers and the establishment of classifiers. Where indicated, subpopulations of 7, 20, and 33 samples per sex were employed. To enable the best possible assessment, the results (potential biomarkers, classifiers) were evaluated on an independent set of blinded data that also consisted of 67 male and 67 female samples.
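The training/test separation in this design can be mimicked with a stratified hold-out: draw a fixed number of samples per sex for training and keep the remainder blinded for testing. This is a sketch only; the caption does not state how samples were assigned, and `stratified_split` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, n_train_per_class, seed=0):
    """Hypothetical helper: (train_idx, test_idx) with n_train_per_class of each label in training."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label, idxs in by_class.items():
        if len(idxs) < n_train_per_class:
            raise ValueError(f"class {label!r} has only {len(idxs)} samples")
        shuffled = idxs[:]
        rng.shuffle(shuffled)
        train.extend(shuffled[:n_train_per_class])  # used for biomarkers and classifiers
        test.extend(shuffled[n_train_per_class:])   # held back, blinded
    return sorted(train), sorted(test)


labels = ["male"] * 134 + ["female"] * 134
train_idx, test_idx = stratified_split(labels, n_train_per_class=67)
```

The test indices stay untouched until every biomarker and classifier has been fixed on the training indices.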
Figure 2
Typical male and female peptide profiles. Distribution of a peptide included in the male-female comparative study. The frequency of peptide ID:4356 is plotted against the natural logarithm of the measured intensity. Both profiles show a point mass at zero and a continuous component. The zero component arises because the peptide is either absent or present at a concentration below the detection limit.
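Because each profile mixes a point mass at zero with a continuous component, a simple descriptive summary keeps the two parts apart: how often the peptide is detected, and the distribution of ln-intensity among detections. The sketch is illustrative only (the function name is hypothetical, and it is not a model used in the paper); it assumes a value of exactly zero means "not detected".

```python
import math
import statistics


def two_part_summary(intensities):
    """Hypothetical helper: (detection frequency, mean ln-intensity, SD ln-intensity).

    Zeros count as 'not detected'; the log statistics use detected values only.
    """
    detected = [v for v in intensities if v > 0]
    frequency = len(detected) / len(intensities)
    logs = [math.log(v) for v in detected]  # natural log, as in the figure
    mean_log = statistics.fmean(logs) if logs else None
    sd_log = statistics.stdev(logs) if len(logs) >= 2 else None
    return frequency, mean_log, sd_log


freq, mu, sd = two_part_summary([0.0, 0.0, math.e, math.e ** 3])
```

Here half the values sit in the zero component, and the mean and SD describe only the two detected intensities.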
Figure 3
Number of significant markers depends on sample size. From the 2 × 67 training samples, data sets of size Ndiff ranging from 7 to 67 per group were built by resampling. For each sample size, the vertical axis shows the number of significant biomarkers, defined as those with a p-value < 0.05 after Benjamini-Hochberg (BH) adjustment. The procedure was repeated 10 times to generate the box-and-whisker plots. In all 10 experiments, no biomarker could be declared significantly differentially expressed at sample sizes below 13. At the top left, populations of sizes up to 480 were generated by resampling with replacement from the 2 × 67 samples; a plateau is reached at sample sizes around 2 × 400.
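The resampling behind this figure can be sketched as: draw a cohort of Ndiff samples per group, test every peptide with the Wilcoxon rank-sum test, apply BH adjustment, count the peptides below 0.05, and repeat. The sketch assumes each sample is a list of peptide intensities in a fixed feature order and switches to drawing with replacement once Ndiff exceeds the available samples, mirroring the top-left panel. All names and the toy data are hypothetical, and the p-value is a normal approximation rather than an exact one.

```python
import math
import random


def _ranks(values):
    """1-based average ranks (ties share the mean rank)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value, normal approximation with tie correction."""
    n1, n = len(x), len(x) + len(y)
    pooled = list(x) + list(y)
    w = sum(_ranks(pooled)[:n1])
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * len(y) / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_w <= 0:
        return 1.0
    z = (w - n1 * (n + 1) / 2) / math.sqrt(var_w)
    return math.erfc(abs(z) / math.sqrt(2))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=pvals.__getitem__)
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        running_min = min(running_min, pvals[order[rank - 1]] * m / rank)
        adjusted[order[rank - 1]] = running_min
    return adjusted


def draw(group, n, rng):
    # Without replacement while the data suffice, with replacement beyond that.
    return rng.sample(group, n) if n <= len(group) else rng.choices(group, k=n)


def count_significant(male, female, n, rng, alpha=0.05):
    """Number of features with BH-adjusted p < alpha in one resampled 2 x n cohort."""
    m_sub, f_sub = draw(male, n, rng), draw(female, n, rng)
    pvals = [rank_sum_p([s[j] for s in m_sub], [s[j] for s in f_sub])
             for j in range(len(male[0]))]
    return sum(q < alpha for q in bh_adjust(pvals))


def significant_by_size(male, female, sizes, repeats=10, seed=0):
    """For each per-group size, the counts from `repeats` resampled cohorts."""
    rng = random.Random(seed)
    return {n: [count_significant(male, female, n, rng) for _ in range(repeats)]
            for n in sizes}


# Toy data: feature 0 separates the groups, features 1-4 are never detected.
male = [[100.0 + i, 0.0, 0.0, 0.0, 0.0] for i in range(67)]
female = [[float(i), 0.0, 0.0, 0.0, 0.0] for i in range(67)]
counts = significant_by_size(male, female, sizes=[3, 20, 120])
```

Even a perfectly separating feature is not declared significant at 2 × 3 once BH adjustment is applied, which illustrates why very small cohorts yield no markers.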
Figure 4
Resulting power for two markers that remain significant after BH adjustment. Power is calculated as the percentage of resamples in which the null hypothesis is rejected. To reach 90% power, Ndiff = 30 samples per group are required for ID:138036 (top), whereas Ndiff = 15 may suffice for ID:19655 (bottom). From the original 134 samples, 30 cohorts of 2 × 7 subjects each were randomly built. From each cohort, 2000 resamples of increasing sample size (10-120) were generated by bootstrapping with replacement. Circles indicate outliers.
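The bootstrap power estimate described here can be sketched for a single marker: resample n values per group with replacement from a small pilot cohort, test each resample, and report the fraction of rejections. The sketch assumes a per-test significance level `alpha` (a multiplicity-adjusted level could be passed instead); the function names and pilot values are hypothetical, and the Wilcoxon p-value uses a normal approximation.

```python
import math
import random


def _ranks(values):
    """1-based average ranks (ties share the mean rank)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value, normal approximation with tie correction."""
    n1, n = len(x), len(x) + len(y)
    pooled = list(x) + list(y)
    w = sum(_ranks(pooled)[:n1])
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * len(y) / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_w <= 0:
        return 1.0
    z = (w - n1 * (n + 1) / 2) / math.sqrt(var_w)
    return math.erfc(abs(z) / math.sqrt(2))


def bootstrap_power(x, y, n, n_boot=2000, alpha=0.05, seed=0):
    """Fraction of bootstrap resamples (n per group, with replacement) rejecting H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_boot):
        if rank_sum_p(rng.choices(x, k=n), rng.choices(y, k=n)) < alpha:
            rejections += 1
    return rejections / n_boot


def power_curve(x, y, sizes, target=0.9, **kwargs):
    """Power at each size, and the smallest size reaching `target` (None if none)."""
    curve = {n: bootstrap_power(x, y, n, **kwargs) for n in sizes}
    reached = [n for n in sizes if curve[n] >= target]
    return curve, (min(reached) if reached else None)


# Hypothetical 2 x 7 pilot cohort for one peptide (log intensities).
pilot_male = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
pilot_female = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
curve, n_needed = power_curve(pilot_male, pilot_female, sizes=[10, 30, 60], n_boot=200)
```

Repeating this for many pilot cohorts, as in the caption, shows how strongly the estimated power depends on which 2 × 7 subjects happen to form the pilot.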
Figure 5
Learning curve estimation of Ndisc. Cohorts with sample sizes ranging from 2 × 7 to 2 × 65 (x-axis) were randomly drawn from the entire 2 × 67 dataset, with 20 repetitions for each cohort size. An SVM-based classifier was built for each dataset, and its performance was tested on the independent test set. The left panel shows the area above the curve (AAC = 1 - AUC) for each classifier; the right panel shows the misclassification error rate (MER). The red curves represent the mean AAC and mean MER. Both clearly follow an inverse power law.
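An inverse power law learning curve is often written as error(n) = a·n^(-b) + c, with c the error floor. The sketch below treats c as known (an assumption made for simplicity; fitting it jointly would need nonlinear least squares), fits a and b by linear regression in log-log space, and inverts the curve to extrapolate the sample size at which a target AAC or MER would be reached. The function names and the synthetic errors are hypothetical.

```python
import math


def fit_inverse_power_law(sizes, errors, floor=0.0):
    """Fit error(n) ~ a * n**(-b) + floor, with `floor` treated as known.

    Taking logs gives log(error - floor) = log(a) - b * log(n), a straight line,
    so ordinary least squares on the log-log points yields a and b.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e - floor) for e in errors]  # every error must exceed floor
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope  # (a, b)


def predicted_error(a, b, n, floor=0.0):
    return a * n ** (-b) + floor


def size_for_target(a, b, target, floor=0.0):
    """Real-valued n at which the fitted curve reaches `target` (needs target > floor)."""
    return ((target - floor) / a) ** (-1.0 / b)


# Usage with synthetic, noise-free errors (a = 2, b = 0.5):
sizes = [7, 14, 28, 56]
a, b = fit_inverse_power_law(sizes, [2 * n ** -0.5 for n in sizes])
n_needed = size_for_target(a, b, target=0.2)  # about 100
```

With real learning curves the points are noisy, so the mean over repetitions (the red curves) is the natural input to such a fit.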
Figure 6
Classification results of an SVM-based classifier. Male and female datasets of size 7, 20, 33, and 67 each were compared. Features selected on the basis of a p-value < 0.05 in the unadjusted Wilcoxon test (WT) were combined into the respective biomarker models (M7, M20, M33, and M67). Their performance was initially assessed by complete leave-one-out cross-validation, which yielded accuracies of 100%, 95%, 84%, and 94%, respectively, erroneously indicating optimal performance of the M7 model. The ROC analysis shows the results when these models are tested on an independent set of 134 samples. As expected, the M67 model performs best, while the M7 model barely exceeds the results obtained by mere guessing (the area under the curve, AUC, for models M7, M20, M33, and M67 is 0.715, 0.786, 0.900, and 0.937, respectively).
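The independent-test metrics used here can be computed from any classifier's scores. The sketch below computes AUC through its rank (Mann-Whitney) interpretation, the probability that a randomly chosen male sample scores above a randomly chosen female one, with ties counted as one half, which equals the area under the empirical ROC curve; it also computes the misclassification rate. No SVM is included, and the scores and function names are hypothetical. The caption's point is that such metrics are only trustworthy on samples not used for feature selection or training.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as P(random positive outscores random negative), ties counted as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def misclassification_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Usage with hypothetical classifier scores (higher = more likely male):
male_scores = [0.9, 0.3]
female_scores = [0.5, 0.1]
auc = roc_auc(male_scores, female_scores)  # 3 of 4 pairs ordered correctly
aac = 1 - auc                              # area above the curve
```

The pairwise loop is quadratic in the number of samples, which is unproblematic for a test set of 134.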
Figure 7
Effect of sample size on the determination of the distribution of standardized mean differences. The distribution of a single peptide, ID:19655, was investigated for four different study designs (N = 7, 20, 33, 67) in 1000 distributions resampled from the complete set of 134 cases and 134 controls. The standardized effect size δ = (μ_male - μ_female)/σ is shown, where μ_male and μ_female are the mean logarithmic intensities of the peptide in the male and female populations and σ is the pooled standard deviation. Studies with small sample sizes (e.g. N = 7) may estimate the log change as -5 or even +1 instead of the true change of about -2. Consequently, the effect size obtained from such a small training set can miss the correct value entirely and cannot be generalized with any reasonable confidence.
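The resampling experiment in this caption can be sketched as: repeatedly draw N samples per group without replacement, compute δ with the pooled standard deviation, and compare the spread of the estimates across values of N. The pooled SD formula is the usual one; the toy log intensities, the function names, and returning NaN when both groups are constant are assumptions for illustration.

```python
import math
import random
import statistics


def standardized_effect(male, female):
    """delta = (mean_male - mean_female) / pooled SD, on log intensities."""
    n1, n2 = len(male), len(female)
    pooled_var = ((n1 - 1) * statistics.variance(male)
                  + (n2 - 1) * statistics.variance(female)) / (n1 + n2 - 2)
    if pooled_var == 0:
        return float("nan")  # undefined when both groups are constant
    return (statistics.fmean(male) - statistics.fmean(female)) / math.sqrt(pooled_var)


def resampled_effects(male, female, n, repeats=1000, seed=0):
    """delta estimated in `repeats` cohorts of n per group drawn without replacement."""
    rng = random.Random(seed)
    return [standardized_effect(rng.sample(male, n), rng.sample(female, n))
            for _ in range(repeats)]


# Usage with hypothetical log intensities (134 per group):
male_log = [float(i % 10) for i in range(134)]
female_log = [float(i % 10) + 2.0 for i in range(134)]
small = resampled_effects(male_log, female_log, n=7, repeats=200)
large = resampled_effects(male_log, female_log, n=67, repeats=200)
```

Comparing `statistics.stdev(small)` with `statistics.stdev(large)` shows the much wider spread of δ estimates at N = 7.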
