Stratification bias in low signal microarray studies
- PMID: 17764577
- PMCID: PMC2211509
- DOI: 10.1186/1471-2105-8-326
Abstract
Background: When analysing microarray and other small sample size biological datasets, care is needed to avoid various biases. We analyse a form of bias, stratification bias, that can substantially affect analyses using sample-reuse validation techniques and lead to inaccurate results. This bias is due to imperfect stratification of samples in the training and test sets and the dependency between these stratification errors, i.e. the variations in class proportions in the training and test sets are negatively correlated.
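The negative correlation described above can be illustrated with a short, hedged sketch. It assumes plain unstratified k-fold splitting of a fixed dataset with equal fold sizes; the helper names `fold_class_proportions` and `pearson`, the dataset size, and the number of repeats are illustrative choices, not taken from the paper. Because every positive sample that lands in the test fold is removed from the training set, the two class proportions move in opposite directions.

```python
import random

def fold_class_proportions(labels, k, seed=0):
    """Split labels into k unstratified folds and return, for each fold,
    the pair (positive fraction in training set, positive fraction in test set)."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    total_pos = sum(labels)
    pairs = []
    for fold in folds:
        test_pos = sum(labels[i] for i in fold)
        train_pos = total_pos - test_pos  # positives not in the test fold
        pairs.append((train_pos / (len(labels) - len(fold)),
                      test_pos / len(fold)))
    return pairs

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical balanced dataset: 25 positives, 25 negatives, labels only.
labels = [1] * 25 + [0] * 25

# Collect (train, test) proportions over 20 random 5-fold splits.
pairs = [p for seed in range(20)
         for p in fold_class_proportions(labels, k=5, seed=seed)]
train_props, test_props = zip(*pairs)
r = pearson(train_props, test_props)
print(f"correlation between training and test positive fractions: {r:.3f}")
```

With equal fold sizes the training proportion is a decreasing linear function of the test proportion, so the correlation in this sketch is essentially -1; real splits with unequal folds would show a weaker but still negative relationship.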
Results: We show that when estimating the performance of classifiers on low signal datasets (i.e. those which are difficult to classify), which are typical of many prognostic microarray studies, commonly used performance measures can suffer from a substantial negative bias. For error rate this bias is only severe in quite restricted situations, but can be much larger and more frequent when using ranking measures such as the receiver operating characteristic (ROC) curve and area under the ROC (AUC). Substantial biases are shown in simulations and on the van 't Veer breast cancer dataset. The classification error rate can have large negative biases for balanced datasets, whereas the AUC shows substantial pessimistic biases even for imbalanced datasets. In simulation studies using 10-fold cross-validation, AUC values of less than 0.3 can be observed on random datasets rather than the expected 0.5. Further experiments on the van 't Veer breast cancer dataset show these biases exist in practice.
Conclusion: Stratification bias can substantially affect several performance measures. In computing the AUC, the strategy of pooling the test samples from the various folds of cross-validation can lead to large biases; computing it as the average of per-fold estimates avoids this bias and is thus the recommended approach. As a more general solution applicable to other performance measures, we show that stratified repeated holdout, a modified version of k-fold cross-validation (balanced, stratified cross-validation), and balanced leave-one-out cross-validation all avoid the bias. Therefore, for model selection and evaluation on microarray and other small biological datasets, these methods should be used and unstratified versions avoided. In particular, the commonly used (unbalanced) leave-one-out cross-validation should not be used to estimate AUC for small datasets.
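The difference between pooled and per-fold AUC can be sketched on labels that carry no signal at all. The example below is an assumption-laden toy, not the authors' experimental setup: the "classifier" simply scores every test sample with the fraction of positives in its training set, and the names `auc` and `cv_auc` are hypothetical helpers. Under unbalanced leave-one-out, each held-out positive sees a training set with one fewer positive, so it receives a lower score than every held-out negative and the pooled AUC collapses.

```python
import random

def auc(scores, labels):
    """Mann-Whitney AUC: P(score_pos > score_neg) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cv_auc(labels, folds):
    """Return (pooled AUC, mean per-fold AUC) for a signal-free toy classifier
    that scores each test sample with the training-set positive fraction."""
    n, total_pos = len(labels), sum(labels)
    pooled_scores, pooled_labels, fold_aucs = [], [], []
    for fold in folds:
        test_labels = [labels[i] for i in fold]
        score = (total_pos - sum(test_labels)) / (n - len(fold))
        scores = [score] * len(fold)
        pooled_scores += scores
        pooled_labels += test_labels
        # A per-fold AUC is only defined when the fold holds both classes.
        if 0 < sum(test_labels) < len(fold):
            fold_aucs.append(auc(scores, test_labels))
    pooled = auc(pooled_scores, pooled_labels)
    mean_fold = sum(fold_aucs) / len(fold_aucs) if fold_aucs else float("nan")
    return pooled, mean_fold

# Hypothetical balanced dataset with no features, so the expected AUC is 0.5.
labels = [1] * 20 + [0] * 20

# Unbalanced leave-one-out: one sample per fold, test scores pooled.
loo_folds = [[i] for i in range(len(labels))]
loo_pooled, _ = cv_auc(labels, loo_folds)

# Unstratified 10-fold cross-validation from a random permutation.
rng = random.Random(0)
idx = list(range(len(labels)))
rng.shuffle(idx)
kfolds = [idx[i::10] for i in range(10)]
k_pooled, k_mean = cv_auc(labels, kfolds)

print("LOO pooled AUC:", loo_pooled)
print("10-fold pooled AUC:", k_pooled)
print("10-fold mean per-fold AUC:", k_mean)
```

In this toy setting the leave-one-out pooled AUC is exactly 0, the pooled 10-fold AUC cannot exceed 0.5, and the per-fold average stays at 0.5 because all scores within a fold are tied. A real classifier would be noisier, but the direction of the effect is the one the abstract describes.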
Similar articles
- Regularized binormal ROC method in disease classification using microarray data. BMC Bioinformatics. 2006 May 9;7:253. doi: 10.1186/1471-2105-7-253. PMID: 16684357. Free PMC article.
- Improved variance estimation of classification performance via reduction of bias caused by small sample size. BMC Bioinformatics. 2006 Mar 13;7:127. doi: 10.1186/1471-2105-7-127. PMID: 16533392. Free PMC article.
- Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics. 2006 Feb 23;7:91. doi: 10.1186/1471-2105-7-91. PMID: 16504092. Free PMC article.
- Classification based upon gene expression data: bias and precision of error rates. Bioinformatics. 2007 Jun 1;23(11):1363-70. doi: 10.1093/bioinformatics/btm117. Epub 2007 Mar 28. PMID: 17392326. Review.
- Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry. 2020 May 1;77(5):534-540. doi: 10.1001/jamapsychiatry.2019.3671. PMID: 31774490. Free PMC article. Review.
Cited by
- A Probability-Based Models Ranking Approach: An Alternative Method of Machine-Learning Model Performance Assessment. Sensors (Basel). 2022 Aug 24;22(17):6361. doi: 10.3390/s22176361. PMID: 36080820. Free PMC article.
- From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification. BMC Bioinformatics. 2010 Jan 30;11:69. doi: 10.1186/1471-2105-11-69. PMID: 20113515. Free PMC article.
- Unraveling biophysical interactions of radiation pneumonitis in non-small-cell lung cancer via Bayesian network analysis. Radiother Oncol. 2017 Apr;123(1):85-92. doi: 10.1016/j.radonc.2017.02.004. Epub 2017 Feb 22. PMID: 28237401. Free PMC article.
- An evaluation protocol for subtype-specific breast cancer event prediction. PLoS One. 2011;6(7):e21681. doi: 10.1371/journal.pone.0021681. Epub 2011 Jul 8. PMID: 21760900. Free PMC article.
- Clinical Evaluation of a Microwave-Based Device for Detection of Traumatic Intracranial Hemorrhage. J Neurotrauma. 2017 Jul 1;34(13):2176-2182. doi: 10.1089/neu.2016.4869. Epub 2017 Mar 13. PMID: 28287909. Free PMC article.