A comprehensive sensitivity analysis of microarray breast cancer classification under feature variability

Herman M J Sontrop et al.

BMC Bioinformatics. 2009 Nov 26;10:389. doi: 10.1186/1471-2105-10-389.

Abstract

Background: Large discrepancies in signature composition and outcome concordance have been observed between different microarray breast cancer expression profiling studies. This is often ascribed to differences in array platform as well as biological variability. We conjecture that other reasons for the observed discrepancies are the measurement error associated with each feature and the choice of preprocessing method. Microarray data are known to be subject to technical variation and the confidence intervals around individual point estimates of expression levels can be wide. Furthermore, the estimated expression values also vary depending on the selected preprocessing scheme. In microarray breast cancer classification studies, however, these two forms of feature variability are almost always ignored and hence their exact role is unclear.

Results: We performed a comprehensive sensitivity analysis of microarray breast cancer classification under the two types of feature variability mentioned above. We used data from six state-of-the-art preprocessing methods and a compendium of eight different datasets, comprising 1131 hybridizations and covering both one- and two-color array technology. For a wide range of classifiers, we performed a joint study of performance, concordance and stability. In the stability analysis we explicitly tested classifiers for their noise tolerance by using perturbed expression profiles based on uncertainty information directly related to the preprocessing methods. Our results indicate that signature composition is strongly influenced by feature variability, even if the array platform and the stratification of patient samples are identical. In addition, we show that there is often a high level of discordance between individual class assignments for signatures constructed on data from different preprocessing schemes, even if the actual signature composition is identical.
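To make the perturbation step concrete, below is a minimal sketch of how perturbed expression profiles can be generated from per-feature uncertainty estimates. The Gaussian noise model and the names perturb_profiles, x and se are illustrative assumptions; the study ties its perturbations to the uncertainty model of each specific preprocessing method.

    import numpy as np

    def perturb_profiles(x, se, n_perturbations=1000, seed=0):
        """Draw perturbed expression profiles by adding zero-mean
        Gaussian noise scaled by each feature's standard error
        (assumed available from the preprocessing error model).

        x  : (n_samples, n_features) expression point estimates
        se : (n_samples, n_features) per-feature standard errors
        Returns an array of shape (n_perturbations, n_samples, n_features).
        """
        rng = np.random.default_rng(seed)
        noise = rng.standard_normal((n_perturbations,) + x.shape)
        return x[None, :, :] + noise * se[None, :, :]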

Conclusion: Feature variability can have a strong impact on breast cancer signature composition, as well as on the classification of individual patient samples. We therefore strongly recommend that feature variability be taken into account when analyzing data from microarray breast cancer expression profiling experiments.


Figures

Figure 1
Sensitivity analysis protocol. For an explanation, see the running text.
Figure 2
Impact of perturbation variability on the feature selection criterion of the 70-gene signature. Distributions are shown of the feature ranking criterion (Pearson correlation) calculated over 1000 perturbations of the 78 training samples of the Van 't Veer dataset. The dashed purple lines indicate the absolute threshold of 0.3 that was used. Blue boxes indicate genes that fail this filter criterion in more than 50% of the perturbations. The red dots indicate the correlations obtained using the unperturbed expression values.
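A minimal sketch of the ranking criterion described in this caption: features are ranked by the absolute Pearson correlation between their expression values and the class labels, and filtered at the absolute threshold of 0.3. The function name rank_features is hypothetical; inputs are assumed to be NumPy arrays.

    import numpy as np

    def rank_features(X, y, threshold=0.3):
        # X: (n_samples, n_features) expression matrix
        # y: (n_samples,) binary outcome labels (0/1)
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        # Pearson correlation of each feature with the labels
        r = (Xc * yc[:, None]).sum(axis=0) / (
            np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
        order = np.argsort(-np.abs(r))          # best feature first
        passed = np.abs(r[order]) >= threshold  # absolute filter at 0.3
        return r, order[passed]

Applying this to each of the 1000 perturbed versions of the training data yields, per gene, the distribution of correlation values shown in the figure.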
Figure 3
Impact of perturbation variability on feature selection for the Affymetrix datasets. Each dataset was split 50 times into a training and validation set, after which the validation set was discarded. Ranking was done only on the training sets. In addition, for each training set 50 perturbed versions were created, and for each perturbation the overlap between F_{n,m,k} and its perturbed counterpart F̃_{n,m,k}, and between F̃_{n,m,k} and F_{2n,m,k}, was determined, yielding 50·50·2 = 5000 overlap estimates for each list size n. The blue curves provide for each n ∈ {1,...,100} the mean overlap taken over all corresponding estimates. The red curves indicate the associated average relative strengths between the feature sets F_{n,m,k} and F̃_{n,m,k}.
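The overlap statistic used here can be expressed compactly; a sketch under the assumption that each ranking is stored as an index array with the best feature first (top_n and overlap are hypothetical helper names):

    def top_n(order, n):
        # top-n feature indices from a full ranking (best first)
        return set(order[:n])

    def overlap(a, b):
        # fraction of feature set a that is recovered in feature set b
        return len(a & b) / len(a)

    # For one split: mean overlap between the unperturbed top-n list and
    # each perturbed ranking, as accumulated for the blue curves.
    # scores = [overlap(top_n(orig, n), top_n(p, n)) for p in perturbed]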
Figure 4
Impact of preprocessing variability on feature selection for the Affymetrix datasets. Comparison of top-100 ranked feature lists F_{100,m,k} and F_{100,m',k}, as obtained using different preprocessing strategies m and m', for different splits k. A) Percentage of the top half of one list that is found in the other list, and vice versa. Each boxplot represents the distribution of such percentages over 50 splits, for a specific pair (m, m') (indicated at the top of the figure). For each split, we determine the percentage of F_{50,m,k} found in F_{100,m',k} and the percentage of F_{50,m',k} found in F_{100,m,k}. Each distribution thus contains 50·2 = 100 points. All boxplots corresponding to the same preprocessing pair are colored identically. In total there are 15 distinct pairs. The pairs are ordered by the observed median overlap over all six datasets. B) Distributions of the relative strength scores for top-ranked feature lists corresponding to the various preprocessing pairs. C) Relative strength of the top-100 multi-ranked gene lists with respect to the original rankings, for each preprocessing method and each Affymetrix dataset.
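Panel A's bidirectional top-half overlap can be sketched as follows, assuming order_m and order_m2 are full rankings (best first) for preprocessing methods m and m'; the names are illustrative:

    def top_half_overlap(order_m, order_m2, n=100):
        # percentage of the top n/2 of one list found in the top n of
        # the other, in both directions (cf. panel A)
        half_m, full_m = set(order_m[:n // 2]), set(order_m[:n])
        half_m2, full_m2 = set(order_m2[:n // 2]), set(order_m2[:n])
        return (100 * len(half_m & full_m2) / len(half_m),
                100 * len(half_m2 & full_m) / len(half_m2))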
Figure 5
Impact of perturbation variability on the discriminant score. Distributions are shown of the discriminant score xᵀw for each of the 106 validation samples of the Rosetta dataset, when using a nearest centroid classifier built on the 70-gene profile of [2], over 1000 perturbations. Perturbed expression data are based on the Rosetta error model. Red dots indicate the discriminant scores corresponding to the unperturbed expression data. The blue boxes indicate samples with a map-score of at least 25%.
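As a sketch of the discriminant score xᵀw plotted here: for a nearest (mean) centroid classifier, w can be taken as the difference between the two class centroids, with samples centered on the centroid midpoint so that the sign of the score gives the class assignment. This Euclidean variant is an assumption for illustration; the original 70-gene classifier of [2] used correlation to a prognosis centroid.

    import numpy as np

    def nearest_centroid_discriminant(X_train, y_train, X_test):
        # class centroids estimated on the training data
        mu0 = X_train[y_train == 0].mean(axis=0)
        mu1 = X_train[y_train == 1].mean(axis=0)
        w = mu1 - mu0
        # signed score: positive -> class 1, negative -> class 0
        return (X_test - (mu0 + mu1) / 2) @ w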
Figure 6
A map-matrix example for the Rosetta dataset. The minimum assignment percentages (white = 0%, black = 50%) for the 106 validation samples and signatures of increasing size, determined over 1000 perturbations of the validation data. The column indicated by the dashed lines corresponds to the original 70-gene signature.
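The map-score (minimum assignment percentage) of a sample follows directly from its predicted labels over the perturbations; a minimal sketch (map_score is a hypothetical name):

    import numpy as np

    def map_score(perturbed_labels):
        # perturbed_labels: (n_perturbations,) array of 0/1 class
        # assignments for one sample; the score is the percentage of
        # perturbations assigning the minority label, so 0 means fully
        # stable and 50 means maximally unstable.
        p = np.asarray(perturbed_labels).mean()
        return 100 * min(p, 1 - p)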
Figure 7
Performance and stability curves for the Rosetta dataset. P- and S-curves for the Rosetta data for various classifiers. The x-axis shows the signature size; the y-axis in the upper panel gives the average balanced accuracy over 50 splits, and the y-axis in the lower panel gives the average percentage of cases over 50 splits with a map-score larger than 35%. Each column shows the results for a different classifier.
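The balanced accuracy on the y-axis of the P-curves averages sensitivity and specificity, which guards against inflated scores on class-imbalanced validation sets; a sketch, assuming binary 0/1 labels with both classes present:

    import numpy as np

    def balanced_accuracy(y_true, y_pred):
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        sens = np.mean(y_pred[y_true == 1] == 1)  # true positive rate
        spec = np.mean(y_pred[y_true == 0] == 0)  # true negative rate
        return (sens + spec) / 2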
Figure 8
Performance curves for the Affymetrix datasets. Rows represent curves obtained using different classifiers, while columns represent curves for different datasets. Within each cell, performance curves associated with different preprocessing methods are shown in separate colors. The color scheme is shown at the bottom of the figure. Within a cell the x-axis provides the signature size, while the y-axis gives the average balanced accuracy over 50 splits. For each dataset and split, the top-100 feature set was computed using the multi-rank strategy and this ranking was subsequently used for all classifiers in order to construct signatures.
Figure 9
Discordance curves for the Affymetrix datasets. Rows represent different preprocessing pairs, while columns represent curves for different datasets. Within each cell, discordance curves corresponding to different classifiers are shown in separate colors. The color scheme is shown at the bottom of the figure. Within a cell the x-axis provides the signature size, while the y-axis gives the average percentage of cases, over 50 splits, of inconsistent class assignments on the unperturbed validation sets. For each dataset and split, the top-100 feature set was computed using the multi-rank strategy and this ranking was subsequently used for all classifiers in order to construct signatures.
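Discordance between two preprocessing schemes reduces to the fraction of unperturbed validation samples on which their signatures disagree; a sketch with hypothetical names pred_m and pred_m2 for the two vectors of class assignments:

    import numpy as np

    def discordance(pred_m, pred_m2):
        # percentage of samples assigned to different classes by the
        # signatures built under preprocessing methods m and m'
        return 100 * np.mean(np.asarray(pred_m) != np.asarray(pred_m2))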
Figure 10
Stability curves for the Affymetrix datasets. Rows represent curves obtained using different classifiers, while columns represent curves for different datasets. Within each cell, stability curves associated with different preprocessing methods are shown in separate colors. The color scheme is shown at the bottom of the figure. Within a cell the x-axis provides the signature size, while the y-axis gives the average percentage of cases over 50 splits with a map-score larger than 35%. For each dataset and split, the top-100 feature set was computed using the multi-rank strategy and this ranking was subsequently used for all classifiers in order to construct signatures.
Figure 11
Trade-off dilemma of performance versus stability. Different scenarios are shown for the performance of a classifier versus its stability. Scenario 1: Stable yet poor performance, always achievable by a decision rule that assigns all samples to the same class; Scenario 2: Preferred scenario; Scenario 3: Random classifier; Scenario 4: Unrealistic perturbations, likely to happen when using jitter.

References

    1. Amaratunga D, Cabrera J. Exploration and analysis of DNA microarray and protein array data. Hoboken, NJ: John Wiley; 2004.
    2. van 't Veer L, Dai H, van de Vijver M, He Y, Hart A, Mao M, Peterse H, van der Kooy K, Marton M, Witteveen A. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002;415(6871):530–536. doi: 10.1038/415530a.
    3. Wessels L, Reinders M, Hart A, Veenman C, Dai H, He Y, van 't Veer L. A protocol for building and evaluating predictors of disease state based on microarray data. Bioinformatics. 2005;21(19):3755–3762. doi: 10.1093/bioinformatics/bti429.
    4. van Vliet M, Reyal F, Horlings H, van de Vijver M, Reinders M, Wessels L. Pooling breast cancer datasets has a synergetic effect on classification performance and improves signature stability. BMC Genomics. 2008;9:375. doi: 10.1186/1471-2164-9-375.
    5. Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet. 2005;365(9458):488–492. doi: 10.1016/S0140-6736(05)17866-0.