BMC Bioinformatics. 2012 May 1;13:69.
doi: 10.1186/1471-2105-13-69.

Prognostic gene signatures for patient stratification in breast cancer: accuracy, stability and interpretability of gene selection approaches using prior knowledge on protein-protein interactions

Yupeng Cun et al. BMC Bioinformatics. 2012.

Abstract

Background: Stratification of patients according to their clinical prognosis is a desirable goal in cancer treatment, as it enables more personalized medicine. Reliable predictions based on gene signatures could support physicians in selecting the right therapeutic strategy. However, in recent years the low reproducibility of many published gene signatures has been criticized. It has been suggested that incorporating network or pathway information into prognostic biomarker discovery could improve prediction performance. Meanwhile, a large number of different approaches have been proposed for this purpose.

Methods: We found that, on average, incorporating pathway information or protein interaction data did not significantly enhance prediction performance, but did greatly enhance the interpretability of gene signatures. Some methods (specifically network-based SVMs) greatly enhanced gene selection stability but achieved only comparatively low prediction accuracy, whereas Reweighted Recursive Feature Elimination (RRFE) and average pathway expression yielded very clearly interpretable signatures. In addition, average pathway expression, together with elastic net SVMs, showed the highest prediction performance in this study.
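As a rough illustration of the average pathway expression idea, the sketch below collapses gene-level expression into one feature per pathway by averaging, for each sample, over the pathway's measured member genes. It is a minimal sketch, not the authors' implementation: the dictionary inputs, the function name `average_pathway_expression` and the rule of skipping pathways with no measured genes are assumptions, and any gene standardization or pathway pre-selection used in the study is omitted.

```python
from statistics import mean


def average_pathway_expression(expr, pathways):
    """Summarize gene-level expression as one feature per pathway (sketch).

    expr:     dict mapping gene ID -> list of expression values, one per sample
              (all lists are assumed to have the same length).
    pathways: dict mapping pathway name -> iterable of member gene IDs.

    Returns a dict mapping pathway name -> list of per-sample means over the
    member genes that are present in `expr`. Pathways with no measured member
    genes are skipped (an assumption of this sketch).
    """
    features = {}
    for name, genes in pathways.items():
        # Keep only member genes that were actually measured.
        measured = [expr[g] for g in genes if g in expr]
        if not measured:
            continue  # nothing to average for this pathway
        n_samples = len(measured[0])
        # Average across member genes, separately for each sample.
        features[name] = [
            mean(profile[s] for profile in measured) for s in range(n_samples)
        ]
    return features
```

The resulting pathway-level features can be passed to any standard classifier in place of single-gene features.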

Results: No single algorithm in our study performed best with respect to all three categories (prediction accuracy, signature stability and interpretability). In general, incorporating prior knowledge from networks into gene selection methods did not significantly improve classification accuracy, but greatly improved the interpretability of gene signatures compared with classical algorithms.

Figures

Figure 1
Prediction performance in terms of the area under the ROC curve (AUC) for PAM (prediction analysis of microarray data), sigGenNB (SAM + Naïve Bayes), sigGenSVM (SAM + SVM), SCADSVM (SVM with smoothly clipped absolute deviation penalty), HHSVM (Huberized Hinge loss SVM), RFE (Recursive Feature Elimination), RRFE (Reweighted Recursive Feature Elimination), graphK (graph diffusion kernels for SVMs), graphKp (p-step random walk graph kernel for SVMs), networkSVM (network-based SVM), PAC (Pathway Activity Classification), aveExpPath (average pathway expression), HubClassify (classification by significant hub genes) and pathBoost.
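As a minimal sketch of how an AUC value like those in Figure 1 can be computed, the function below uses the equivalence between the AUC and the normalized Mann-Whitney U statistic: the AUC is the probability that a randomly chosen positive sample (for example, a patient with poor prognosis) receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `auc` and the 0/1 label encoding are illustrative assumptions; the study's evaluation protocol (e.g. how training and test data were split) is not reproduced here.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic (sketch).

    scores: classifier outputs; higher means "more likely positive".
    labels: 1 for a positive sample, 0 for a negative sample.
    Each (positive, negative) pair scores 1 if the positive is ranked higher,
    0.5 on a tie and 0 otherwise; the AUC is the mean over all pairs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of a few hundred patients; a rank-based formula yields the same value in O(n log n).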
Figure 2
Signature stability. The y-axis shows the fraction of genes that were selected between 91 and 100 times.
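The stability measure in Figure 2 counts how often each gene enters the signature across repeated runs of a gene selection method. The sketch below computes the fraction of genes selected between 91 and 100 times. It assumes 100 resampling runs and takes as denominator every gene selected at least once; both are our assumptions, since the caption does not fix them. `stable_fraction` is a hypothetical name.

```python
from collections import Counter


def stable_fraction(selections, low=91, high=100):
    """Fraction of genes selected between `low` and `high` times, inclusive.

    selections: one collection of selected gene IDs per resampling run.
    Denominator (assumption): all genes selected in at least one run.
    """
    # set(run) ensures a gene is counted at most once per run.
    counts = Counter(g for run in selections for g in set(run))
    if not counts:
        return 0.0
    stable = sum(1 for c in counts.values() if low <= c <= high)
    return stable / len(counts)
```

With a different number of runs, `low` and `high` would need to be adjusted accordingly.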
Figure 3
Interpretability of signatures (enriched disease genes). For aveExpPath and PAC, the enrichment of the particular disease category within the genes of the selected pathways is shown. Data sets: (A) GSE2034 [34]; (B) GSE11121 [39]; (C) GSE1456 [35]; (D) GSE2990 [36]; (E) GSE4922 [37]; (F) GSE7390 [38].
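Enrichment of an annotation category (such as a disease) within a gene signature is commonly assessed with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. Assuming that test, which this excerpt does not name, the p-value is the probability of observing at least `hits` annotated genes among `n_selected` genes drawn from a background of `n_background` genes, `n_annotated` of which carry the annotation. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(hits, n_selected, n_annotated, n_background):
    """One-sided hypergeometric p-value P(X >= hits) (sketch).

    X is the number of annotated genes in a random draw of `n_selected`
    genes (without replacement) from `n_background` genes, of which
    `n_annotated` belong to the category:

        P(X = i) = C(K, i) * C(N - K, n - i) / C(N, n)

    with N = n_background, K = n_annotated, n = n_selected.
    """
    upper = min(n_selected, n_annotated)  # largest possible overlap
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_selected - i)
        for i in range(hits, upper + 1)
    )
    return tail / total
```

The same test applies unchanged to the KEGG-pathway and drug-target enrichments in Figures 4 and 5, with `n_annotated` set to the number of background genes in the pathway or among the drug targets.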
Figure 4
Interpretability of signatures (enriched KEGG pathways). For aveExpPath, the adjusted p-value for differential expression from the SAM test is shown. For all other methods, pathway enrichment was tested within the set of selected genes.
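Figure 4 reports adjusted p-values, but the caption does not say which multiple-testing correction was used. Assuming the common Benjamini-Hochberg false discovery rate procedure, the i-th smallest of m p-values is multiplied by m/i, and the results are then made monotone by taking running minima from the largest p-value downward. `benjamini_hochberg` is a hypothetical helper, not code from the study.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order (sketch)."""
    m = len(pvals)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1),
    # so each adjusted value is the minimum of p * m / rank over larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The running minimum guarantees that a smaller raw p-value never receives a larger adjusted value than a bigger one.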
Figure 5
Interpretability of signatures (enriched drug targets). For aveExpPath and PAC, the enrichment of drug targets within the genes of the selected pathways is shown.

References

    1. Tibshirani R. Regression shrinkage and selection via the lasso. J Roy Stat Soc B Met. 1996;58:267–288. http://www.jstor.org/stable/2346178.
    2. Tibshirani R, Hastie T, Narasimhan B, Chu G. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci U S A. 2002;99(10):6567–6572. doi: 10.1073/pnas.082099299.
    3. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46:389–422. doi: 10.1023/A:1012487302797.
    4. Breiman L. Random forests. Mach Learn. 2001;45:5–32. doi: 10.1023/A:1010933404324.
    5. Vapnik V. The nature of statistical learning theory. 2nd ed. Springer; 2000.
