BMC Bioinformatics. 2013 Mar 19;14:100.
doi: 10.1186/1471-2105-14-100.

Gene expression profiling of breast cancer survivability by pooled cDNA microarray analysis using logistic regression, artificial neural networks and decision trees

Hsiu-Ling Chou et al. BMC Bioinformatics. 2013.

Abstract

Background: Microarray technology can acquire information about thousands of genes simultaneously. We analyzed published breast cancer microarray databases to predict five-year recurrence and compared the performance of three data mining algorithms, artificial neural networks (ANN), decision trees (DT) and logistic regression (LR), and two composite models, DT-ANN and DT-LR. Microarray datasets were collected from the Gene Expression Omnibus, and four breast cancer datasets were pooled to predict five-year breast cancer relapse. After data compilation, 757 subjects, 5 clinical variables and 13,452 genetic variables were aggregated. The bootstrap method, Mann-Whitney U test and 20-fold cross-validation were used to identify the 100 candidate genes with the most significant p-values. The predictive power of the DT, LR and ANN models was assessed using accuracy and the area under the ROC curve (AUC). The associated genes were evaluated using Cox regression.
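The gene-screening step can be illustrated with a minimal sketch. The abstract does not state how the bootstrap, the Mann-Whitney U test and cross-validation were combined, so the code below shows only a single pass of U-statistic screening on toy data. It relies on the standard identity that U divided by the product of the two group sizes equals the area under the ROC curve for a single variable. The gene names, expression values and the helper names (`average_ranks`, `mann_whitney_u`, `gene_auc`, `rank_genes`) are hypothetical, and ranking by distance of the per-gene AUC from 0.5 is used here as a stand-in for ranking by p-value.

```python
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def mann_whitney_u(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """U statistic for group_a: rank sum of group_a minus its minimum possible value."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    return sum(ranks[:n_a]) - n_a * (n_a + 1) / 2.0


def gene_auc(relapse: Sequence[float], no_relapse: Sequence[float]) -> float:
    """Single-gene AUC, computed as U / (n_relapse * n_no_relapse)."""
    return mann_whitney_u(relapse, no_relapse) / (len(relapse) * len(no_relapse))


def rank_genes(
    expression: Dict[str, Sequence[float]],
    relapsed: Sequence[bool],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Keep the top_n genes whose AUC lies farthest from 0.5 (no separation)."""
    scored = []
    for gene, values in expression.items():
        pos = [v for v, r in zip(values, relapsed) if r]
        neg = [v for v, r in zip(values, relapsed) if not r]
        scored.append((gene, gene_auc(pos, neg)))
    scored.sort(key=lambda item: abs(item[1] - 0.5), reverse=True)
    return scored[:top_n]


# Toy data (hypothetical): six subjects, three of whom relapsed within five years.
relapsed = [True, True, True, False, False, False]
expression = {
    "GENE_A": [5.1, 4.8, 5.5, 2.0, 2.3, 1.9],  # higher in relapsed subjects
    "GENE_B": [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],  # no clear separation
    "GENE_C": [1.0, 1.2, 0.9, 4.0, 4.1, 3.9],  # lower in relapsed subjects
}
selected = rank_genes(expression, relapsed, top_n=2)
print(selected)
```

In a real analysis the screening would have to be repeated inside each cross-validation fold, so that the held-out subjects never influence which genes are selected.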

Results: The DT models exhibited the lowest predictive power and the poorest extrapolation when applied to the test samples. The ANN models displayed the best predictive power and the best extrapolation. The 21 most-associated genes, as determined by integration of each model, were analyzed using Cox regression and were associated with a 3.53-fold (95% CI: 2.24-5.58) increased risk of five-year breast cancer recurrence.
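As a reading aid for the reported hazard ratio, the following sketch checks that the interval is consistent with a symmetric interval on the log scale, which is how Cox regression intervals are usually constructed. It assumes a Wald-type interval, which the abstract does not confirm, and the function name `log_scale_summary` is hypothetical. The derived standard error and z statistic are back-calculated approximations, not values reported by the authors.

```python
import math
from statistics import NormalDist


def log_scale_summary(hr: float, lower: float, upper: float, level: float = 0.95) -> dict:
    """Back-calculate log-scale quantities from a hazard ratio and its interval.

    Assumes the interval is exp(log(HR) +/- z_crit * SE), i.e. a Wald interval.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    log_se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    # Geometric midpoint of the interval; should sit close to the reported HR.
    implied_hr = math.exp((math.log(lower) + math.log(upper)) / 2.0)
    z_stat = math.log(hr) / log_se
    return {"log_se": log_se, "implied_hr": implied_hr, "z": z_stat}


# Values taken from the abstract: HR 3.53, 95% CI 2.24 to 5.58.
summary = log_scale_summary(3.53, 2.24, 5.58)
print(summary)
```

Under that assumption, the geometric midpoint of the interval lands very close to 3.53 and the implied standard error of the log hazard ratio is roughly 0.23, so the reported point estimate and interval agree with each other.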

Conclusions: The 21 selected genes can predict breast cancer recurrence. Among these genes, CCNB1, PLK1 and TOP2A are in the cell cycle G2/M DNA damage checkpoint pathway. Oncologists can offer this genetic information to patients when interpreting gene expression profiles related to breast cancer recurrence.

Figures

Figure 1. Flow chart showing the protocol used for the search and download of breast cancer microarray datasets from the GEO database.
Figure 2. Flow chart of the protocol used for study subject selection.
Figure 3. Diagram of the methods used to identify predictive genes and establish prediction models.
Figure 4. The AUC values of different gene numbers and the Cox regression of five-year recurrence rates of the test samples.
Figure 5. Kaplan-Meier analysis of the 21-gene expression profile.
Figure 6. Breast cancer-related genes and DNA damage checkpoint regulation at the G2/M phase of the cell cycle. ANN: Artificial Neural Network; DA: Decision Tree combined with ANN; LR: Logistic Regression; DL: Decision Tree combined with LR; DT: Decision Tree.
Figure 7. Accuracy ratio between single and composite models.
