Proc Natl Acad Sci U S A. 2009 Feb 24;106(8):2824-8.
doi: 10.1073/pnas.0809444106. Epub 2009 Feb 5.

Prognostic gene signatures for non-small-cell lung cancer

Paul C Boutros et al. Proc Natl Acad Sci U S A.

Abstract

Resectable non-small-cell lung cancer (NSCLC) patients have poor prognosis, with 30-50% relapsing within 5 years. Current staging criteria do not fully capture the complexity of this disease. Survival could be improved by identification of those early-stage patients who are most likely to benefit from adjuvant therapy. Molecular classification by using mRNA expression profiles has led to multiple, poorly overlapping signatures. We hypothesized that differing statistical methodologies contribute to this lack of overlap. To test this hypothesis, we analyzed our previously published quantitative RT-PCR dataset with a semisupervised method. A 6-gene signature was identified and validated in 4 independent public microarray datasets that represent a range of tumor histologies and stages. This result demonstrated that at least 2 prognostic signatures can be derived from this single dataset. We next estimated the total number of prognostic signatures in this dataset with a 10-million-signature permutation study. Our 6-gene signature was among the top 0.02% of signatures with maximum verifiability, reaffirming its efficacy. Importantly, this analysis identified 1,789 unique signatures, implying that our dataset contains >500,000 verifiable prognostic signatures for NSCLC. This result appears to rationalize the observed lack of overlap among reported NSCLC prognostic signatures.
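The extrapolation from 1,789 sampled signatures to a dataset-wide count can be sanity-checked with a back-of-envelope calculation. The sketch below is not the authors' calculation, which is not given in this record. It assumes 6-gene signatures are drawn from the 158 genes reported for the training dataset and that the verified fraction in the 10-million sample scales proportionally to the full signature space. All names are illustrative.

```python
import math

# Figures taken from the abstract and the Fig. 1 caption.
N_GENES = 158            # genes in the RT-PCR training dataset
SIGNATURE_SIZE = 6       # genes per signature
N_SAMPLED = 10_000_000   # random signatures in the permutation study
N_VERIFIED = 1_789       # unique signatures reported as verifiable

# Number of distinct 6-gene subsets of 158 genes.
total_signatures = math.comb(N_GENES, SIGNATURE_SIZE)

# Assumption: the verified fraction in the sample is representative of
# the whole signature space (naive proportional scaling).
verified_fraction = N_VERIFIED / N_SAMPLED
estimate = total_signatures * verified_fraction

print(f"possible 6-gene signatures: {total_signatures:,}")
print(f"verified fraction in sample: {verified_fraction:.6%}")
print(f"naive scaled estimate: {estimate:,.0f}")
```

Under these assumptions the scaled estimate lands in the millions, which is consistent with the abstract's more conservative ">500,000" lower bound.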

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Classifier development. The mSD algorithm was trained on an RT-PCR dataset of 158 genes in 147 NSCLC patients. The resulting 6-gene classifier separated patients into 2 groups with significantly different outcomes (A). Leave-one-out cross-validation again identified 2 groups with significantly different outcomes (B). The number of patients at risk at each time interval in the molecularly defined good- and poor-prognosis groups is listed below each survival curve. The stage-adjusted hazard ratio (HR), P value (Wald test), and number of patients classified (N) are given on each survival curve.
Fig. 2.
Classifier validation. To validate the 6-gene classifier, we classified patients from 4 independent datasets. (A) Mixed adenocarcinomas and squamous cell carcinomas profiled with Affymetrix HG-U133Plus2 arrays by Potti et al. (15). (B) Adenocarcinomas profiled on cDNA arrays by Larsen et al. (13). (C) Squamous cell carcinomas profiled on Affymetrix HG-U133A arrays by Raponi et al. (16). (D) Squamous cell carcinomas profiled on cDNA arrays by Larsen et al. (14). The number of patients at risk in each molecularly defined group is indicated at several time points. The stage-adjusted hazard ratio (HR), P value (Wald test), and the number of patients successfully classified (N) are also shown.
Fig. 3.
Permutation validation. Ten million 6-gene signatures were generated at random from our training dataset. The ability of each signature to separate the training dataset into 2 groups with significantly different prognoses was evaluated using the log-rank test. The kernel density of the χ2 values from this log-rank test was generated (A). The x axis indicates the χ2 values: Larger values indicate a lower P value and hence a more statistically significant separation of patient groups in the training dataset. The y axis gives the kernel density, which reflects the probability distribution of the dataset. Higher values indicate a larger fraction of the population, akin to a smoothed histogram. The performance of the mSD signature is marked with an arrow. These 10 million trained signatures were then tested in 4 independent datasets. Kernel density estimates, as above, are provided for each test dataset (B–E). Each test dataset is labeled with the first author of the study. The performance of the mSD signature is marked with an arrow. Finally, to demonstrate the significance of the mSD signature across all 4 test datasets we generated a validation score by multiplying the percentile rankings of each signature in each of the 4 test datasets. Higher values thus correspond to improved validation across all 4 datasets. The performance of the mSD signature is marked with an arrow.
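The procedure described in this caption can be sketched in a few functions: draw random 6-gene signatures, score each by a two-group log-rank χ2, then combine percentile rankings across test datasets into a validation score. This is a minimal illustration, not the authors' code. In particular, the rule used here to split patients into two groups (mean expression of the signature genes above or below the median) is an assumption, since the caption does not say how each random signature assigns patients. The data layout and all function names are hypothetical.

```python
import math
import random
from bisect import bisect_right
from statistics import mean, median


def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square (1 df).

    times: follow-up times; events: 1 if the event occurred, 0 if censored;
    groups: 0 or 1 per patient.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        died = [g for ti, e, g in zip(times, events, groups) if ti == t and e]
        d, d1 = len(died), sum(died)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance


def score_random_signatures(expression, times, events, n_signatures,
                            size=6, seed=0):
    """Score random signatures on one dataset.

    expression: dict mapping gene name -> list of per-patient values.
    Returns a list of (signature, chi2) pairs.
    """
    rng = random.Random(seed)
    genes = sorted(expression)
    results = []
    for _ in range(n_signatures):
        signature = tuple(sorted(rng.sample(genes, size)))
        scores = [mean(expression[g][i] for g in signature)
                  for i in range(len(times))]
        cutoff = median(scores)
        # Assumed grouping rule: above-median score versus the rest.
        groups = [int(s > cutoff) for s in scores]
        results.append((signature, logrank_chi2(times, events, groups)))
    return results


def percentile_ranks(values):
    """Fraction of values less than or equal to each value."""
    ordered = sorted(values)
    return [bisect_right(ordered, v) / len(values) for v in values]


def validation_scores(chi2_by_dataset):
    """Multiply each signature's percentile rank across datasets.

    chi2_by_dataset: one list of chi2 values per test dataset, with the
    same signature order in every list.
    """
    ranks = [percentile_ranks(values) for values in chi2_by_dataset]
    return [math.prod(per_signature) for per_signature in zip(*ranks)]
```

With real data, `score_random_signatures` would be run once per dataset using the same seed so that the signatures line up, and the resulting χ2 lists passed to `validation_scores`. A signature that ranks near the top in every test dataset gets a score close to 1.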
Fig. 4.
Prognostic genes. For each gene, we calculated the fraction of 6-gene signatures containing that gene that are statistically significant at P < 0.05 (A). A zoom-in on the 10 most enriched genes is also shown (B). Individual genes are on the x axis, and the y axis gives the fraction of signatures containing that gene that are significant at P < 0.05. The horizontal line represents the 5% level expected by chance alone.
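The per-gene fraction plotted here can be computed from a list of scored signatures. The sketch below assumes each signature comes with a log-rank χ2 on 1 degree of freedom and converts it to a P value with the identity P = erfc(sqrt(χ2 / 2)). The input layout and function names are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def chi2_pvalue_1df(chi2):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def significant_fraction_per_gene(results, alpha=0.05):
    """For each gene, the fraction of signatures containing it with P < alpha.

    results: iterable of (signature, chi2) pairs, where signature is a
    tuple of gene names (hypothetical layout).
    """
    total = Counter()
    significant = Counter()
    for signature, chi2 in results:
        is_significant = chi2_pvalue_1df(chi2) < alpha
        for gene in signature:
            total[gene] += 1
            if is_significant:
                significant[gene] += 1
    return {gene: significant[gene] / total[gene] for gene in total}
```

A gene whose fraction sits well above 0.05 appears in significant signatures more often than chance alone would predict, which is what the horizontal line in the figure is meant to show.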

References

    1. Tsuboi M, et al. The present status of postoperative adjuvant chemotherapy for completely resected non-small cell lung cancer. Ann Thorac Cardiovasc Surg. 2007;13:73–77.
    2. Mountain CF. Staging classification of lung cancer. A critical evaluation. Clin Chest Med. 2002;23:103–121.
    3. Mountain CF. Revisions in the International System for Staging Lung Cancer. Chest. 1997;111:1710–1717.
    4. Jones KL, Buzdar AU. A review of adjuvant hormonal therapy in breast cancer. Endocr Relat Cancer. 2004;11:391–406.
    5. Zaniboni A, Labianca R. Adjuvant therapy for stage II colon cancer: An elephant in the living room? Ann Oncol. 2004;15:1310–1318.
