PLoS One. 2019 Jan 31;14(1):e0204186.
doi: 10.1371/journal.pone.0204186. eCollection 2019.

EMT network-based feature selection improves prognosis prediction in lung adenocarcinoma


Borong Shao et al. PLoS One. 2019.

Abstract

Various feature selection algorithms have been proposed to identify cancer prognostic biomarkers. In recent years, however, their reproducibility has been criticized: the performance of feature selection algorithms has been shown to depend on the datasets, the underlying networks, and the evaluation metrics. One cause is the curse of dimensionality, which makes it hard to select features that generalize well to independent data. Integrating biological networks does not by itself mitigate this issue, because the networks are large and many of their components are not relevant to the phenotype of interest. With the availability of multi-omics data, integrative approaches are being developed to build more robust predictive models, but the higher data dimensionality in this setting creates even greater challenges. We proposed a phenotype-relevant network-based feature selection (PRNFS) framework and demonstrated its advantages in lung cancer prognosis prediction. We constructed prognosis-relevant networks based on the epithelial-mesenchymal transition (EMT) and integrated them with different types of omics data for feature selection. Using less than 2.5% of the total dimensionality, we obtained EMT prognostic signatures that achieved strong prediction performance (average AUC values >0.8), highly significant sample stratifications, and meaningful biological interpretations. In addition to finding EMT signatures at different omics levels, we combined these single-omics signatures into multi-omics signatures, which improved sample stratification significantly. Both single- and multi-omics EMT signatures were tested on independent multi-omics lung cancer datasets, and significant sample stratifications were obtained.
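The central idea of PRNFS, as summarized above, is to restrict feature selection to the genes of a phenotype-relevant network before ranking features. The sketch below illustrates only that restriction step, under stated assumptions: features are keyed by gene name, the network is given as a plain set of hypothetical gene names, and features are ranked by the absolute Welch t-statistic between two prognosis classes. A t-test is one of the selectors the paper mentions, but this ranking rule, the function names, and the toy data are illustrative and are not the authors' implementation.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return 0.0 if se == 0 else (mean(group_a) - mean(group_b)) / se


def network_restricted_selection(features, labels, network_genes, k):
    """Rank only the features whose gene is a node of the phenotype-relevant network.

    features: dict gene name -> per-sample values (same sample order as labels)
    labels: 0/1 prognosis class per sample
    network_genes: set of gene names in the (hypothetical) EMT network
    k: number of top-ranked network features to return
    """
    scores = {}
    for gene, values in features.items():
        if gene not in network_genes:
            continue  # features outside the network are never considered
        class1 = [v for v, y in zip(values, labels) if y == 1]
        class0 = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(class1, class0))
    # Highest absolute t-statistic first; keep the top k.
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (hypothetical): GENE_OUT separates the classes best but is not in the network.
features = {
    "GENE_A": [5.0, 5.2, 4.9, 1.0, 1.2, 0.9],
    "GENE_B": [1.0, 1.1, 0.9, 3.0, 3.2, 2.8],
    "GENE_OUT": [10.0, 10.1, 9.9, 0.0, 0.1, -0.1],
}
labels = [1, 1, 1, 0, 0, 0]
network = {"GENE_A", "GENE_B", "GENE_C"}
selected = network_restricted_selection(features, labels, network, k=2)
```

Here `selected` contains only network genes, ordered by class separation, even though `GENE_OUT` would rank first without the network filter. Shrinking the candidate set this way is what reduces the effective dimensionality before any classifier is trained.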


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The AUC, AUPR, and accuracies of EMT features versus random features using DM data with the core EMT network.
Gaussian kernels are used to estimate the density functions from the results of 30 repetitions of 10-fold cross-validation. In each cross-validation fold, the EMT features and the random features are evaluated on the same training and validation samples. Each row in the figure corresponds to one feature selection algorithm; the last row corresponds to using all EMT features. The p-value of a paired t-test is given in each subfigure.
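Fig 1 compares per-fold scores of EMT and random features with paired t-tests and summarizes the score distributions with Gaussian kernel density estimates. The pure-Python sketch below shows both computations; the kernel bandwidth is a free parameter we assume, and the per-fold AUC values are made-up toy numbers, not results from the paper. In practice, `scipy.stats.ttest_rel` returns the paired t-test p-value directly, and `scipy.stats.gaussian_kde` chooses a bandwidth automatically (Scott's rule by default).

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Paired t-statistic and degrees of freedom for scores measured on the same folds.

    Raises ZeroDivisionError if all paired differences are identical.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def gaussian_kde(samples, bandwidth):
    """Return a density function: the average of Gaussian kernels centred on the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Toy per-fold AUCs (hypothetical values), paired because both use the same folds.
emt_auc = [0.82, 0.85, 0.80, 0.84]
random_auc = [0.70, 0.72, 0.69, 0.71]
t, df = paired_t_statistic(emt_auc, random_auc)
emt_density = gaussian_kde(emt_auc, bandwidth=0.02)
```

Pairing matters here because the caption states that both feature sets are tested on the same training and validation samples; the paired test removes fold-to-fold variation that both feature sets share.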
Fig 2. Comparison of prediction performance between FSFs and individually selected features for different feature selection algorithms.
The boxplots are based on the results of 30 repetitions of stratified 10-fold cross-validation.
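The evaluations above use 30 repetitions of (stratified) 10-fold cross-validation, where stratification keeps the proportion of each prognosis class roughly equal across folds. The generator below is a minimal sketch of that splitting scheme; the function name, the seed handling, and the round-robin assignment are our own choices, and scikit-learn's `RepeatedStratifiedKFold` provides an equivalent, well-tested implementation.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=10, n_repeats=30, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated stratified k-fold CV."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    for r in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        pos = 0  # running position, continued across classes to balance fold sizes
        for y in sorted(by_class):  # deterministic class order
            shuffled = by_class[y][:]
            rng.shuffle(shuffled)  # new random assignment in every repeat
            # Deal this class round-robin so each fold keeps the class ratio.
            for i in shuffled:
                folds[pos % n_splits].append(i)
                pos += 1
        for f in range(n_splits):
            test = sorted(folds[f])
            test_set = set(test)
            train = [i for i in range(len(labels)) if i not in test_set]
            yield r, f, train, test
```

With 20 samples of one class and 10 of the other, every test fold receives two samples of the first class and one of the second, and each repeat's test folds together cover every sample exactly once.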
Fig 3. EMT single-omics signatures can stratify test samples into significantly different prognostic groups.
The signature shown was selected by the addDA2 algorithm from DM data.
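Figs 3 and 4 report that signatures stratify test samples into significantly different prognostic groups. Survival differences between two such groups are commonly assessed with a log-rank test; the captions do not name the test, so treating it as a log-rank test is an assumption here. The sketch below computes the two-group log-rank statistic from follow-up times, event indicators, and group labels (for example, predicted good versus poor prognosis), then converts it to a p-value using the chi-square distribution with one degree of freedom. The `lifelines` package offers a production implementation.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times: follow-up time per sample; events: 1 if death observed, 0 if censored;
    groups: 0/1 group label per sample. Returns (chi-square statistic, p-value).
    Raises ZeroDivisionError if the variance is zero (no informative event times).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    var = 0.0
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)   # samples still at risk at time t
        n1 = sum(at_risk)  # of which in group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 death count at time t.
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy example (hypothetical): group 1 dies early, group 0 is censored later.
chi2, p_value = logrank_test(
    times=[1, 2, 3, 4, 10, 11, 12, 13],
    events=[1, 1, 1, 1, 0, 0, 0, 0],
    groups=[1, 1, 1, 1, 0, 0, 0, 0],
)
```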
Fig 4. EMT multi-omics signatures can stratify test samples into significantly different prognostic groups even when the corresponding single-omics signatures cannot.
The signature combines the GE and DM single-omics signatures selected by the t-test.
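Fig 4's signature combines GE and DM single-omics signatures. One simple way to form such a multi-omics signature is to concatenate the selected features of each omics layer for the samples profiled in every layer. The captions do not state exactly how the paper merges them, so the function below is a hypothetical sketch, and its data layout (nested dictionaries keyed by omics name, sample ID, and feature name) is an assumption.

```python
def combine_signatures(omics_data, signatures):
    """Build a multi-omics feature table by concatenating single-omics signatures.

    omics_data: dict omics name -> dict sample id -> dict feature -> value
    signatures: dict omics name -> list of features selected in that omics layer
    Returns dict sample id -> dict "omics:feature" -> value, restricted to
    samples that appear in every omics layer used by the signatures.
    """
    shared = set.intersection(*(set(omics_data[o]) for o in signatures))
    combined = {}
    for sample in sorted(shared):
        row = {}
        for omics, feats in signatures.items():
            for f in feats:
                # Prefix with the omics layer so a gene selected in both layers
                # (e.g. its expression and its methylation) does not collide.
                row[f"{omics}:{f}"] = omics_data[omics][sample][f]
        combined[sample] = row
    return combined
```

Samples missing from any layer are dropped rather than imputed, which keeps the sketch simple but may shrink the cohort; the combined table can then be passed to any classifier or to a survival test such as a log-rank test.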
