PLoS Comput Biol. 2016 Mar 9;12(3):e1004790.
doi: 10.1371/journal.pcbi.1004790. eCollection 2016 Mar.

Pathway-Based Genomics Prediction using Generalized Elastic Net

Artem Sokolov et al. PLoS Comput Biol.

Abstract

We present a novel regularization scheme called the Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway-agnostic approach.
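
The abstract does not spell out the penalty itself. As a rough illustration only, the sketch below assumes a network-guided elastic net penalty of the form λ1 Σ d_j |w_j| + (λ2/2) wᵀPw, where P is a gene-gene penalty matrix (here a graph Laplacian) and d holds per-gene weights. The function names, the toy network, and the choice of P are assumptions made for this example, not details taken from the text above.

```python
import numpy as np

def graph_laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A of an undirected gene network."""
    adjacency = np.asarray(adjacency, dtype=float)
    return np.diag(adjacency.sum(axis=1)) - adjacency

def gelnet_penalty(w, P, d, lambda1, lambda2):
    """Assumed penalty: lambda1 * sum_j d_j |w_j| + (lambda2 / 2) * w^T P w."""
    w = np.asarray(w, dtype=float)
    l1_term = lambda1 * np.sum(d * np.abs(w))   # sparsity-inducing part
    quad_term = 0.5 * lambda2 * (w @ P @ w)     # network-aware part
    return l1_term + quad_term

# Hypothetical toy network: genes 0 and 1 interact, gene 2 is isolated.
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
L = graph_laplacian(A)
d = np.ones(3)

smooth = np.array([1.0, 1.0, 0.0])   # linked genes share the same weight
rough = np.array([1.0, -1.0, 0.0])   # linked genes carry opposite weights

print(gelnet_penalty(smooth, L, d, lambda1=0.1, lambda2=1.0))
print(gelnet_penalty(rough, L, d, lambda1=0.1, lambda2=1.0))
```

Under these assumptions the weight vector that agrees across the linked genes is penalized less than the one that disagrees, which is one way a penalty can steer solutions toward interlinked genes. Setting P to the identity matrix and d to all ones reduces the expression to the ordinary elastic net penalty.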

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Overview of four synthetic data scenarios.
In GGM+ and GGM-, the same network is used to simulate the gene expression matrix X and the signaling pathway represented by the weight vector w. That network is then provided to the GELnet in the GGM+ scenario, while a permuted version is used in the GGM- scenario. Scenarios Rand+ and Rand- are constructed in a similar fashion, but using a random covariance matrix instead.
Fig 2. Performance of elastic nets and GELnets on synthetic data generated via a GGM.
Plotted are 30 trials of the same experiment. The x- and y-axes in every plot correspond to Elastic Nets and GELnets, respectively. The top three plots show the scenario where the GELnets were provided the true feature-feature relationships, while the bottom three plots correspond to the scrambled network case. Lower values are better for all three performance metrics, and the points are colored in red whenever the performance metrics are lower in the GELnet models, and blue otherwise.
Fig 3. Distribution of % improvement in GELnets over elastic nets for RMSE and dispersion performance metrics.
Red curve corresponds to the case where GELnets were provided with the true network used to generate the data. Blue curve depicts the case where the permuted network was provided instead.
Fig 4. (Top) Results for predicting drug sensitivity in Gray cell lines.
Presented are all the drugs where GELnets outperformed Elastic Nets. The bar height displays % RMSE improvement and % dispersion improvement over Elastic Net solutions. Diamond shapes indicate instances where drug sensitivity was significantly correlated (ANOVA, p-value < 0.05) with breast cancer subtype. (Bottom Left) Distribution of % RMSE improvement over Elastic Nets across all 74 drugs and the location of BIBW2992 and 5-FdUR in that distribution. (Bottom Right) Distribution of % dispersion improvement over Elastic Nets across all 74 drugs and the location of BIBW2992 and 5-FdUR in that distribution.
Fig 5. Models for sensitivity of Gray cell lines to BIBW2992 learned by GELnets.
We present the model weights for the top 30 genes (see text) as they change across the different values of the λ1 and λ2 meta-parameters. Higher positive (red) weights are associated with resistance, while higher negative (blue) weights are associated with sensitivity. The models are sorted by their % RMSE improvement over the corresponding Elastic Net models. The barplot on the left displays the average weight of each gene across all meta-parameter values where the weight is not zero.
Fig 6. The two mechanisms of resistance to BIBW2992 placed in the context of a gene regulatory network.
The network was constructed via GeneMANIA using the top 15 and bottom 15 genes of the model associated with the largest increase in GELnet performance over Elastic Nets. Genes captured by the model which is focused on the cell surface receptors are highlighted with a thin yellow border. Similarly, genes captured by the KLF7-centric model are highlighted with a thick yellow border. The size and the color intensity of a node designate the corresponding gene weight in the model. Red and blue nodes correspond to positive and negative weights, respectively. Gray nodes are “linked genes” included by GeneMANIA.
Fig 7. Comparison of GELnet models constructed using the Laplacian and the Diffusion kernel in an unsupervised setting.
The left panel shows the first two principal components of the PanCan12 dataset with samples colored by tissue-of-origin. The right two panels present the performance of GELnet models constructed using the Laplacian (red) and the Diffusion kernel (blue) of Pathway Commons. The x-axis captures how well each model reconstructs the unregularized principal component, measured via RMSE. The y-axis captures the dispersion. Individual points correspond to the 15 settings of λ1 and λ2 sampled over a grid of values.
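
The Fig 7 caption contrasts the Laplacian and the Diffusion kernel of a network without defining either. A minimal sketch follows, assuming the unnormalized Laplacian L = D - A and the standard diffusion kernel K = exp(-βL). The 4-gene chain network, the value of β, and the function names are hypothetical, and how either matrix enters the GELnet model is not shown here.

```python
import numpy as np

def graph_laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A of an undirected network."""
    adjacency = np.asarray(adjacency, dtype=float)
    return np.diag(adjacency.sum(axis=1)) - adjacency

def diffusion_kernel(laplacian, beta=1.0):
    """K = exp(-beta * L), computed via the eigendecomposition of symmetric L."""
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # Scale each eigenvector column by exp(-beta * eigenvalue), then recombine.
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T

# Hypothetical 4-gene chain network: 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
L = graph_laplacian(A)
K = diffusion_kernel(L, beta=0.5)

# Genes 0 and 2 are not directly linked: the Laplacian entry is zero,
# while the diffusion kernel assigns them a positive similarity.
print(L[0, 2])
print(K[0, 2])
```

The practical difference in this toy case is that the Laplacian only relates direct neighbors, whereas the diffusion kernel also relates genes connected through longer paths, with β controlling how far that influence spreads.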
