Radiology. 2012 Aug;264(2):387-96.
doi: 10.1148/radiol.12111607. Epub 2012 Jun 21.

Non-small cell lung cancer: identifying prognostic imaging biomarkers by leveraging public gene expression microarray data--methods and preliminary results


Olivier Gevaert et al. Radiology. 2012 Aug.

Abstract

Purpose: To identify prognostic imaging biomarkers in non-small cell lung cancer (NSCLC) by means of a radiogenomics strategy that integrates gene expression and medical images in patients for whom survival outcomes are not available by leveraging survival data in public gene expression data sets.

Materials and methods: A radiogenomics strategy for associating image features with clusters of coexpressed genes (metagenes) was defined. First, a radiogenomics correlation map is created for a pairwise association between image features and metagenes. Next, predictive models of metagenes are built in terms of image features by using sparse linear regression. Similarly, predictive models of image features are built in terms of metagenes. Finally, the prognostic significance of the predicted image features is evaluated in a public gene expression data set with survival outcomes. This radiogenomics strategy was applied to a cohort of 26 patients with NSCLC for whom gene expression and 180 image features from computed tomography (CT) and positron emission tomography (PET)/CT were available.
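The first step, the pairwise correlation map, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which correlation statistic was used, so Spearman rank correlation is assumed here, and the metagene values, feature names, and function names are hypothetical toy choices rather than data or code from the study.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks (assumed statistic)."""
    return pearson(ranks(x), ranks(y))

def correlation_map(metagenes, features):
    """Nested dict: rows are metagenes, columns are image features."""
    return {m: {f: spearman(mv, fv) for f, fv in features.items()}
            for m, mv in metagenes.items()}

# Hypothetical toy data for 5 patients (not from the study).
metagenes = {
    "M1": [0.1, 0.4, 0.35, 0.8, 0.9],
    "M2": [2.0, 1.5, 1.7, 0.3, 0.2],
}
features = {
    "lesion_size": [10, 14, 13, 30, 41],
    "air_bronchogram": [1, 1, 0, 0, 0],
}
cmap = correlation_map(metagenes, features)
for m, row in cmap.items():
    print(m, {f: round(v, 3) for f, v in row.items()})
```

In a real analysis each cell would also carry a p value, and the map would be thresholded after multiple-testing correction.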

Results: There were 243 statistically significant pairwise correlations between image features and metagenes of NSCLC. Metagenes were predicted in terms of image features with an accuracy of 59%-83%. One hundred fourteen of 180 CT image features and the PET standardized uptake value were predicted in terms of metagenes with an accuracy of 65%-86%. When the predicted image features were mapped to a public gene expression data set with survival outcomes, tumor size, edge shape, and sharpness ranked highest for prognostic significance.
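The Figure 2 caption describes the image-feature accuracies as areas under the receiver operating characteristic curve (AUC) from leave-one-out cross-validation. As a minimal sketch of that metric, assuming held-out scores for a binary image feature are already available, the AUC is the fraction of positive/negative pairs that the score ranks correctly. The labels and scores below are made-up illustration values, and the function name is hypothetical.

```python
def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out predictions for "feature present" (1) vs "absent" (0).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
result = auc(labels, scores)
print(f"AUC = {result:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking, which is why a cutoff such as 65% is a meaningful floor for a predicted feature.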

Conclusion: This radiogenomics strategy for identifying imaging biomarkers may enable a more rapid evaluation of novel imaging modalities, thereby accelerating their translation to personalized medicine.


Figures

Figure 1a-c:
Creation of radiogenomics map for non-small cell lung cancer (NSCLC). (a) Strategy for creation and use of the radiogenomics map in NSCLC. Step 1 integrates the computed tomographic (CT) and positron emission tomographic (PET)/CT image and the gene microarray data from our study cohort. Step 2 maps the metagenes to publicly available microarray data with survival. Step 3 links image features expressed in terms of metagenes to public gene expression data. The dashes in this link highlight its indirectness, because by leveraging public gene expression data, we are able to associate the image features in the study cohort with survival, even without survival data in the study cohort. (b) Hierarchical clustering of radiogenomics correlations map with metagenes (in rows) and image features (in columns). Black squares = 243 significant associations between an image feature and a metagene with q < 5%. (c) Association between metagene 12 and the image feature for the internal air bronchogram; (i) gene expression of genes in metagene 12; (ii) metagene 12 expression; (iii) presence of an internal air bronchogram, where * = squamous lung carcinoma cases; and (iv) sample CT images of a lesion with (top) an internal air bronchogram present versus (bottom) a lesion with an internal air bronchogram absent. For gene expression, red = overexpression and green = underexpression; for image features, blue = absence of the feature and yellow = presence of the feature.
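The "q < 5%" threshold in panel (b) refers to a false discovery rate cutoff applied across all feature-metagene pairs. The caption does not name the procedure, so the sketch below assumes Benjamini-Hochberg adjustment; the p values and the function name are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical p values for six feature-metagene pairs.
pvals = [0.001, 0.008, 0.039, 0.041, 0.27, 0.6]
qvals = bh_qvalues(pvals)
significant = [q < 0.05 for q in qvals]
print([round(q, 4) for q in qvals])
print(significant)
```

Note that two of the toy p values fall below 0.05 on their own but not after adjustment, which is the effect the q threshold is meant to capture when many pairs are tested at once.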
Figure 2a-e:
Multivariate modeling of image features in terms of metagenes. (a) Strategy for multivariate modeling of image features in terms of metagenes. Each image feature is modeled as a linear combination of metagenes, using L1 regularization to induce sparsity in the number of metagenes that are selected. I1 = the first image feature of k image features in total; Mj = jth metagene; fi = linear regression for the ith image feature; α = regularization parameter; Wj (not shown) = weight for each metagene M1, M2, to Mn; and W = matrix with all weights. (b) Semantic features predicted by metagenes with an area under the receiver operating characteristic curve (AUC) of 65% or greater, based on leave-one-out cross-validation analysis. (c) Multivariate metagene prediction model for the presence versus absence of internal air bronchogram at CT. The top seven metagenes, representing 95% of the weight of the multivariate model, are shown; the top three metagenes are upregulated when an air bronchogram is present, and the bottom four metagenes are downregulated. The downregulated metagenes are enriched in hypoxia-related pathways; the upregulated metagenes contain a Ras signature and genes upregulated by Ras. (d) Receiver operating characteristic curve for the predicted presence versus absence of internal air bronchogram at CT, when expressed in terms of metagenes. (e) Multivariate model for internal air bronchogram corresponding to c. For gene expression, red = overexpression and green = underexpression; for image features, blue = absence of the feature and yellow = presence of the feature. CI = confidence interval.
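The L1-regularized model in panel (a) can be illustrated with a small coordinate-descent lasso. This is a sketch under assumptions, not the authors' implementation: the objective scaling, the solver, the iteration count, and the toy data are all choices made here for illustration, with `alpha` playing the role of the regularization parameter α and `w` the weights for one image feature.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_fit(X, y, alpha, n_iter=200):
    """Minimize (1/2n)*||y - b - Xw||^2 + alpha*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual of centered y with w = 0
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            norm = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if norm == 0.0:
                continue  # constant column carries no information
            # Covariance of column j with the residual that excludes j's own term.
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / norm
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                w[j] = new_w
    intercept = y_mean - sum(w[j] * col_mean[j] for j in range(p))
    return w, intercept

def loo_predictions(X, y, alpha):
    """Leave-one-out: refit without patient k, then predict patient k."""
    preds = []
    for k in range(len(X)):
        w, b = lasso_fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:], alpha)
        preds.append(b + sum(wj * xj for wj, xj in zip(w, X[k])))
    return preds

# Hypothetical toy data: 4 patients, 3 metagenes; the feature depends
# strongly on metagene 1 and only very weakly on metagene 3.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3.05, 2.95, -1.05, -0.95]
w, b = lasso_fit(X, y, alpha=0.1)
loo = loo_predictions(X, y, alpha=0.1)
print("weights:", [round(v, 3) for v in w], "intercept:", round(b, 3))
```

On this toy design the weak metagene is shrunk to exactly zero while the dominant one keeps a slightly reduced weight, which is the sparsity effect the caption attributes to L1 regularization.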
Figure 3a-b:
Validation of the predicted image features related to computed lesion size at CT. (a) Graph shows the correlation of the predicted image feature “lesion size” and the actual lesion size in the Lee et al (13) data (P < .0001). (b) Graph shows comparison of the accuracy of the predicted image features lesion size, minor axis, and major axis with their correlation with actual tumor size in the Lee et al data.
Figure 4a-d:
Univariate survival analysis for four predicted image features evaluated in the Lee et al (13) data for RFS. Predicted image features are expressed in terms of a multivariate gene expression signature. (a) Survival curves for predicted lesion size at CT, with top right image showing that a small CT lesion is associated with good prognosis (green curve) and bottom right image showing that a large CT lesion is associated with poor prognosis (red curve). (b) Survival curves for predicted presence versus absence of pleural attachment at CT. (c) Survival curves based on predicted edge sharpness composite 1, with top right image showing that high edge sharpness is associated with good prognosis (red curve) and bottom right image showing that low edge sharpness is associated with poor prognosis (green curve). (d) Survival curves based on predicted presence versus absence of internal air bronchogram.
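Survival curves like those in panels (a) to (d) are typically drawn with the Kaplan-Meier estimator after splitting patients on a predicted feature. The sketch below assumes a median split, which the caption does not specify, and uses invented follow-up times and event flags; it omits the significance test that a full analysis would add.

```python
from statistics import median

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time with at least one observed event.

    events: 1 = recurrence observed, 0 = censored at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed > 0:
            surv *= 1 - observed / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

# Hypothetical cohort: predicted lesion size, months of follow-up, event flag.
predicted_size = [0.2, 1.4, 0.9, 2.1, 0.3, 1.8]
rfs_months = [40, 12, 30, 5, 36, 12]
event = [0, 1, 0, 1, 1, 1]

cut = median(predicted_size)  # assumed split point
high = [i for i, v in enumerate(predicted_size) if v > cut]
low = [i for i, v in enumerate(predicted_size) if v <= cut]

km_high = kaplan_meier([rfs_months[i] for i in high], [event[i] for i in high])
km_low = kaplan_meier([rfs_months[i] for i in low], [event[i] for i in low])
print("large predicted lesion:", km_high)
print("small predicted lesion:", km_low)
```

Censored patients leave the risk set without lowering the curve, which is why the small-lesion group in this toy example shows a single step.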
Figure 5:
Multivariate survival analysis for predicted image features evaluated in the Lee et al data set (13) for RFS. Graphs show Kaplan-Meier survival curves for selected multivariate models on the Lee et al external data set. Left: Multivariate model using all predicted image features. Middle: Multivariate model using all image features except the three size features: lesion size, minor axis, and major axis. Right: Multivariate model using only semantic features.


References

    1. Sotiriou C. Molecular biology in oncology and its influence on clinical practice: gene expression profiling [abstr]. Ann Oncol 2009;20(Suppl 4):v10
    2. Pao W, Kris MG, Iafrate AJ, et al. Integration of molecular profiling into the lung cancer clinic. Clin Cancer Res 2009;15(17):5317–5322
    3. Gevaert O, De Moor B. Prediction of cancer outcome using DNA microarray technology: past, present and future. Expert Opin Med Diagn 2009;3(2):157–165
    4. Segal E, Sirlin CB, Ooi C, et al. Decoding global gene expression programs in liver cancer by noninvasive imaging. Nat Biotechnol 2007;25(6):675–680
    5. Diehn M, Nardini C, Wang DS, et al. Identification of noninvasive imaging surrogates for brain tumor gene-expression modules. Proc Natl Acad Sci U S A 2008;105(13):5213–5218
