I-Boost: an integrative boosting approach for predicting survival time with multiple genomics platforms

doi:10.1186/s13059-019-1640-4

Genome Biol. 2019 Mar 7;20(1):52.

Kin Yau Wong¹, Cheng Fan², Maki Tanioka²,³, Joel S Parker²,³, Andrew B Nobel²,⁴,⁵, Donglin Zeng²,⁵, Dan-Yu Lin⁶,⁷, Charles M Perou⁸,⁹

Affiliations

¹ Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong.
² Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, 27599, NC, USA.
³ Department of Genetics, University of North Carolina, Chapel Hill, 27599, NC, USA.
⁴ Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, 27599, NC, USA.
⁵ Department of Biostatistics, University of North Carolina, Chapel Hill, 27599, NC, USA.
⁶ Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, 27599, NC, USA. lin@bios.unc.edu.
⁷ Department of Biostatistics, University of North Carolina, Chapel Hill, 27599, NC, USA. lin@bios.unc.edu.
⁸ Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, 27599, NC, USA. chuck_perou@med.unc.edu.
⁹ Department of Genetics, University of North Carolina, Chapel Hill, 27599, NC, USA. chuck_perou@med.unc.edu.

PMID: 30845957
PMCID: PMC6404283
DOI: 10.1186/s13059-019-1640-4

Kin Yau Wong et al. Genome Biol. 2019.

Abstract

We propose a statistical boosting method, termed I-Boost, to integrate multiple types of high-dimensional genomics data with clinical data for predicting survival time. I-Boost provides substantially higher prediction accuracy than existing methods. By applying I-Boost to The Cancer Genome Atlas, we show that the integration of multiple genomics platforms with clinical variables improves the prediction of survival time over the use of clinical variables alone; gene expression values are typically more prognostic of survival time than other genomics data types; and gene modules/signatures are at least as prognostic as the collection of individual gene expression data.

Keywords: Cancer genomics; Data integration; Gene modules; Variable selection.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

CMP is an equity stockholder, consultant, and member of the Board of Directors of BioClassifier LLC and GeneCentric Diagnostics. CMP is also listed as an inventor on patents on the Breast PAM50 and Lung Cancer Subtyping assays. The other authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

**Fig. 1**
Simulation settings and results. (a) Prediction accuracy of LASSO, elastic net, I-Boost-CV, and I-Boost-Permutation measured by risk correlation under three different settings. (b) The average number of variables selected by the four methods under three different settings. Different types of the selected variables are represented by different colors. (c) MSE of the four methods under three different settings. The error is decomposed into errors of parameters for different data types, as represented by different colors. (d) Number of signal variables and distribution of signals across different data types for the three simulation settings. The number of signal variables is zero if the proportion of signals of the data type is 0%. Abbreviations are as follows: GeneExp represents individual gene expression, Module represents gene module, Clinical represents clinical variable, CNV represents copy number variant, Mutation represents somatic mutation, miRNA represents microRNA expression, and Protein represents protein expression

**Fig. 2**
Analysis results for the TCGA LUAD, KIRC, and pan-cancer data sets using LASSO and elastic net. Each row represents a particular combination of data types used as predictors, as indicated by the box on the left. Each dot is an average C-index value obtained by performing LASSO or elastic net on 30 training and testing data set pairs. See the caption of Fig. 1 for the abbreviations of the data types

**Fig. 3**
Analysis results for the TCGA LUAD, KIRC, and pan-cancer data sets using elastic net, I-Boost-CV, and I-Boost-Permutation. Each row represents a particular combination of data types used as predictors, as indicated by the box on the left. Each dot is an average C-index value obtained by performing elastic net, I-Boost-CV, or I-Boost-Permutation on 30 training and testing data set pairs. See the caption of Fig. 1 for the abbreviations of the data types

**Fig. 4**
NRI values for the TCGA LUAD, KIRC, and pan-cancer data sets using I-Boost-CV or I-Boost-Permutation. Each dot represents the average NRI between a model with both clinical and genomic variables (estimated by I-Boost-CV or I-Boost-Permutation) and the model with clinical variables only (estimated by maximum partial-likelihood estimation) over 30 training and testing data set pairs

**Fig. 5**
NRI values between models containing individual gene expression data and models containing gene modules under the TCGA LUAD, KIRC, and pan-cancer data sets. Each dot represents the average NRI obtained by fitting I-Boost-CV or I-Boost-Permutation on two sets of predictors over 30 training and testing data set pairs. The first set of predictors contains a combination of data types and gene modules; the second set of predictors contains the same combination of data types and individual gene expression data. A positive NRI represents better prediction using the model with gene modules

**Fig. 6**
Analysis results for the TCGA LUAD, KIRC, and pan-cancer data sets, using elastic net, I-Boost-CV, and I-Boost-Permutation on nested models. In the left panel, the leftmost dots are fixed at zero, and each remaining dot represents the average NRI obtained by fitting elastic net, I-Boost-CV, or I-Boost-Permutation over 30 training and testing data set pairs. Each dot other than the leftmost represents the maximum NRI achieved by a model that contains one more data type than the model corresponding to the dot on its left, relative to that model. The name of the additional data type is shown above each dot. In the right panel, the average C-index values and the average numbers of selected variables for the models shown in the left panel are plotted. The arrows indicate the orders of models with respect to the number of data types they contain. See the caption of Fig. 1 for the abbreviations of the data types
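
The C-index reported in Figs. 2, 3, and 6 measures how well predicted risk orders patients by observed survival time. The captions do not say which estimator the authors used, so the following is only a minimal sketch assuming Harrell's definition for right-censored data, with tied follow-up times simply skipped. The function name `harrell_c_index` and the toy values are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index (assumed definition, not the paper's code).

    times:       observed follow-up times
    events:      1 if the event was observed, 0 if the subject was censored
    risk_scores: higher score means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the earlier
        # time is an observed event (simplification: tied times are skipped).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the second one censored.
toy_times = [5, 8, 12, 20]
toy_events = [1, 0, 1, 1]
toy_risk = [0.9, 0.3, 0.2, 0.4]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. Averaging such values over repeated training and testing splits would give one dot of the kind plotted in the figures, under the assumptions stated above.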
