Genome Biol. 2019 Mar 7;20(1):52.
doi: 10.1186/s13059-019-1640-4.

I-Boost: an integrative boosting approach for predicting survival time with multiple genomics platforms

Kin Yau Wong et al. Genome Biol. 2019.

Abstract

We propose a statistical boosting method, termed I-Boost, to integrate multiple types of high-dimensional genomics data with clinical data for predicting survival time. I-Boost provides substantially higher prediction accuracy than existing methods. By applying I-Boost to The Cancer Genome Atlas, we show that the integration of multiple genomics platforms with clinical variables improves the prediction of survival time over the use of clinical variables alone; gene expression values are typically more prognostic of survival time than other genomics data types; and gene modules/signatures are at least as prognostic as the collection of individual gene expression data.

Keywords: Cancer genomics; Data integration; Gene modules; Variable selection.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

CMP is an equity stockholder in, a consultant to, and a member of the Board of Directors of BioClassifier LLC and GeneCentric Diagnostics. CMP is also listed as an inventor on patents on the Breast PAM50 and Lung Cancer Subtyping assays. The other authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Simulation settings and results. a Prediction accuracy of LASSO, elastic net, I-Boost-CV, and I-Boost-Permutation measured by risk correlation under three different settings. b The average number of variables selected by the four methods under three different settings. Different types of the selected variables are represented by different colors. c MSE of the four methods under three different settings. The error is decomposed into errors of parameters for different data types, as represented by different colors. d Number of signal variables and distribution of signals across different data types for the three simulation settings. The number of signal variables is zero if the proportion of signals of the data type is 0%. Abbreviations are as follows: GeneExp represents individual gene expression, Module represents gene module, Clinical represents clinical variable, CNV represents copy number variant, Mutation represents somatic mutation, miRNA represents microRNA expression, and Protein represents protein expression
Fig. 2
Analysis results for the TCGA LUAD, KIRC, and pan-cancer data sets using LASSO and elastic net. Each row represents a particular combination of data types used as predictors, as indicated by the box on the left. Each dot is an average C-index value obtained by performing LASSO or elastic net on 30 training and testing data set pairs. See the caption of Fig. 1 for the abbreviations of the data types
Fig. 3
Analysis results for the TCGA LUAD, KIRC, and pan-cancer data sets using elastic net, I-Boost-CV, and I-Boost-Permutation. Each row represents a particular combination of data types used as predictors, as indicated by the box on the left. Each dot is an average C-index value obtained by performing elastic net, I-Boost-CV, or I-Boost-Permutation on 30 training and testing data set pairs. See the caption of Fig. 1 for the abbreviations of the data types
Fig. 4
NRI values for the TCGA LUAD, KIRC, and pan-cancer data sets using I-Boost-CV or I-Boost-Permutation. Each dot represents the average NRI between a model with both clinical and genomic variables (estimated by I-Boost-CV or I-Boost-Permutation) and the model with clinical variables only (estimated by maximum partial-likelihood estimation) over 30 training and testing data set pairs
Fig. 5
NRI values between models containing individual gene expression data and models containing gene modules under the TCGA LUAD, KIRC, and pan-cancer data sets. Each dot represents the average NRI obtained by fitting I-Boost-CV or I-Boost-Permutation on two sets of predictors over 30 training and testing data set pairs. The first set of predictors contains a combination of data types and gene modules; the second set of predictors contains the same combination of data types and individual gene expression data. A positive NRI represents better prediction using the model with gene modules
Fig. 6
Analysis results for the TCGA LUAD, KIRC, and pan-cancer data sets, using elastic net, I-Boost-CV, and I-Boost-Permutation on nested models. In the left panel, the leftmost dots are fixed at zero, and each remaining dot represents the average NRI obtained by fitting elastic net, I-Boost-CV, or I-Boost-Permutation over 30 training and testing data set pairs. Each dot other than the leftmost represents the maximum NRI obtained by comparing a model that contains one additional data type against the model corresponding to the dot immediately to its left. The name of the additional data type is shown above each dot. In the right panel, the average C-index values and the average numbers of selected variables for the models shown in the left panel are plotted. The arrows indicate the order of the models with respect to the number of data types they contain. See the caption of Fig. 1 for the abbreviations of the data types
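
Several of the captions above report prediction accuracy as a C-index. This excerpt does not state the exact definition the authors used, so the following is only a minimal sketch of one common choice, Harrell's concordance index for right-censored survival data. The function name, argument names, and toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index (illustrative sketch, not the authors' code).

    times       : observed follow-up times
    events      : 1 if the event (death) was observed, 0 if censored
    risk_scores : predicted risk; higher should mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: skip pairs with tied times
        # Let a be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk failed first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): four subjects, the third is censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)  # 4 of 5 comparable pairs agree
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of risk against survival time. The captions describe each plotted dot as an average over 30 training and testing data set pairs, so a single call like the one above would correspond to one such pair.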
