2011 Jul 22:4:61.
doi: 10.1186/1755-8794-4-61.

Integrating factor analysis and a transgenic mouse model to reveal a peripheral blood predictor of breast tumors

Heather G LaBreche et al. BMC Med Genomics.

Abstract

Background: Transgenic mouse tumor models have the advantage of facilitating controlled in vivo oncogenic perturbations in a common genetic background. This provides an idealized context for generating transcriptome-based diagnostic models while minimizing the inherent noisiness of high-throughput technologies. However, the question remains whether models developed in such a setting are suitable prototypes for useful human diagnostics. We show that latent factor modeling of the peripheral blood transcriptome in a mouse model of breast cancer provides the basis for using computational methods to link a mouse model to a prototype human diagnostic based on a common underlying biological response to the presence of a tumor.

Methods: We used gene expression data from mouse peripheral blood cell (PBC) samples to identify significantly differentially expressed genes using supervised classification and sparse ANOVA. We employed these transcriptome data as the starting point for developing a breast tumor predictor from human peripheral blood mononuclear cells (PBMCs) by using a factor modeling approach.

Results: The predictor distinguished breast cancer patients from healthy individuals in a cohort of patients independent from that used to build the factors and train the model with 89% sensitivity, 100% specificity and an area under the curve (AUC) of 0.97 using Youden's J-statistic to objectively select the model's classification threshold. Both permutation testing of the model and evaluating the model strategy by swapping the training and validation sets highlight its stability.
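To illustrate how Youden's J-statistic can select a classification threshold, the following is a minimal sketch, not the authors' implementation. It assumes each sample has a predicted probability and a binary label (1 = tumor, 0 = healthy), and that a sample is called positive when its score is at or above the threshold. The function name `youden_threshold` and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def youden_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximizing
    J = sensitivity + specificity - 1.

    Assumption: a sample is predicted positive when score >= threshold.
    Ties in J are resolved in favor of the lowest threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")

    best = None  # (J, threshold, sensitivity, specificity)
    for t in sorted(set(scores)):  # every observed score is a candidate cut
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)

    _, threshold, sens, spec = best
    return threshold, sens, spec


# Toy example (made-up scores, not data from the study).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(youden_threshold(toy_scores, toy_labels))
```

Scanning only the observed scores is sufficient because sensitivity and specificity change only when the threshold crosses a sample's score.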

Conclusions: We describe a human breast tumor predictor based on the gene expression of mouse PBCs. This strategy overcomes many of the limitations of earlier studies by using the model system to reduce noise and identify transcripts associated with the presence of a breast tumor over other potentially confounding factors. Our results serve as a proof-of-concept for using an animal model to develop a blood-based diagnostic, and they establish an experimental framework for identifying predictors of solid tumors, not only in breast cancer but also in other types of cancer.

Figures

Figure 1
Experimental Design. To generate a human breast tumor predictor we took the following steps: (1) randomly divided samples into training and validation cohorts, then analyzed gene expression in PBC samples collected from tumor-bearing transgenic mice versus tumor-free controls that are matched for age and parity; (2) applied a sparse ANOVA to the training set to identify 4,276 mouse Affymetrix probes with at least 0.99 posterior probability (these probes were used to independently generate a set of 49 factors that were used to develop a predictive model based on the mouse gene expression data); (3) translated these into 2,595 orthologous human probes; (4) applied BFRM to this subset of 2,595 probes to yield 26 factors from a training set of 30 human PBMC samples; (5) used SSS to build 5000 possible predictive models from the training set, then projected these into a separate validation set of samples.
Figure 2
Generation and validation of the mouse mammary tumor signature. We generated a mouse mammary tumor predictor based on gene expression of PBCs from a training set of tumor-bearing transgenic mice (n = 32) and nontransgenic tumor-free mice (n = 14). (A) This signature is capable of distinguishing the two classes accurately within the training data set, as shown in the model fit diagram. Blue = healthy tumor-free control mice; red = MMTV/c-MYC tumor-bearing mice. (B) Furthermore, this signature was applied to an independent set of PBC samples from tumor-bearing transgenic mice (n = 33) and nontransgenic controls (n = 14) to predict the tumor status of each sample. It demonstrated 100% sensitivity and 100% specificity in predicting tumor status, using the optimal threshold of 0.8118 as calculated using Youden's J-statistic.
Figure 3
Gene expression signature predicts human breast cancer. We generated a human breast cancer predictor from human PBMC samples by using a mouse model of breast cancer to first identify the most informative probes to use in a subsequent factor modeling approach. (A) We generated the predictive model from a training set of healthy cancer-free individuals (n = 10) and patients with invasive breast cancer (n = 10). Blue = normal; red = malignant. An assessment of the model fit shows that this predictor has a robust capacity to discriminate among samples based on breast tumor status with 100% sensitivity and specificity. (B) We then used this predictive model to evaluate an independent set of samples (n = 162) for the capacity to distinguish controls from patients with a diagnosis of malignant breast cancer. This represents an external validation using samples not used in either the factor generation or the model building process. (C) We were able to predict breast cancer status with a sensitivity of 89% and specificity of 100% (AUC = 0.97) as shown in the ROC curve. The optimal threshold was calculated as 0.3760 based on Youden's J-statistic. (D) We then tested the validity of our modeling strategy by swapping the training and validation sets. New factors were generated based on the original validation set and new models were generated. The model fit diagram shows the ability to generate a robust model from the original validation set. (E) This new model was validated in the original training set. (F) It demonstrated a sensitivity of 100% and a specificity of 90% (AUC = 0.98). As a negative control, we generated mock factors from a publicly available dataset that was biologically unrelated to breast tumor status. We then projected these factors into the training (G) and validation (H) sets. (I) The sensitivity was 83% and specificity was 48% (AUC = 0.63).
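The AUC values reported in this caption can be read as the probability that a randomly chosen cancer sample receives a higher predicted score than a randomly chosen control. The sketch below computes AUC from that pairwise definition; it is an illustration under the assumption of binary labels (1 = malignant, 0 = normal), and the function name `roc_auc` and the toy scores are hypothetical rather than taken from the study.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    sample scores higher; tied pairs count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): three of the four pairs are ordered correctly.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of the size described here; a rank-based formula gives the same value more efficiently for large data sets.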
Figure 4
Inclusion probabilities of factors in the predictive model. We generated a collection of 26 factors from the human PBMC training set using the methods described previously and used SSS to put these together in various combinations to form predictive models (5000 iterations), which were validated in an independent sample set. We calculated the top performing factors based on their inclusion probability (median posterior marginal probability) in the top 200 models. These 26 factors are plotted along the x-axis and the median posterior marginal probabilities are plotted on the y-axis. The top 3 factors with inclusion probabilities significantly above the background noise are 3, 12 and 14.
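The ranking of factors by median inclusion probability across the top models can be sketched as follows. This is a simplified illustration, not the SSS procedure itself: it assumes each candidate model is represented as a score plus a mapping from factor index to that factor's posterior marginal probability, with factors absent from a model counted as 0. The `Model` type, the function name `median_inclusion`, and the toy values are all hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical representation: (model score, {factor index: posterior probability}).
Model = Tuple[float, Dict[int, float]]


def median_inclusion(
    models: List[Model], n_factors: int, top_n: int
) -> Dict[int, float]:
    """Median posterior marginal probability of each factor over the
    top_n highest-scoring models.

    Assumption: a factor missing from a model contributes probability 0.
    """
    if not models or top_n < 1:
        raise ValueError("need at least one model and top_n >= 1")
    top = sorted(models, key=lambda m: m[0], reverse=True)[:top_n]
    return {
        f: median([probs.get(f, 0.0) for _, probs in top])
        for f in range(1, n_factors + 1)
    }


# Toy example (made-up values): four candidate models over three factors.
toy_models: List[Model] = [
    (0.9, {1: 0.8, 2: 0.1}),
    (0.8, {1: 0.6}),
    (0.7, {1: 0.7, 3: 0.9}),
    (0.1, {2: 1.0}),
]
print(median_inclusion(toy_models, n_factors=3, top_n=3))
```

Using the median rather than the mean keeps a factor that appears strongly in only a few models from being ranked above one that is included consistently.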
