BMC Bioinformatics. 2015 Sep 15;16:293.
doi: 10.1186/s12859-015-0722-x.

A methodology for exploring biomarker-phenotype associations: application to flow cytometry data and systemic sclerosis clinical manifestations


Hongtai Huang et al. BMC Bioinformatics. 2015.

Abstract

Background: This work seeks to develop a methodology for identifying reliable biomarkers of disease activity, progression and outcome by identifying significant associations between high-throughput flow cytometry (FC) data and interstitial lung disease (ILD), a systemic sclerosis (SSc, or scleroderma) clinical phenotype that is the leading cause of morbidity and mortality in SSc. A specific aim of the work is to develop a clinically useful screening tool that could yield accurate assessments of disease state, such as the risk or presence of SSc-ILD, the activity of lung involvement and the likelihood of responding to therapeutic intervention. Ultimately, this instrument could enable a refined stratification of SSc patients into clinically relevant subsets at the time of diagnosis and during the course of the disease, and thus help prevent poor outcomes from disease progression or unnecessary treatment side effects. The methods used in the work involve: (1) clinical and peripheral blood flow cytometry data (Immune Response In Scleroderma, IRIS) from consented patients followed at the Johns Hopkins Scleroderma Center; (2) machine learning (Conditional Random Forests, CRF) coupled with Gene Set Enrichment Analysis (GSEA) to identify subsets of FC variables that are highly effective in classifying ILD patients; and (3) stochastic simulation to design, train and validate ILD risk screening tools.
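
Step (2) pairs CRF with GSEA. The abstract does not give the scoring details, but in the standard GSEA formulation (Subramanian et al., 2005) the variables are ranked by their correlation with the phenotype, and a running sum walks down that list, stepping up at each variable that belongs to the set (weighted by |correlation|^p) and stepping down at each variable outside it; the enrichment score (ES) is the running sum's largest deviation from zero. The random-walk figures (Figs. 4, 7, 8 and 9) plot running sums of this kind. The sketch below is a minimal Python version of that standard statistic, assuming p = 1 and an FC set given as plain variable names. It is not the authors' code, and the function name and the FC variable names in the example are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def enrichment_score(
    ranked: Sequence[Tuple[str, float]],
    fc_set: Set[str],
    p: float = 1.0,
) -> Tuple[float, List[float]]:
    """GSEA-style running-sum enrichment score for one FC set (hypothetical helper).

    ranked: (FC variable name, correlation with phenotype), sorted from most
            positive to most negative correlation.
    fc_set: names of the FC variables that form the set being tested.
    p:      weighting exponent on |correlation| (p = 1 is an assumption here).
    Returns (ES, running sum after each position in the ranked list).
    """
    hits = [name in fc_set for name, _ in ranked]
    n, n_hit = len(ranked), sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("FC set must contain some, but not all, ranked variables")
    # Hits step up in proportion to |r|^p; fall back to equal steps if all hit weights are 0.
    weights = [abs(r) ** p for _, r in ranked]
    hit_norm = sum(w for w, h in zip(weights, hits) if h)
    if hit_norm == 0:
        weights, hit_norm = [1.0] * n, float(n_hit)
    miss_step = 1.0 / (n - n_hit)  # every miss steps down by the same amount
    running, walk = 0.0, []
    for w, h in zip(weights, hits):
        running += w / hit_norm if h else -miss_step
        walk.append(running)
    # ES is the running-sum value farthest from zero, keeping its sign.
    es = max(walk, key=abs)
    return es, walk


# Hypothetical FC variables, already ranked by correlation with ILD status.
ranked = [("CD4_a", 0.6), ("CD8_b", 0.5), ("B_c", 0.1), ("NK_d", -0.2), ("Mono_e", -0.4)]
es, walk = enrichment_score(ranked, {"CD4_a", "CD8_b"})
print(round(es, 3))  # 1.0
```

Because the upward steps sum to 1 and the downward steps sum to -1, the walk always returns to zero at the end of the list; a set concentrated at the top of the ranking gives an ES near +1, and one concentrated at the bottom gives an ES near -1.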

Results: Our hybrid analysis approach (CRF-GSEA) predicted SSc patient ILD status with a high degree of accuracy (>82% correct classification in validation; 79 patients in the training data set and 40 patients in the validation data set).
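
In patient counts, the validation figure works out as follows: with 40 validation patients, 32 correct classifications would be 80% and 33 would be 82.5%, so a rate above 82% means at least 33 patients were correctly classified and at most 7 were misclassified. The small helper below (hypothetical, for illustration only) makes that arithmetic explicit.

```python
def min_correct_for_rate(n_patients: int, rate_floor: float) -> int:
    """Hypothetical helper: smallest number of correct calls whose rate strictly exceeds rate_floor."""
    for k in range(n_patients + 1):
        if k / n_patients > rate_floor:
            return k
    raise ValueError("no count of correct calls exceeds the requested rate")


n_validation = 40  # validation set size reported in the Results
k = min_correct_for_rate(n_validation, 0.82)
print(k, n_validation - k, k / n_validation)  # 33 7 0.825
```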

Conclusions: IRIS flow cytometry data provides useful information in assessing the ILD status of SSc patients. Our new approach combining Conditional Random Forests and Gene Set Enrichment Analysis was successful in identifying a subset of flow cytometry variables to create a screening tool that proved effective in correctly identifying ILD patients in the training and validation data sets. From a somewhat broader perspective, the identification of subsets of flow cytometry variables that exhibit coordinated movement (i.e., multi-variable up or down regulation) may lead to insights into possible effector pathways and thereby improve the state of knowledge of systemic sclerosis pathogenesis.


Figures

Fig. 1
CART Result. CART output showing splitting variables and their respective values
Fig. 2
GSEA Schematic. Hybrid Gene Set Enrichment Analysis modeling approach. Conditional Random Forests is first run to generate a Variable Importance List. That list, along with a list of FC variables ranked by correlation with the phenotype, constitutes the input to GSEA. FC sets are created top-down from the Variable Importance List.
Fig. 3
ROC results for CRF, RF, SVM and CART. Receiver operating characteristic (ROC) curve results for Conditional Random Forests, Random Forests, Support Vector Machines and Classification and Regression Trees
Fig. 4
Example random walk. Gene Set Enrichment Analysis random walk for a flow cytometry set comprised of 27 variables
Fig. 5
Plot of enrichment score versus FC set size. Gene Set Enrichment Analysis enrichment score plotted as a function of FC set size (obtained top-down from the CRF Variable Importance List)
Fig. 6
Statistical significance levels of the ES values shown in Fig. 8. Statistical significance levels for the enrichment scores shown in Fig. 8
Fig. 7
Random walk for the FC set comprised of the 20th to 30th highest ranked variables. Random walk for the FC set comprised of the 20th to 30th highest ranked variables in the Conditional Random Forest variable importance list
Fig. 8
Random walk for the FC set comprised of the bottom ten ranked variables. Random walk for the FC set comprised of the bottom 10 ranked variables in the Conditional Random Forest variable importance list
Fig. 9
Random walk for the CD4 FC set. Random walk for the CD4 FC set, comprised of possibly important markers
Fig. 10
Fig. 10
Statistical significance of the ES values in Fig. 13. Statistical significance of the enrichment scores for the CD4 GSEA results
Fig. 11
CRF, RF, CART, SVM Predictive Performance – ROC. CRF and the other machine learning methods have poor predictive performance
Fig. 12
CART Pre-partitioning result. Classification and Regression Trees pre-partitioning result. Patients are divided into groups, with screening tools identified for each group (as opposed to finding best screening tools for the entire set of training patients)
Fig. 13
Fig. 13
Screening tool performance: Training versus validation error. Screening tool training and validation performance, showing that overfitting occurred during training: the best training screening tools are not the best-performing validation screening tools
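
The captions of Figs. 2, 5 and 6 outline the CRF-GSEA loop: FC sets are built top-down from the CRF Variable Importance List, each set is scored with GSEA against FC variables ranked by correlation with the phenotype, and the statistical significance of each ES is assessed. The sketch below strings those steps together under several assumptions. The importance list is taken as given (CRF itself is not reimplemented; in R it is commonly fitted with the party package's cforest and scored with varimp). Significance is estimated by shuffling ILD labels and taking a one-sided p-value of (count + 1) / (n_perm + 1), without the normalized ES or FDR correction of full GSEA, so the paper's own procedure may differ. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Set, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    return mean((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def rank_by_correlation(data: Dict[str, List[float]], labels: List[int]) -> List[Tuple[str, float]]:
    """Rank FC variables from most positive to most negative correlation with the labels."""
    scored = [(name, pearson(values, labels)) for name, values in data.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)


def enrichment_score(ranked: Sequence[Tuple[str, float]], fc_set: Set[str]) -> float:
    """GSEA-style running-sum ES with |r| weights (weighting exponent p = 1 assumed)."""
    hits = [name in fc_set for name, _ in ranked]
    n, n_hit = len(ranked), sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("FC set must contain some, but not all, ranked variables")
    weights = [abs(r) for _, r in ranked]
    hit_norm = sum(w for w, h in zip(weights, hits) if h)
    if hit_norm == 0:
        weights, hit_norm = [1.0] * n, float(n_hit)
    running, best = 0.0, 0.0
    for w, h in zip(weights, hits):
        running += w / hit_norm if h else -1.0 / (n - n_hit)
        if abs(running) > abs(best):  # keep the signed maximum deviation
            best = running
    return best


def es_by_set_size(
    data: Dict[str, List[float]],
    labels: List[int],
    importance_list: Sequence[str],
    sizes: Sequence[int],
    n_perm: int = 200,
    seed: int = 0,
) -> List[Tuple[int, float, float]]:
    """Score nested top-k FC sets and estimate a one-sided permutation p-value for each."""
    rng = random.Random(seed)
    observed = rank_by_correlation(data, labels)
    # Null rankings: shuffle ILD labels while keeping each patient's FC profile intact.
    null_rankings = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        null_rankings.append(rank_by_correlation(data, shuffled))
    results = []
    for k in sizes:
        fc_set = set(importance_list[:k])  # top-down set from the importance list
        es = enrichment_score(observed, fc_set)
        null = [enrichment_score(r, fc_set) for r in null_rankings]
        extreme = sum(1 for e in null if (e >= es if es >= 0 else e <= es))
        results.append((k, es, (extreme + 1) / (n_perm + 1)))
    return results


labels = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical ILD status (1 = ILD)
data = {  # hypothetical FC variables measured on the same 8 patients
    "v1": [1, 2, 1, 2, 8, 9, 8, 9],
    "v2": [2, 1, 2, 1, 7, 8, 9, 7],
    "v3": [5, 3, 4, 6, 5, 4, 6, 3],
    "v4": [9, 8, 9, 8, 1, 2, 1, 2],
    "v5": [3, 5, 4, 4, 4, 3, 5, 5],
}
importance = ["v1", "v2", "v4", "v3", "v5"]  # stand-in for a CRF Variable Importance List
for k, es, p in es_by_set_size(data, labels, importance, sizes=[1, 2, 3]):
    print(k, round(es, 3), round(p, 3))
```

Plotting ES against k gives a curve of the kind shown in Fig. 5; the per-set p-values play the role of the significance levels in Fig. 6.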
