Cancers (Basel). 2020 Dec 23;13(1):17.
doi: 10.3390/cancers13010017.

High-Dimensional Analysis of Single-Cell Flow Cytometry Data Predicts Relapse in Childhood Acute Lymphoblastic Leukaemia

Salvador Chulián et al. Cancers (Basel).

Abstract

Artificial intelligence methods may help to unveil information hidden in high-dimensional oncological data. Flow cytometry studies of haematological malignancies provide quantitative data that could be used to construct response biomarkers. Many computational methods from the bioinformatics toolbox can be applied to these data, but they have not been exploited to their full potential in leukaemias, specifically in childhood B-cell Acute Lymphoblastic Leukaemia. In this paper, we analysed flow cytometry data obtained at diagnosis from 56 paediatric B-cell Acute Lymphoblastic Leukaemia patients from two local institutions. Our aim was to assess the prognostic potential of immunophenotypical marker expression intensity. We constructed classifiers based on Fisher’s Ratio to quantify differences between patients with relapsing and non-relapsing disease, and we correlated these results with genetic information. The main result arising from the data was the association between subexpression of marker CD38 and the probability of relapse.

Keywords: Acute Lymphoblastic Leukaemia; CD38; Fisher’s Ratio; flow cytometry data; mathematical oncology; personalised medicine; response biomarkers.

Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Figures

Figure 1
Preprocessing pipeline of Flow Cytometry Files. Preprocessing was carried out in six steps. The first four were performed in FlowJo and consisted of the removal of abnormal acquisitions (quality control), margin events, doublets, and debris. In step 5, the files were imported into R and, for each patient, all tubes or aliquots were merged into a single file by means of nearest-neighbour imputation. Finally, in step 6, the CD19+ population (B cells) was automatically selected for further analysis.
Figure 2
Percentile vector construction. (A) Scatter plot for a patient i of two normalised parameters, j1 = CD10 and j2 = CD20. (B.1, B.2) Cell-count histograms of j1 = CD10 and j2 = CD20, respectively. (C.1, C.2) Cumulative distributions of markers j1 = CD10 and j2 = CD20, respectively; the percentile curve from the 5th to the 95th percentile is shown in red. (D) Each percentile curve for patient i and marker j yields a vector $x_{ij} \in \mathbb{R}^{P}$, where P is the number of percentiles chosen.
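
The percentile vector described in this caption can be illustrated with a minimal sketch. It assumes that one marker's normalised intensities for one patient are available as a plain list of numbers, that percentiles are taken in steps of 5 between the 5th and the 95th (the caption gives the range but not the step), and that values between events are linearly interpolated. The function name `percentile_vector` is hypothetical and is not taken from the paper.

```python
def percentile_vector(values, percentiles=range(5, 100, 5)):
    """Return the marker intensity at each requested percentile.

    `values` holds one marker's normalised intensities for one patient.
    The default grid (5th to 95th in steps of 5) is an assumption.
    """
    if not values:
        raise ValueError("need at least one event")
    ordered = sorted(values)
    n = len(ordered)
    vector = []
    for p in percentiles:
        # Fractional position of the p-th percentile in the sorted sample.
        rank = p * (n - 1) / 100
        lower = int(rank)
        upper = min(lower + 1, n - 1)
        weight = rank - lower
        # Linear interpolation between the two neighbouring events.
        vector.append(ordered[lower] * (1 - weight) + ordered[upper] * weight)
    return vector


# Toy usage: 101 evenly spaced intensities for a single marker.
toy_vector = percentile_vector([k / 100 for k in range(101)])
```

With the default grid the resulting vector has P = 19 entries, one per percentile.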
Figure 3
Example of synthetic IPT marker distributions. Mean distribution of a marker with (A) high and (B) low Fisher’s Ratio, with (C,D) the respective cumulative distributions of the median ± the standard deviation. (E) Median cumulative distribution of the two sets of patients for a marker with high Fisher’s Ratio: the solid red line shows the median cumulative distribution of relapsed patients, $\bar{R}$, and the blue dotted line that of non-relapsed patients. The yellow dashed line and the green dash-dotted line show the median cumulative distribution of the marker, $\bar{v}_i$, for two different virtual patients $i$. The distances to each set median, $d_R^i$ and $d_N^i$, are shown as black arrows, dashed for Patient 1 and dash-dotted for Patient 2. In this example, Patient 1 would be classified as a relapsed patient, while Patient 2 would belong to the non-relapsed set.
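
The distance-based assignment shown in panel (E) can be sketched as follows. This is only an illustration under stated assumptions: each patient is represented by a percentile vector for a single marker, the group curve is the element-wise median across patients, and the distance is Euclidean (the caption does not state which metric is used). The names `median_curve`, `distance` and `classify` are hypothetical.

```python
import math
from statistics import median


def median_curve(vectors):
    """Element-wise median of a list of equal-length percentile vectors."""
    return [median(column) for column in zip(*vectors)]


def distance(u, v):
    """Euclidean distance between two vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def classify(patient, relapse_vectors, non_relapse_vectors):
    """Assign the patient to the group whose median curve is closer."""
    d_relapse = distance(patient, median_curve(relapse_vectors))
    d_non_relapse = distance(patient, median_curve(non_relapse_vectors))
    label = "relapse" if d_relapse < d_non_relapse else "non-relapse"
    return label, d_relapse, d_non_relapse


# Toy data: three percentiles per patient, values invented for illustration.
relapsed = [[0.1, 0.2, 0.3], [0.2, 0.3, 0.4], [0.3, 0.4, 0.5]]
non_relapsed = [[0.6, 0.7, 0.8], [0.7, 0.8, 0.9], [0.8, 0.9, 1.0]]
label_1, _, _ = classify([0.25, 0.30, 0.45], relapsed, non_relapsed)
label_2, _, _ = classify([0.65, 0.80, 0.85], relapsed, non_relapsed)
```

In this toy case the first virtual patient lies closer to the relapse median and the second closer to the non-relapse median, mirroring Patients 1 and 2 in the figure.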
Figure 4
Main steps of the analysis with Fisher’s Ratio. In step 1, we compute differences in marker expression between relapsing and non-relapsing patients, comparing the distributions of the most relevant markers. In step 2, we perform k-fold and Leave-One-Out cross-validation (LOOCV), constructing the classifiers with the most relevant markers of the respective training set. In step 3, we analyse the frequency with which markers are employed in 100 classifiers coming from 75:25 splits of the dataset.
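
Leave-One-Out cross-validation, as named in step 2, can be sketched generically. In the sketch below, each patient is held out once and the classifier is rebuilt from the remaining patients only, so any marker selection would see the training set alone. The harness `leave_one_out_accuracy` and the stand-in classifier `nearest_mean` are hypothetical: the stand-in uses a single scalar feature per patient and is not the classifier used in the paper.

```python
def leave_one_out_accuracy(samples, labels, train_and_predict):
    """Fraction of held-out patients that are classified correctly."""
    correct = 0
    for i in range(len(samples)):
        # Everything except patient i forms the training set.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if train_and_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def nearest_mean(train_x, train_y, test_x):
    """Stand-in classifier: pick the label whose training mean is closest."""
    means = {}
    for label in set(train_y):
        group = [x for x, y in zip(train_x, train_y) if y == label]
        means[label] = sum(group) / len(group)
    return min(means, key=lambda label: abs(test_x - means[label]))


# Toy scalar feature per patient, values invented for illustration.
toy_samples = [0.10, 0.20, 0.15, 0.80, 0.90, 0.85]
toy_labels = ["R", "R", "R", "N", "N", "N"]
toy_accuracy = leave_one_out_accuracy(toy_samples, toy_labels, nearest_mean)
```

Passing the classifier as a callable keeps the held-out patient out of every fitting step, which is the property cross-validation relies on.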
Figure 5
Median of immunophenotypic markers for relapse and non-relapse patients. Comparison was performed via t-test. Asterisks denote markers with a p-value lower than 0.05.
Figure 6
Fisher’s Ratio analyses and median cumulative distributions of markers with highest FR. Fisher’s Ratio matrices for Dataset 1 (A.1), Dataset 2 (B.1), and both datasets combined (C.1). The common parameters within each dataset are shown on the x-axis, and the percentiles of the median cumulative distribution on the y-axis. Colorbars show the intensity of the Fisher’s Ratio for each percentile and marker. Median cumulative distributions and standard deviation bands of the IPT markers with highest FR, for relapsed (red, dotted lines) and non-relapsed (blue, solid lines) patients, are shown in the following charts: for Dataset 1, CD38 (A.2) and CD123 (A.3); for Dataset 2, CD38 (B.2) and CD66c (B.3); and, for both datasets combined, CD38 (C.2).
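
A Fisher’s Ratio matrix of this kind, with one value per marker and percentile, can be sketched as below. The sketch assumes the commonly used definition FR = (μ_R − μ_N)² / (σ_R² + σ_N²) with sample variances; this page does not state the exact formula used in the paper, so treat the definition as an assumption. The names `fishers_ratio` and `fisher_ratio_matrix` and the input layout (a dict from marker name to per-patient percentile vectors) are hypothetical.

```python
from statistics import mean, variance


def fishers_ratio(group_a, group_b):
    """Squared difference of means over the sum of sample variances."""
    spread = variance(group_a) + variance(group_b)
    gap = (mean(group_a) - mean(group_b)) ** 2
    if spread == 0:
        # Degenerate case: no within-group spread at this percentile.
        return 0.0 if gap == 0 else float("inf")
    return gap / spread


def fisher_ratio_matrix(relapse, non_relapse):
    """Map each shared marker to a list of Fisher's Ratios, one per percentile.

    Both arguments map a marker name to a list of per-patient percentile
    vectors; each group needs at least two patients per marker.
    """
    matrix = {}
    for marker in relapse.keys() & non_relapse.keys():
        columns_r = zip(*relapse[marker])
        columns_n = zip(*non_relapse[marker])
        matrix[marker] = [
            fishers_ratio(list(col_r), list(col_n))
            for col_r, col_n in zip(columns_r, columns_n)
        ]
    return matrix


# Toy data: two percentiles per patient, values invented for illustration.
toy_matrix = fisher_ratio_matrix(
    {"CD38": [[1, 10], [2, 10], [3, 10]]},
    {"CD38": [[4, 10], [5, 10], [6, 10]]},
)
```

Markers whose rows contain large values separate the two groups well at those percentiles, which is what the colorbars in panels A.1, B.1 and C.1 encode.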
Figure 7
Results of the feature importance analysis. (A) Frequency of the markers in all classifiers after 100 simulations of train-test splitting. (B) Histograms of the number of markers after establishing a threshold for the accuracy. (C) Out-of-bag feature importance of the markers after 100 Random Forests. (D) Mean and standard deviation bands of the out-of-bag classification error in the Random Forest analysis for the whole set of markers (blue, solid line) and for the set of markers with positive feature importance, namely CD33, CD38 and CD66c (red, dotted line).
Figure 8
Pearson correlation coefficient between clinical, cytogenetic and marker CD38 expression data. Upper triangle shows p-values. Asterisks (*) denote significant correlations (p<0.05).
