J Pathol Inform. 2024 Jul 3:15:100390.
doi: 10.1016/j.jpi.2024.100390. eCollection 2024 Dec.

Engineered feature embeddings meet deep learning: A novel strategy to improve bone marrow cell classification and model transparency

Jonathan Tarquino et al. J Pathol Inform. 2024.

Abstract

Cytomorphological evaluation of bone marrow cells is the initial step in diagnosing different hematological diseases. This assessment is still performed manually by trained specialists, which may create a bottleneck in the clinical process. Deep learning algorithms are a promising approach to automating bone marrow cell evaluation. However, existing artificial intelligence models have focused on a limited number of cell subtypes, mainly those associated with a particular disease, and are frequently presented as black boxes. The strategy introduced here presents an engineered feature representation, the region-attention embedding, which improves deep learning classification performance on a cytomorphology task with 21 bone marrow cell subtypes. The embedding arranges cytology features within a square matrix, distributing them according to pre-segmented cell regions, i.e., cytoplasm, nucleus, and whole-cell. This novel cell image representation, which aims to preserve spatial/regional relations, is used as the input to the network. Combining the region-attention embedding with deep learning networks (Xception and ResNet50) provides local relevance associated with image regions, adding interpretable information to the prediction. Additionally, the approach is evaluated on a public database with the largest number of cell subtypes (21), using a thorough evaluation scheme: three iterations of 3-fold cross-validation performed on 80% of the images (n = 89,484), and a test on an unseen set composed of the remaining 20% of the images (n = 22,371). This evaluation shows that the introduced strategy outperforms previously published approaches on an equivalent validation set, with an f1-score of 0.82, and achieves competitive results on the unseen data partition, with an f1-score of 0.56.
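The evaluation scheme described in the abstract (a 20% hold-out plus three iterations of 3-fold cross-validation on the remaining 80%) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say whether the splits were stratified by class or how they were seeded, so the sketch uses a plain random shuffle, and the names `k_fold_chunks` and `split_and_fold` are hypothetical. The total of 111,855 images is simply the sum of the two partition sizes reported above.

```python
import random


def k_fold_chunks(items, k):
    """Cut a list into k contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), k)
    chunks, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def split_and_fold(n_images, n_repeats=3, n_folds=3, seed=0):
    """Hold out 20% of the image indices, then build repeated k-fold splits.

    Assumption: unstratified random splitting; the paper may have done
    something different (for example, per-class stratification).
    """
    rng = random.Random(seed)
    indices = list(range(n_images))
    rng.shuffle(indices)

    # Integer arithmetic avoids floating-point rounding on the 20% cut.
    n_test = n_images // 5
    test_idx = indices[:n_test]
    dev_idx = indices[n_test:]

    splits = []
    for _ in range(n_repeats):
        # Reshuffle the development set so each repeat yields different folds.
        shuffled = dev_idx[:]
        rng.shuffle(shuffled)
        chunks = k_fold_chunks(shuffled, n_folds)
        for i, val_idx in enumerate(chunks):
            train_idx = [j for c, chunk in enumerate(chunks) if c != i for j in chunk]
            splits.append((train_idx, val_idx))
    return test_idx, splits
```

Calling `split_and_fold(111_855)` under these assumptions gives a held-out set of 22,371 indices and nine train/validation pairs drawn from the other 89,484, which matches the partition sizes reported in the abstract.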

Keywords: Biomedical image processing; Bone marrow cell subtypes; Cytomorphology; Deep learning; Interpretability.

Conflict of interest statement

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Eduardo Romero reports financial support was provided by Colombia Ministry of Science Technology and Innovation. Jonathan Tarquino reports a relationship with Colombia Ministry of Science Technology and Innovation that includes: funding grants. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

Figures

Fig. 1
BM cell subtype distribution across the 171,374 images in the dataset used (“An Expert-Annotated Dataset of Bone Marrow Cytology in Hematologic Malignancies” [46]). The blue/gray circle ratio corresponds to the proportion of each class in the dataset. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 2
Region-attention embedding organization scheme. Numeric labels present the order of each feature vector type (shape, color, texture, fractal) after a particular cell component (whole-cell, nucleus, cytoplasm), within the region-attention embedding.
Fig. 3
DL-prediction interpretability improvement process. This methodology extracted features associated with each cell region (whole-cell, nucleus, and cytoplasm) and arranged them in a square matrix with a predefined organization (region-attention embedding). Finally, the strategy identified the relevant features per class, and the cell region each feature came from, by using the activation maps provided by the Grad-CAM algorithm.
Fig. 4
Per-class validation performance of region-attention embedding in combination with Xception network, in terms of precision, recall, and f1-score (more detailed results available in Table A1 within the supplementary material).
Fig. 5
Interpretable output of the presented combination of region-attention embedding and Xception network. This table presents a match between the most relevant features within the region-attention embedding for five different BM cell subtypes (column 2), and clinically discriminant morphology features (column 3). Images in the right panel graphically show the cell region where the most relevant features are found, in a heat map scale going from highly relevant (HR) region to a non-relevant (NR) region.
Fig. 6
Feature relevance maps obtained by applying pairwise subtype differentiation based on a Tukey test, for: (a) nucleus area, (b) cytoplasm convexity, (c) nucleus Haralick dissimilarity, and (d) whole-cell Haralick energy. Here, the most significant differences correspond to the lowest p-values (<0.05), shown as the darkest purple matrix points. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 7
Performance of the Xception network using RGB images and the presented region-attention embedding, with 1000 and 2000 images per class. The bars present accuracy, precision, recall, and f1-score for the Xception network trained with: (a) 1000 RGB images and ImageNet weights, (b) 1000 RGB images from scratch, (c) 2000 RGB images and ImageNet weights, (d) 2000 RGB images from scratch, (e) 1000 image-equivalent region-attention embeddings, and (f) 2000 image-equivalent region-attention embeddings.
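The embedding layout of Fig. 2 and the region attribution step of Fig. 3 can be illustrated with a small sketch. It is not the authors' implementation. The row-major fill order, the zero padding, the region and feature-type ordering, and the function names `build_embedding` and `relevance_by_region` are all assumptions made for illustration. The heat map is assumed to be a Grad-CAM-style activation map already resized to the shape of the embedding.

```python
import math

# Assumed ordering, following the wording of the Fig. 2 caption.
REGIONS = ("whole-cell", "nucleus", "cytoplasm")
FEATURE_TYPES = ("shape", "color", "texture", "fractal")


def build_embedding(features):
    """Pack per-region feature vectors into a square matrix.

    features: {region: {feature_type: [float, ...]}}
    Returns (matrix, owner), where owner[r][c] is the (region, feature_type)
    that produced matrix[r][c], or None for a padding cell.
    """
    values, owners = [], []
    for region in REGIONS:
        for ftype in FEATURE_TYPES:
            for v in features[region][ftype]:
                values.append(float(v))
                owners.append((region, ftype))

    # Smallest square that holds every feature; pad the rest with zeros.
    side = math.isqrt(len(values))
    if side * side < len(values):
        side += 1
    pad = side * side - len(values)
    values += [0.0] * pad
    owners += [None] * pad

    matrix = [values[r * side:(r + 1) * side] for r in range(side)]
    owner = [owners[r * side:(r + 1) * side] for r in range(side)]
    return matrix, owner


def relevance_by_region(heatmap, owner):
    """Sum activation-map values over the cells owned by each cell region."""
    totals = {region: 0.0 for region in REGIONS}
    for heat_row, owner_row in zip(heatmap, owner):
        for heat, cell_owner in zip(heat_row, owner_row):
            if cell_owner is not None:  # skip padding cells
                totals[cell_owner[0]] += heat
    return totals
```

Because every matrix cell keeps a record of the region and feature type it came from, a relevance map over the embedding can be read back as relevance per cell region, which is the kind of interpretable output the captions above describe.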

