Anal Chim Acta. 2021 Sep 8;1177:338522.
doi: 10.1016/j.aca.2021.338522. Epub 2021 Apr 26.

Automated biomarker candidate discovery in imaging mass spectrometry data through spatially localized Shapley additive explanations


Leonoor E M Tideman et al. Anal Chim Acta.

Abstract

The search for molecular species that are differentially expressed between biological states is an important step towards discovering promising biomarker candidates. In imaging mass spectrometry (IMS), performing this search manually is often impractical due to the large size and high dimensionality of IMS datasets. Instead, we propose an interpretable machine learning workflow that automatically identifies biomarker candidates by their mass-to-charge ratios (m/z) and quantitatively estimates their relevance to recognizing a given biological class using Shapley additive explanations (SHAP). The task of biomarker candidate discovery is translated into a feature-ranking problem: given a classification model that assigns pixels to different biological classes on the basis of their mass spectra, the molecular species that the model uses as features are ranked in descending order of relative predictive importance, such that the top-ranking features have a higher likelihood of being useful biomarkers. Besides providing the user with an experiment-wide measure of a molecular species' biomarker potential, our workflow delivers spatially localized explanations of the classification model's decision-making process in the form of a novel representation called SHAP maps. SHAP maps deliver insight into the spatial specificity of biomarker candidates by highlighting the regions of the tissue sample in which each feature provides discriminative information and those in which it does not. SHAP maps also enable one to determine whether the relationship between a biomarker candidate and a biological state of interest is correlative or anticorrelative.
Our automated approach to estimating a molecular species' potential for characterizing a user-provided biological class, combined with the untargeted and multiplexed nature of IMS, allows thousands of molecular species to be screened rapidly and yields a broader shortlist of biomarker candidates than targeted manual assessment could provide. Our biomarker candidate discovery workflow is demonstrated on mouse-pup and rat kidney case studies.
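The feature-ranking formulation can be illustrated with a small, self-contained sketch. It is not the authors' implementation: a toy linear scoring function with hypothetical weights stands in for the classifier, features left out of a coalition are set to a fixed baseline spectrum (an assumption), and all coalitions are enumerated to obtain exact Shapley values, whereas SHAP tooling for tree models such as XGBoost relies on dedicated, much faster algorithms. The global score is assumed to be the mean absolute Shapley value across pixels, a common convention; the m/z labels are placeholders.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values explaining model(x) relative to model(baseline).

    Features outside a coalition take their baseline value (assumption).
    The loop makes O(n * 2^n) model calls, so it suits only a few features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Spectrum where only coalition members keep their measured intensity.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_features(local: list[list[float]],
                  names: list[str]) -> list[tuple[str, float]]:
    """Rank features by mean absolute Shapley value over pixels, highest first."""
    scores = [sum(abs(row[j]) for row in local) / len(local)
              for j in range(len(names))]
    return sorted(zip(names, scores), key=lambda t: t[1], reverse=True)


def toy_model(z: Sequence[float]) -> float:
    # Hypothetical per-pixel class score (e.g. a log-odds); weights are made up.
    return 2.0 * z[0] - 1.0 * z[1] + 0.0 * z[2]


names = ["m/z A", "m/z B", "m/z C"]                 # placeholder features
pixels = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [1.0, 1.0, 0.2]]
baseline = [0.5, 0.5, 0.5]                            # assumed reference spectrum

local = [shapley_values(toy_model, p, baseline) for p in pixels]
ranking = rank_features(local, names)
```

For a linear score with a fixed baseline, each Shapley value reduces to w_i (x_i - b_i), and each pixel's values sum to toy_model(x) - toy_model(baseline) (the efficiency property), which makes the toy easy to check by hand. The exhaustive loop grows as 2^n, so it is only practical for a handful of features, not the thousands of m/z values in an IMS dataset.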

Keywords: Biomarker discovery; Explainable artificial intelligence; Imaging mass spectrometry; Model interpretability; Shapley additive explanations; Supervised machine learning.


Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1.
Diagrams of the classifier building and prediction processes in imaging mass spectrometry. Icons from Refs. [21,22].
Fig. 2.
Diagram of the classifier interpretation process. SHAP is used to measure the local and global predictive importance of the features that the classifier from Fig. 1 uses to assign the pixels making up the sample surface (and their corresponding mass spectra) to one of four different anatomical classes (cerebral cortex, cerebellum, brainstem, or other). The global SHAP scores provide an experiment-wide measure of each biomarker candidate’s relevance, whereas the local SHAP scores measure the direction and magnitude of each biomarker candidate’s influence on the model output for one single pixel. SHAP maps deliver spatially localized explanations of the classifier’s decision-making process. Icons from Refs. [21,22].
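The local scores in Fig. 2 carry a direction as well as a magnitude. One way to condense that direction into a single label per feature, sketched here purely as our own assumption (the paper reads it off the SHAP maps), is the sign of the Pearson correlation between a feature's intensity and its local Shapley values across pixels: positive suggests that high intensity pushes pixels toward the class (correlative), negative that it pushes them away (anticorrelative). The function names, the 0.3 cut-off, and the sample values are illustrative.

```python
from math import sqrt
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / sqrt(va * vb)


def relationship(intensity: Sequence[float], shap: Sequence[float],
                 min_abs_r: float = 0.3) -> str:
    """Label one feature's link to the class from per-pixel data.

    intensity[k] and shap[k] belong to the same pixel k. The cut-off
    min_abs_r is an arbitrary, illustrative choice.
    """
    r = pearson(intensity, shap)
    if r >= min_abs_r:
        return "correlative"
    if r <= -min_abs_r:
        return "anticorrelative"
    return "unclear"


# Hypothetical per-pixel intensities and local Shapley values for two features.
up_in_class = relationship([0.9, 0.8, 0.1, 0.0], [0.7, 0.6, -0.2, -0.3])
down_in_class = relationship([0.9, 0.8, 0.1, 0.0], [-0.5, -0.4, 0.3, 0.4])
```

A correlation only summarizes a monotone trend; a SHAP map can still reveal regions where a feature behaves differently, which a single number hides.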
Fig. 3.
Microscopy images of the tissue sections imaged in IMS datasets no. 1 and no. 2.
Fig. 4.
Class-defining masks used as inputs for training the two XGBoost classifiers designed to recognize the mouse-pup brain and liver. For each task, regions of the tissue sample were manually annotated as belonging to one of three categories: dark blue pixels are labeled as belonging to the target organ and make up the positive class, light blue pixels are labeled as not belonging to the target organ and make up the negative class, and gray pixels are close to borders between the target organ and other anatomical structures, making it difficult to annotate them definitively. The latter are therefore excluded from the training data to avoid feeding the supervised machine learning algorithm unreliable annotations during classifier training.
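The masking rule in Fig. 4 can be sketched as a filter applied before training: target-organ pixels become positives, other annotated pixels become negatives, and border pixels are dropped. The integer codes (1, 0, -1), the function name, and the list-based layout are hypothetical choices for illustration, not the paper's data format.

```python
from typing import Sequence

POSITIVE, NEGATIVE, EXCLUDED = 1, 0, -1  # hypothetical mask codes


def training_set(spectra: Sequence[Sequence[float]],
                 mask: Sequence[int]) -> tuple[list[Sequence[float]], list[int]]:
    """Keep confidently annotated pixels and drop EXCLUDED (border) pixels.

    spectra[k] is the mass spectrum of pixel k and mask[k] its annotation.
    """
    if len(spectra) != len(mask):
        raise ValueError("one mask entry per pixel is required")
    X, y = [], []
    for spectrum, label in zip(spectra, mask):
        if label == EXCLUDED:
            continue  # ambiguous border pixel: never shown to the learner
        if label not in (POSITIVE, NEGATIVE):
            raise ValueError(f"unknown mask code {label}")
        X.append(spectrum)
        y.append(label)
    return X, y


spectra = [[5.0, 0.1], [4.8, 0.2], [0.3, 2.2], [2.5, 1.0]]
mask = [POSITIVE, POSITIVE, NEGATIVE, EXCLUDED]
X, y = training_set(spectra, mask)
```

The excluded pixels are only withheld from training; a trained classifier can still be applied to them afterwards, as it can to any pixel of the sample.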
Fig. 5.
Mouse-pup brain recognition and global feature ranking.
Fig. 6.
Three promising molecular markers for the mouse-pup’s brain. The ion images (left) and SHAP maps (right) of the three features (i.e. m/z values) with the most influence on the decision-making process of the classifier trained to recognize the mouse-pup’s brain are shown. The ion images plot the spatial distribution and measured intensity of each feature across the sample, and are not specifically tied to the task of recognizing the brain. The SHAP maps plot the spatial distribution of Shapley values, or local SHAP predictive importance scores, of each feature across the sample, and provide information on where and how the feature is relevant to the task of recognizing the brain.
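A SHAP map, as described for Fig. 6, places each pixel's Shapley value for one feature back at that pixel's position, much as an ion image places measured intensities. The sketch below assumes each pixel carries (row, col) coordinates and leaves positions without a spectrum as None; the function name and data layout are hypothetical.

```python
from typing import Optional, Sequence

Grid = list[list[Optional[float]]]


def shap_map(coords: Sequence[tuple[int, int]],
             local_shap: Sequence[Sequence[float]],
             feature: int, n_rows: int, n_cols: int) -> Grid:
    """Scatter one feature's per-pixel Shapley values onto the image grid.

    coords[k] is the (row, col) of pixel k and local_shap[k][feature] is that
    pixel's Shapley value for the chosen feature. Unmeasured cells stay None.
    """
    grid: Grid = [[None] * n_cols for _ in range(n_rows)]
    for (r, c), values in zip(coords, local_shap):
        grid[r][c] = values[feature]
    return grid


coords = [(0, 0), (0, 1), (1, 1)]
local_shap = [[0.7, -0.1], [0.2, 0.0], [-0.4, 0.3]]
m = shap_map(coords, local_shap, feature=0, n_rows=2, n_cols=2)
```

Positive cells push a pixel toward the target class and negative cells push it away, so a diverging colormap centered on zero is a natural choice when displaying such a grid.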
Fig. 7.
Masks used as inputs for training the three XGBoost classifiers directed at recognizing the kidney’s inner medulla, outer medulla, and cortex. Different regions of the tissue sample were manually annotated as belonging to one of four categories: light blue pixels belong to the inner medulla, medium blue pixels belong to the outer medulla, dark blue pixels belong to the cortex, and gray pixels are close to borders between these anatomical structures, making it difficult to annotate them definitively. The latter are excluded from the training data to avoid feeding the supervised machine learning algorithm unreliable annotations during classifier training. The black circle outlines a region of the renal cortex that was affected by a sample preparation artefact.
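Fig. 7 derives three region-specific classifiers from one four-category mask. A plausible reading, which is our assumption rather than something the caption states, is a one-vs-rest scheme: for each target region, its pixels are positives, pixels of the other two annotated regions are negatives, and border pixels are left out. The string codes and the function name below are hypothetical.

```python
from typing import Sequence

REGIONS = ("inner_medulla", "outer_medulla", "cortex")  # hypothetical mask codes
BORDER = "border"                                        # hypothetical code for gray pixels


def one_vs_rest(mask: Sequence[str]) -> dict[str, list[tuple[int, int]]]:
    """Build one binary task per region from a single multi-category mask.

    Each task is a list of (pixel_index, label) pairs: label 1 marks the
    target region and label 0 the other annotated regions. Border pixels
    are left out of every task.
    """
    unknown = set(mask) - set(REGIONS) - {BORDER}
    if unknown:
        raise ValueError(f"unexpected mask codes: {sorted(unknown)}")
    return {
        target: [(k, int(code == target))
                 for k, code in enumerate(mask) if code != BORDER]
        for target in REGIONS
    }


mask = ["cortex", "cortex", "border", "inner_medulla", "outer_medulla"]
tasks = one_vs_rest(mask)
```

Each resulting task can then be trained and explained independently, which yields a separate feature ranking for every region.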
Fig. 8.
Renal inner medulla recognition and global feature ranking.
Fig. 9.
Three promising molecular markers for the renal inner medulla. The ion images (left) and SHAP maps (right) of the three features (i.e. m/z values) with the most influence on the decision-making process of the classifier trained to recognize the rat’s renal inner medulla are shown. The ion images plot the spatial distribution and measured intensity of each feature across the sample, and are not specifically tied to the task of recognizing the inner medulla. The SHAP maps plot the spatial distribution of Shapley values, or local SHAP predictive importance scores, of each feature across the sample, and provide information on where and how the feature is relevant to the task of recognizing the inner medulla.

References

    1. Biomarkers Definitions Working Group, Biomarkers and surrogate endpoints: preferred definitions and conceptual framework, Clin. Pharmacol. Ther. 69 (3) (2001) 89–95, doi:10.1067/mcp.2001.113989.
    2. Rifai N, Gillette MA, Carr SA, Protein biomarker discovery and validation: the long and uncertain path to clinical utility, Nat. Biotechnol. 24 (8) (2006) 971–983, doi:10.1038/nbt1235.
    3. Crutchfield CA, et al., Advances in mass spectrometry-based clinical biomarker discovery, Clin. Proteomics 13 (2016), doi:10.1186/s12014-015-9102-9.
    4. Hu Z-Z, et al., Omics-based molecular target and biomarker identification, Methods Mol. Biol. 719 (2011) 547–571, doi:10.1007/978-1-61779-027-0_26.
    5. Holzlechner M, Eugenin E, Prideaux B, Mass spectrometry imaging to detect lipid biomarkers and disease signatures in cancer, Cancer Rep. 2 (6) (2019) e1229, doi:10.1002/cnr2.1229.
