J Allergy Clin Immunol. 2025 Jan;155(1):213-218.e4.
doi: 10.1016/j.jaci.2024.09.006. Epub 2024 Sep 13.

Clustering of clinical symptoms using large language models reveals low diagnostic specificity of proposed alternatives to consensus mast cell activation syndrome criteria

Benjamin D Solomon et al. J Allergy Clin Immunol. 2025 Jan.

Abstract

Background: The rate of diagnosis of mast cell activation syndrome (MCAS) has increased since the disorder's original description as a mastocytosis-like phenotype. While a set of consortium MCAS criteria is well described and widely accepted, this increase occurs in the setting of a broader set of proposed alternative MCAS criteria.

Objective: Effective diagnostic criteria must minimize the range of unrelated diagnoses that can be erroneously classified as the condition of interest. We sought to determine if the symptoms associated with alternative MCAS criteria result in less concise or consistent diagnostic alternatives, reducing diagnostic specificity.

Methods: We used multiple large language models, including ChatGPT, Claude, and Gemini, to bootstrap the probabilities of diagnoses that are compatible with consortium or alternative MCAS criteria. We used diversity and network analyses to quantify diagnostic precision and specificity compared to control diagnostic criteria, including those for systemic lupus erythematosus, Kawasaki disease, and migraines.
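As an illustration of the two diversity measures reported in the Results, the following minimal sketch computes Shannon diversity and Bray-Curtis similarity from diagnosis counts. It is not the authors' code: the function names, the toy counts, and the choice of natural logarithm are assumptions made for this example, and the abstract does not state which log base was used.

```python
import math

def shannon_diversity(counts):
    """Shannon diversity H = -sum(p * ln p) over a dict of diagnosis counts.

    Assumption: natural log; the source does not specify the base.
    """
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log(c / total)
        for c in counts.values()
        if c > 0
    )

def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity (1 - dissimilarity) between two count dicts.

    Equals 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b)).
    """
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 0.0

# Hypothetical toy data: diagnosis counts from two sets of repeated queries.
run_a = {"mastocytosis": 2, "anaphylaxis": 2}
run_b = {"mastocytosis": 2, "migraine": 2}

print(bray_curtis_similarity(run_a, run_b))  # 0.5
print(shannon_diversity({"a": 1, "b": 1, "c": 1, "d": 1}))  # ln(4), about 1.386
```

Higher Shannon diversity means the diagnoses are spread over more alternatives, and lower mean Bray-Curtis similarity between repeated differentials means the output is less reproducible, which is how the abstract uses "more variable" and "less precise".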

Results: Compared to consortium MCAS criteria, alternative MCAS criteria are associated with more variable (Shannon diversity 5.8 vs 4.6, respectively; P = .004) and less precise (mean Bray-Curtis similarity 0.07 vs 0.19, respectively; P = .004) diagnoses. The diagnosis networks derived from consortium and alternative MCAS criteria had lower between-network similarity compared to the similarity between diagnosis networks derived from 2 distinct systemic lupus erythematosus criteria (cosine similarity 0.55 vs 0.86, respectively; P = .0022).
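The between-network comparison above relies on cosine similarity. A minimal sketch, assuming each diagnosis network is summarized as a mapping from diagnosis name to a centrality score (the function name and the example values are hypothetical, not taken from the study):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts.

    Keys missing from one vector are treated as 0.
    """
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)

# Hypothetical centrality scores for two criteria networks.
network_1 = {"mastocytosis": 3.0, "anaphylaxis": 4.0}
network_2 = {"mastocytosis": 3.0, "anaphylaxis": 4.0}
network_3 = {"migraine": 1.0}

print(cosine_similarity(network_1, network_2))  # 1.0
print(cosine_similarity(network_1, network_3))  # 0.0
```

A value near 1 indicates that the two networks weight the same diagnoses similarly; a value near 0 indicates that they emphasize different diagnoses.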

Conclusion: Alternative MCAS criteria are associated with a distinct set of diagnoses compared to consortium MCAS criteria and have lower diagnostic consistency. This lack of specificity is pronounced in relation to multiple control criteria, raising the concern that alternative criteria could disproportionately contribute to MCAS overdiagnosis, to the exclusion of more appropriate diagnoses.

Keywords: Mast cell; anaphylaxis; artificial intelligence; generative pretrained transformer; large language model; mast cell activation syndrome; mastocytosis; natural language processing.

Conflict of interest statement

Disclosure statement: Supported by the Bill and Melinda Gates Foundation (OPP1113682 to P.K.); the National Institute of Allergy and Infectious Diseases (NIAID) grants U19AI057229 and U19AI67903; NIAID contract 75N93022C00052; Department of Defense contracts W81XWH1910235 and W911NF2320019; and the National Institute of Child Health and Human Development (grant K12HD000850). Disclosure of potential conflict of interest: The authors declare that they have no relevant conflicts of interest.

Figures

Figure 1: California ICD code use for inpatient encounters.
Total count of indicated diagnosis codes per year.
Figure 2: Similarity of diagnostic criteria based on symptom word embeddings.
Multiple word embedding models were used to obtain embeddings for all symptoms within each set of criteria. Embeddings were then reduced by PCA. A) Cosine similarity between the PCA-embedding centroids of the indicated pairs of criteria. Colors represent individual embedding models; p-value by Wilcoxon rank-sum test. B) As in A but averaged across all models to show similarity between all pairs of criteria.
Figure 3: Diversity and precision of diagnoses associated with diagnostic criteria.
Diagnosis distributions generated by repeated iterations of LLM queries using symptoms from each set of criteria. A-F) Frequencies of the top 10 diagnoses associated with each indicated set of diagnostic criteria. G) Bray-Curtis similarity between all diagnosis frequencies from each set of criteria, averaged across all models. H) Average diagnosis frequency in order of rank for the top 50 diagnoses from each set of criteria. I) Shannon diversity for the distribution of all diagnoses from each set of criteria. J) Precision as represented by the mean Bray-Curtis similarity between all 10,000 differential diagnosis iterations from each set of criteria and model. For A-F and I-J, colors represent the indicated LLM and black bars represent the mean value across all models. For H, grey ribbons represent ± 1 standard error. For I-J, Wilcoxon rank-sum p-values: (*) < 0.05, (**) < 0.01, adjusted for multiple comparisons. Only comparisons involving MCAS criteria are annotated.
Figure 4: Network of co-occurring diagnoses associated with diagnostic criteria.
A) Partial co-diagnosis graph. Nodes represent diagnoses. Edges are drawn between nodes if a pair is within the top 100 co-occurring diagnoses for the indicated criteria. Only diagnoses found within the top 100 co-occurring diagnoses of at least one set of criteria are drawn as nodes. Nodes are colored by standardized eigenvalue centrality among all diagnoses for a given set of criteria. The red node represents mastocytosis; the orange node represents mast cell activation syndrome. B) Mean edge density of the complete co-diagnosis graph for each set of criteria. C) Cosine similarity for the centrality values of all nodes between the complete graphs of the indicated criteria. P-value by Wilcoxon rank-sum test. D) As in C but averaged across all models to show similarity between all criteria networks. For B-C, colors represent the indicated LLM; p-values by Wilcoxon rank-sum test, (**) p-value < 0.01, adjusted for multiple comparisons.
