Multicenter Study
Arthritis Res Ther. 2022 Mar 19;24(1):71.
doi: 10.1186/s13075-022-02752-7.

The differential diagnosis of IgG4-related disease based on machine learning

Motohisa Yamamoto et al. Arthritis Res Ther. 2022.

Abstract

Introduction: To eliminate the disparity and maldistribution of physicians and medical specialty services, the development of diagnostic support for rare diseases using artificial intelligence is being promoted. Immunoglobulin G4 (IgG4)-related disease (IgG4-RD) is a rare disorder often requiring special knowledge and experience to diagnose. In this study, we investigated the possibility of differential diagnosis of IgG4-RD based on basic patient characteristics and blood test findings using machine learning.

Methods: The study included 602 patients with IgG4-RD and 204 patients with non-IgG4-RD conditions requiring differentiation, all of whom visited the participating institutions. Ten percent of the subjects were randomly set aside as a validation sample. Of the remaining cases, 80% were used as training samples and 20% as test samples. Finally, the models were evaluated on the validation sample. The analysis used a decision tree and a random forest model, and conditions with and without the serum IgG4 concentration were compared. Accuracy was evaluated using the area under the receiver operating characteristic (AUROC) curve.
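
The three-way split described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `three_way_split`, the fixed seed, the use of simple (unstratified) shuffling, and the rounding of fractional sample sizes are all assumptions.

```python
import random


def three_way_split(n_cases, val_frac=0.10, test_frac=0.20, seed=0):
    """Return (train, test, validation) lists of case indices.

    Hypothetical helper: first hold out val_frac of all cases for
    validation, then split the remainder into test_frac test cases
    and the rest as training cases.
    """
    indices = list(range(n_cases))
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    rng.shuffle(indices)

    n_val = round(n_cases * val_frac)  # rounding rule is an assumption
    validation = indices[:n_val]
    remaining = indices[n_val:]

    n_test = round(len(remaining) * test_frac)
    test = remaining[:n_test]
    train = remaining[n_test:]
    return train, test, validation


# 602 IgG4-RD + 204 non-IgG4-RD cases, as stated in the Methods
train, test, validation = three_way_split(602 + 204)
print(len(train), len(test), len(validation))  # 580 145 81
```

Under these assumptions the 806 cases divide into 580 training, 145 test, and 81 validation cases; a different rounding rule would shift these counts by one.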

Results: When serum IgG4 levels were included in the analysis, the AUROC values for diagnosing IgG4-RD in the validation sample were 0.906 for the decision tree and 0.974 for the random forest method. When serum IgG4 levels were excluded, the AUROC value for the random forest method was 0.925.
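
To make the AUROC figures concrete: the AUROC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name `auroc` and the toy labels and scores are hypothetical and are not data from the study.

```python
def auroc(labels, scores):
    """AUROC by pairwise comparison of positive and negative scores.

    labels: 1 for the positive class (e.g. IgG4-RD), 0 otherwise.
    scores: model outputs, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not from the study)
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
print(round(auroc(labels, scores), 3))  # 0.833
```

The pairwise form is quadratic in the number of cases, which is fine for a cohort of this size; library implementations typically use a rank-based equivalent.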

Conclusion: Based on machine learning in a multicenter collaboration, with or without serum IgG4 data, basic patient characteristics and blood test findings alone were sufficient to differentiate IgG4-RD from non-IgG4-RD.

Keywords: Artificial intelligence; Differential diagnosis; IgG4-related disease; Machine learning.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Prediction of IgG4-RD diagnosis by a classification and regression tree (CART) in patients with rheumatic diseases requiring differentiation, when the serum IgG4 level was known. A Decision tree algorithm. Blue indicates the predicted percentage of IgG4-RD cases, and red indicates the percentage of non-IgG4-RD cases. The CART model identified serum IgG4, CRP, IgM, sIL-2R, C3, lymphocyte count, and IgG as the key variables leading to the diagnosis of IgG4-RD. Following each branch from top to bottom to a leaf node yields an “if-then” rule for predicting the diagnosis. For example, the right branch of the tree indicates that IgG4-RD is significantly more likely than non-IgG4-RD if the serum IgG4 level is ≥151.5 mg/dL, CRP is <5 mg/dL, and IgM is <177.5 mg/dL. B ROC curve for the decision tree algorithm (left). The accuracy, sensitivity, and specificity of the algorithm were 0.917, 0.963, and 0.789, respectively, and the AUC was 0.889. C ROC curve for the decision tree algorithm in the validation sample (right). In validation, the accuracy, sensitivity, and specificity were 0.906, 0.983, and 0.714, respectively, and the AUC was 0.906.
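
The “if-then” rule quoted in the Fig. 1 caption can be written as a small predicate. This sketch encodes only that single branch; the caption does not give the other branches of the tree, so a `False` result means only that this rule does not apply, not that IgG4-RD is unlikely. The function name, parameter names, and example values are hypothetical.

```python
def matches_right_branch(igg4_mg_dl, crp_mg_dl, igm_mg_dl):
    """True if a case falls on the right branch quoted in the Fig. 1 caption.

    Thresholds are taken from the caption: IgG4 >= 151.5 mg/dL,
    CRP < 5 mg/dL, IgM < 177.5 mg/dL.
    """
    return igg4_mg_dl >= 151.5 and crp_mg_dl < 5 and igm_mg_dl < 177.5


# Hypothetical example values, not patient data from the study
print(matches_right_branch(igg4_mg_dl=420.0, crp_mg_dl=0.3, igm_mg_dl=80.0))   # True
print(matches_right_branch(igg4_mg_dl=420.0, crp_mg_dl=8.0, igm_mg_dl=80.0))   # False
```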
Fig. 2
Prediction of IgG4-RD diagnosis by a random forest in patients with rheumatic diseases requiring differentiation, when the serum IgG4 level was known. A Decrease in Gini impurity. In this algorithm, the serum IgG4 concentration is the most important variable, followed by the age at the first visit and the serum levels of IgA, sIL-2R, and IgM. B ROC curve for the random forest algorithm (left). The accuracy, sensitivity, and specificity of the algorithm were 0.938, 0.981, and 0.816, respectively, and the AUC was 0.986. C ROC curve for the random forest algorithm in the validation sample (right). In validation, the accuracy, sensitivity, and specificity were 0.938, 1.000, and 0.762, respectively, and the AUC was 0.974.
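
Accuracy, sensitivity, and specificity as reported in these captions all derive from the four cells of a confusion matrix. The sketch below shows the arithmetic. The counts are hypothetical: the source does not report confusion-matrix cells, and the values used here were chosen only because their ratios round to the validation figures quoted for Fig. 2C.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: IgG4-RD cases predicted correctly / missed.
    tn/fp: non-IgG4-RD cases predicted correctly / flagged as IgG4-RD.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Hypothetical counts (assumption): not reported in the source
metrics = classification_metrics(tp=60, fn=0, tn=16, fp=5)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.938, 'sensitivity': 1.0, 'specificity': 0.762}
```

With a class imbalance of this kind, accuracy is dominated by the larger IgG4-RD group, which is why a sensitivity of 1.000 can coexist with a much lower specificity.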
Fig. 3
Prediction of IgG4-RD diagnosis by a classification and regression tree (CART) in patients with rheumatic diseases requiring differentiation, when the serum IgG4 level was unknown. A Decision tree algorithm. Blue indicates the predicted percentage of IgG4-RD cases, and red indicates the percentage of non-IgG4-RD cases. The CART model identified the age at the first visit, several serum biomarkers, and the peripheral white blood cell count and its fractions as the key variables leading to the diagnosis of IgG4-RD. For example, the right branch of the tree indicates that IgG4-RD is significantly more likely than non-IgG4-RD if the age at the first visit is ≥51.5 years, the serum IgM level is <201 mg/dL, the peripheral leucocyte count is <10,960/μL, the serum IgG level is ≥1,253.5 mg/dL, and the serum IgA level is <289.5 mg/dL. B ROC curve for the decision tree algorithm (left). The accuracy, sensitivity, and specificity of the algorithm were 0.807, 0.869, and 0.632, respectively, and the AUC was 0.776. C ROC curve for the decision tree algorithm in the validation sample (right). In validation, the accuracy, sensitivity, and specificity were 0.852, 0.917, and 0.667, respectively, and the AUC was 0.763.
Fig. 4
Prediction of IgG4-RD diagnosis by a random forest in patients with rheumatic diseases requiring differentiation, when the serum IgG4 level was unknown. A Decrease in Gini impurity. In the random forest method, the decrease in Gini impurity indicates the importance of a variable. In this algorithm, the age at the first visit is the most important variable, followed by the serum levels of IgA, sIL-2R, IgM, and IgE. B ROC curve for the random forest algorithm (left). The accuracy, sensitivity, and specificity of the algorithm were 0.897, 0.972, and 0.684, respectively, and the AUC was 0.955. C ROC curve for the random forest algorithm in the validation sample (right). In validation, the accuracy, sensitivity, and specificity were 0.877, 1.000, and 0.524, respectively, and the AUC was 0.925.
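
The decrease in Gini impurity used as the importance measure in Figs. 2A and 4A can be illustrated for a single split. The Gini impurity of a node is 1 minus the sum of squared class proportions, and the decrease is the parent's impurity minus the size-weighted impurity of its two children. A random forest's importance score for a variable typically averages such decreases over every split on that variable across all trees. The function names and toy labels below are hypothetical, not taken from the study.

```python
def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Toy node (hypothetical): 1 = IgG4-RD, 0 = non-IgG4-RD
parent = [1, 1, 1, 0, 0, 0]
informative = gini_decrease(parent, [1, 1, 1], [0, 0, 0])    # split separates the classes
uninformative = gini_decrease(parent, [1, 1, 0], [1, 0, 0])  # split barely helps
print(round(informative, 3), round(uninformative, 3))  # 0.5 0.056
```

A variable whose splits resemble the first case accumulates a large total decrease and ranks high in the importance plot; one whose splits resemble the second ranks low.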

