Sci Rep. 2024 Nov 13;14(1):27848.
doi: 10.1038/s41598-024-76093-7.

Machine learning for precision diagnostics of autoimmunity


Jan Kruta et al. Sci Rep.

Abstract

Early and accurate diagnosis is crucial to prevent disease development and define therapeutic strategies. Due to predominantly unspecific symptoms, diagnosis of autoimmune diseases (AID) is notoriously challenging. Clinical decision support systems (CDSS) are a promising method with the potential to enhance and expedite precise diagnostics by physicians. However, due to the difficulties of integrating and encoding multi-omics data with clinical values, as well as a lack of standardization, such systems are often limited to certain data types. Accordingly, even sophisticated data models fall short when making accurate disease diagnoses and presenting data analyses in a user-friendly form. Therefore, the integration of various data types is not only an opportunity but also a competitive advantage for research and industry. We have developed an integration pipeline to enable the use of machine learning for patient classification based on multi-omics data in combination with clinical values and laboratory results. The application of our framework resulted in up to 96% prediction accuracy of autoimmune diseases with machine learning models. Our results deliver insights into autoimmune disease research and have the potential to be adapted for applications across disease conditions.

Keywords: Autoimmune; Diagnostics; EHR; Machine learning; Multi-omics.

Conflict of interest statement

Declarations

Conflict of interest: Jan Kruta, Raphael Carapito, Marten Trendelenburg, Thierry Martin, Marta Rizzi, Reinhard E. Voll, Andrea Cavalli, Eriberto Natali, Patrick Meier, Marc Stawiski, Johannes Mosbacher, Annette Mollet, Miriam Capri, Enrico Giampieri, Aurelia Santoro and Erik Schkommodau declare that they have no conflict of interest. Enkelejda Miho owns shares in aiNET GmbH.

Ethics approval: This study, involving human participants, complies with the Declaration of Helsinki and has received approvals from the appropriate medical ethical committees. The study has been approved by the CHU Strasbourg Medical Ethical Committee under the approval number CPP- IV- Est- 08/02/2011. Participants gave their informed consent prior to their involvement in the study.

Figures

Fig. 1
Health data integration and machine learning workflow for personalized diagnostics of autoimmune diseases. (A) Clinical, laboratory and omics data were collected, preprocessed and integrated. (B) Data was further transformed and selected features of each data type were extracted, integrated and one-hot encoded. (C) Machine learning methods were applied to analyze and classify autoimmune diseases. The various models were validated and evaluated.
Fig. 2
Analysis of integrated clinical, laboratory and omics data. (A) Summary statistics of clinical data showed the distribution of clinical data types and identified non-informative data objects for data reduction. (B) PCA of cytokine concentrations in AID and non-AID patients. Laboratory data differentiated autoimmune and non-autoimmune patients. However, cytokine concentrations overlapped when subdividing AID into disease types. (C) Immunomics germline gene analysis revealed a high frequency of certain combinations of V and J genes across cohorts, where red indicates a high frequency and light blue a low frequency. (D) Top panel: The cumulative degree frequency (CDF) distributions of CDR3 (a.a.) similarities in B-cell repertoires of representative samples of AID patients showed a mixed power-law (orange) and Poisson (gray) distribution in SLE. Bottom panel: power-law and exponential (red) degree distribution in RA. (E) Complexity of genomics data for diagnosis was largely reduced by applying preprocessing and additional filtering procedures (see Methods). (F) Concentration of altered metabolites in AID, comparing HC (green bar) with arthritis cohorts (red and blue bars). Dark blue indicates a high concentration of metabolites and light blue a low concentration. Clustering yielded distinct clusters, with the HC cohort clearly separated from the remaining cohorts.
Fig. 3
Encoding of integrated clinical, laboratory and multi-omics data. (A) After binary encoding, the retrieved characteristics exhibited a comparable pattern across all data categories: normally distributed values within the 5th–95th percentile were encoded as “0”, and potentially disease-relevant values (anomalies) below the 5th or above the 95th percentile were encoded as “1”. (B) Clinical data, immunomics, and metabolomics revealed a prevalence of “1” encoded values. In genomics and laboratory data, on the other hand, “1” encoded values were rather rare across all patients. (C) Although laboratory results below the 5th or above the 95th percentile were seldom detected overall within the top 15 features, some of these were among the highly ranked features in samples classified as autoimmune.
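
The percentile-based binary encoding described in panel (A) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `percentile` and `encode_outliers` are hypothetical, and it assumes the 5th and 95th percentiles are computed per feature over all samples supplied, using linear interpolation. The caption does not state which reference cohort or interpolation rule the study used.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted list.
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def encode_outliers(
    values: Sequence[float], low_q: float = 5.0, high_q: float = 95.0
) -> List[int]:
    """Encode each value as 1 if it lies outside [low_q, high_q], else 0.

    Assumption: the thresholds are derived from the same values being
    encoded (one feature measured across all samples).
    """
    low = percentile(values, low_q)
    high = percentile(values, high_q)
    return [1 if (v < low or v > high) else 0 for v in values]


# Hypothetical feature measured in 100 samples: the values 1..100.
feature = [float(v) for v in range(1, 101)]
encoded = encode_outliers(feature)
```

With this toy feature, the five lowest and the five highest values are encoded as 1 and the remaining 90 as 0. Applying the same rule to every feature, whatever its data type, would give one binary matrix for all data categories.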
Fig. 4
Machine learning methods applied to integrated data. (A) ROC curve calculations using autoimmune data revealed how different types of data impact the performance of a machine learning model, with a random forest classifier as a representative example. AUC increased once clinical data was integrated with additional data types such as laboratory and genomics. (B) The performance of selected classifiers was evaluated and compared using stratified cross-validation, revealing an improvement in prediction accuracy for each model when integrated data is used compared to a single data type.
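
The two evaluation ideas named in this caption, AUC and stratified cross-validation, can be illustrated with a small standard-library sketch. It is not the study's pipeline: `roc_auc` and `stratified_folds` are hypothetical helper names, the labels and scores are made-up toy values, and no classifier is trained here. AUC is computed as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, and the fold splitter distributes each class round-robin so that class proportions stay roughly equal across folds.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[Hashable], k: int) -> List[List[int]]:
    """Split sample indices into k folds with similar class proportions."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    # Deal the indices of each class out to the folds in turn.
    for indices in by_class.values():
        for i, idx in enumerate(indices):
            folds[i % k].append(idx)
    return folds


# Toy example: 1 = autoimmune, 0 = non-autoimmune (made-up values).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(toy_labels, toy_scores)

folds = stratified_folds([1, 1, 1, 1, 0, 0], k=2)
```

In a full evaluation, each fold would serve once as the held-out test set while a model is fitted on the remaining folds, and the per-fold scores would be averaged. Running this once per data type and once on the integrated data would give the kind of comparison shown in panels (A) and (B).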
