Orphanet J Rare Dis. 2020 Jun 9;15(1):145.
doi: 10.1186/s13023-020-01424-6.

The use of machine learning in rare diseases: a scoping review

Julia Schaefer et al. Orphanet J Rare Dis. 2020.

Abstract

Background: Emerging machine learning technologies are beginning to transform medicine and healthcare and could also improve the diagnosis and treatment of rare diseases. Currently, there are no systematic reviews that investigate, from a general perspective, how machine learning is used in a rare disease context. This scoping review aims to address this gap and explores the use of machine learning in rare diseases, investigating, for example, in which rare diseases machine learning is applied, which types of algorithms and input data are used or which medical applications (e.g., diagnosis, prognosis or treatment) are studied.

Methods: Using a complex search string including generic search terms and 381 individual disease names, studies from the past 10 years (2010-2019) that applied machine learning in a rare disease context were identified on PubMed. To systematically map the research activity, eligible studies were categorized along different dimensions (e.g., rare disease group, type of algorithm, input data), and the number of studies within these categories was analyzed.
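The mapping step described in the Methods can be illustrated with a minimal sketch: each study is tagged along a dimension and the number of studies per category is counted. The records, field names and the `category_shares` helper below are hypothetical and serve only as illustration; they are not the review's data or the authors' code.

```python
from collections import Counter

# Hypothetical toy records for illustration only; NOT data from the review.
studies = [
    {"id": "S1", "algorithms": ["support vector machine"], "input_data": ["images"]},
    {"id": "S2", "algorithms": ["ensemble method", "artificial neural network"], "input_data": ["omics"]},
    {"id": "S3", "algorithms": ["ensemble method"], "input_data": ["images", "demographic data"]},
    {"id": "S4", "algorithms": ["artificial neural network"], "input_data": ["demographic data"]},
]


def category_shares(records, dimension):
    """Return the percentage of studies per category along one dimension.

    A study that carries several labels is counted once in each of them,
    so the shares can add up to more than 100%.
    """
    counts = Counter()
    for record in records:
        # set() guards against a label being listed twice for one study.
        for label in set(record[dimension]):
            counts[label] += 1
    total = len(records)
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


algorithm_shares = category_shares(studies, "algorithms")
input_shares = category_shares(studies, "input_data")
```

Because a study may fall into several categories, the shares within one dimension need not sum to 100%, which is consistent with the note in Fig. 4 that such studies are listed in more than one category.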

Results: Two hundred eleven studies from 32 countries investigating 74 different rare diseases were identified. Diseases with a higher prevalence appeared more often in the studies than diseases with a lower prevalence. Moreover, some rare disease groups were investigated more frequently than expected (e.g., rare neurologic diseases and rare systemic or rheumatologic diseases), and others less frequently (e.g., rare inborn errors of metabolism and rare skin diseases). Ensemble methods (36.0%), support vector machines (32.2%) and artificial neural networks (31.8%) were the algorithms most commonly applied in the studies. Only a small proportion of studies evaluated their algorithms on an external data set (11.8%) or against a human expert (2.4%). As input data, images (32.2%), demographic data (27.0%) and "omics" data (26.5%) were used most frequently. Most studies used machine learning for diagnosis (40.8%) or prognosis (38.4%), whereas studies aiming to improve treatment were relatively scarce (4.7%). Patient numbers in the studies were small, most commonly ranging from 20 to 99 (35.5%).
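To get a feel for the absolute numbers behind these percentages, the sketch below converts a few of them into approximate study counts. It assumes that every percentage refers to all 211 included studies; the abstract does not state the counts themselves, so the results are estimates, and the `approximate_count` helper is hypothetical.

```python
# Assumption: every reported percentage is a share of the 211 included studies.
TOTAL_STUDIES = 211

# Percentages as reported in the Results paragraph of the abstract.
reported_percentages = {
    "ensemble methods": 36.0,
    "external data set evaluation": 11.8,
    "evaluation against a human expert": 2.4,
    "treatment": 4.7,
}


def approximate_count(percent, total=TOTAL_STUDIES):
    """Estimate the number of studies behind a reported percentage."""
    return round(percent / 100 * total)


# Estimated counts only; the abstract reports percentages, not counts.
estimated_counts = {
    label: approximate_count(pct) for label, pct in reported_percentages.items()
}
```

Under this assumption, only a handful of the 211 studies compared their algorithm against a human expert.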

Conclusion: Our review provides an overview of the use of machine learning in rare diseases. Mapping the current research activity, it can guide future work and help to facilitate the successful application of machine learning in rare diseases.

Keywords: Machine learning; Rare diseases; Scoping review.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1. Selection of sources of evidence

Fig. 2. World map showing publications by country (a); countries with more than five publications (b); total number of publications per year (c; for comparison, the inset shows the publication trend for machine learning in general)

Fig. 3. Distribution across disease groups: The distribution of the 381 diseases included in the literature search is shown in comparison with the distribution of the 74 diseases investigated in the studies (left; disease groups smaller than 3% are not shown); differences between the percentages show disease groups that are over- or underrepresented in the studies (right)

Fig. 4. Types of algorithms used in the studies (a); input data (b); medical application (c); number of patients (d). Studies using more than one type of algorithm or input data are listed in more than one category
