Review
Pract Neurol. 2020 Sep 29;21(1):4-11.
doi: 10.1136/practneurol-2020-002688. Online ahead of print.

Big data, machine learning and artificial intelligence: a neurologist's guide

Stephen D Auger et al. Pract Neurol.

Abstract

Modern clinical practice requires the integration and interpretation of ever-expanding volumes of clinical data. There is, therefore, an imperative to develop efficient ways to process and understand these large amounts of data. Neurologists work to understand the function of biological neural networks, but artificial neural networks and other forms of machine learning algorithm are likely to be increasingly encountered in clinical practice. As their use increases, clinicians will need to understand the basic principles and common types of algorithm. We aim to provide a coherent introduction to this jargon-heavy subject and equip neurologists with the tools to understand, critically appraise and apply insights from this burgeoning field.

Keywords: Neuroradiology; clinical neurology; evidence-based neurology; health policy & practice; image analysis.

Conflict of interest statement

Competing interests: None declared.

Figures

Figure 1
(A) Supervised learning algorithm. In the training phase, training data with associated labels are used to create a predictive model. In the testing phase, the predictive model is shown data that are not labelled, in this example to distinguish apples from other items. (B) Unsupervised learning algorithm. Unlabelled input data are processed by the algorithm to see what patterns it identifies, in this example to group red fruit. (C) Reinforcement learning algorithm. An agent performs an action in an environment. This is interpreted into a reward signal and a representation of the state, which are both fed back to the agent. Diagram in (C) reproduced from Wikimedia commons under public domain licence.
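
As a rough illustration of panels (A) and (B), the sketch below trains a nearest-centroid classifier on labelled items and then applies it to unlabelled ones, followed by a simple two-group clustering of unlabelled redness values. The feature values (redness, firmness), the item lists and all function names are hypothetical, and nearest-centroid and two-means are chosen here only for brevity; the figure does not prescribe a particular algorithm.

```python
import math

def centroid(points):
    """Average position of a list of feature tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def train(labelled_items):
    """Training phase: build one centroid per label from labelled data."""
    by_label = {}
    for features, label in labelled_items:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}

def predict(model, features):
    """Testing phase: assign the label whose centroid is closest."""
    return min(model, key=lambda label: math.dist(model[label], features))

def two_means_1d(values, iterations=10):
    """Unsupervised grouping of numbers into two clusters (no labels used).

    Assumes the values are not all identical.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(low_group) / len(low_group)
        high = sum(high_group) / len(high_group)
    return low_group, high_group

# Hypothetical features: (redness, firmness), each scaled from 0 to 1.
training_data = [
    ((0.90, 0.90), "apple"),   # red apple
    ((0.20, 0.85), "apple"),   # green apple
    ((0.60, 0.90), "apple"),   # mixed-colour apple
    ((0.10, 0.40), "other"),   # banana
    ((0.95, 0.30), "other"),   # tomato
    ((0.90, 0.20), "other"),   # strawberry
]
model = train(training_data)

# Unlabelled items shown to the trained model.
unlabelled = [(0.80, 0.95), (0.85, 0.25)]
predictions = [predict(model, item) for item in unlabelled]
print(predictions)

# Unsupervised: group items by redness alone, without any labels.
redness = [0.90, 0.20, 0.95, 0.10, 0.85]
less_red, more_red = two_means_1d(redness)
print(less_red, more_red)
```

The supervised part needs the labels to build its model, whereas the clustering step only looks for structure in the inputs and leaves the interpretation of each group to the user.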
Figure 2
Artificial neural networks comprise neurons (circles) that are organised into an input layer (red), an output layer (green) and any number of hidden layers (grey). Neurons are joined by feed-forward connections of differing weights (blue lines). Training the algorithm involves modifying the weights between neurons (represented above by the thickness of the blue lines) so that connections associated with a rewarding outcome are strengthened and those associated with a negative outcome are weakened.
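
The weight adjustment described in this caption can be sketched with a very small feed-forward network. The example below assumes two inputs, two hidden neurons and one output neuron, sigmoid activations, and gradient descent on a squared error (backpropagation) as the training rule. These choices, the starting weights and the function names are illustrative assumptions; the caption does not specify a training rule.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, output_weights):
    """Feed activity forward: input layer -> hidden layer -> output neuron."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)))
              for ws in hidden_weights]
    output = sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))
    return hidden, output

def train_step(inputs, target, hidden_weights, output_weights, learning_rate=0.5):
    """One gradient-descent update of every connection weight."""
    hidden, output = forward(inputs, hidden_weights, output_weights)
    # Error signal at the output neuron (squared error, sigmoid activation).
    delta_out = (output - target) * output * (1.0 - output)
    # Error signal passed back to each hidden neuron via the current weights.
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(output_weights, hidden)]
    new_output_weights = [w - learning_rate * delta_out * h
                          for w, h in zip(output_weights, hidden)]
    new_hidden_weights = [[w - learning_rate * d * x for w, x in zip(ws, inputs)]
                          for ws, d in zip(hidden_weights, delta_hidden)]
    return new_hidden_weights, new_output_weights

# Hypothetical single training example and arbitrary starting weights.
inputs, target = [1.0, 0.0], 1.0
hidden_weights = [[0.5, -0.5], [0.3, 0.8]]   # one row per hidden neuron
output_weights = [0.4, -0.6]

_, output_before = forward(inputs, hidden_weights, output_weights)
for _ in range(100):
    hidden_weights, output_weights = train_step(
        inputs, target, hidden_weights, output_weights)
_, output_after = forward(inputs, hidden_weights, output_weights)

print(f"output before training: {output_before:.3f}")
print(f"output after training:  {output_after:.3f}")
```

Each update nudges the weights in the direction that moves the output towards the target, which is one concrete reading of "strengthening" helpful connections and "weakening" unhelpful ones.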
Figure 3
If a supervised learning algorithm has insufficiently diverse training data (eg, only apples which are red), then it is prone to misclassifying items that deviate from those narrow training data (eg, not identifying green apples or incorrectly labelling other red fruit, such as tomatoes, as apples). A more diverse training set, as in figure 1A, would be less prone to this sort of error.
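
The effect of narrow training data can be sketched with a one-nearest-neighbour classifier trained twice: once on red apples only, and once on a set that also includes a green apple and a tomato. The features (redness, firmness), their values and the function name are hypothetical, and nearest-neighbour classification is used purely as a convenient stand-in for a supervised learning algorithm.

```python
import math

def nearest_neighbour(training_data, features):
    """Return the label of the training item closest to `features`."""
    _, label = min(training_data,
                   key=lambda item: math.dist(item[0], features))
    return label

# Hypothetical features: (redness, firmness), each scaled from 0 to 1.
narrow_training = [
    ((0.90, 0.90), "apple"),   # red apple
    ((0.85, 0.95), "apple"),   # red apple
    ((0.10, 0.40), "other"),   # banana
    ((0.15, 0.35), "other"),   # banana
]
diverse_training = narrow_training + [
    ((0.20, 0.85), "apple"),   # green apple
    ((0.95, 0.35), "other"),   # tomato
]

# Unseen items that deviate from the narrow training data.
green_apple = (0.15, 0.90)
tomato = (0.90, 0.30)

results = {}
for name, data in (("narrow", narrow_training), ("diverse", diverse_training)):
    results[name] = {
        "green apple": nearest_neighbour(data, green_apple),
        "tomato": nearest_neighbour(data, tomato),
    }
    print(name, results[name])
```

With the narrow set, redness is effectively the only cue the model has for "apple", so the green apple is missed and the tomato is accepted; the more diverse set corrects both errors in this toy setting.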
Figure 4
Schematic plots showing underfit (left) and overfit (right) models compared with the best fit (centre). The same data points are plotted (a quadratic function with random noise). The left panel shows a linear model, the right panel shows a high-order polynomial fitted to the data, the centre panel shows the quadratic function which was used to derive the data points. The model on the right panel is highly tuned to random noise in the data, and so is likely to perform poorly at predicting Y values from X values in an independent dataset. The linear model fits these data less ‘tightly’ (ie, there is a higher overall error), and so is less likely to predict Y values based on random noise but may be underfitted in that it does not capture some important structure in the data. The centre panel shows a quadratic function which captures the ‘true’ underlying distribution of the data and so is likely to perform best in an independent dataset.
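
The comparison in this figure can be reproduced in miniature by fitting polynomials of degree 1, 2 and 6 to the same points and scoring each on held-out points. The sketch below makes several assumptions that are not in the figure: the underlying function is y = x², the "noise" is a fixed alternating ±1 offset (so the run is reproducible) instead of random noise, the held-out points are noise-free, and the fit uses ordinary least squares solved by Gaussian elimination. All names are illustrative.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, where V[i][j] = xs[i] ** j.
    a = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs

def predict(coeffs, x):
    return sum(c * x ** j for j, c in enumerate(coeffs))

def mean_squared_error(coeffs, xs, ys):
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Training data: quadratic signal plus a fixed, alternating offset as "noise".
train_x = [-3, -2, -1, 0, 1, 2, 3]
noise = [1, -1, 1, -1, 1, -1, 1]
train_y = [x ** 2 + e for x, e in zip(train_x, noise)]

# Independent data: points between the training points, without noise.
test_x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
test_y = [x ** 2 for x in test_x]

train_error, test_error = {}, {}
for degree in (1, 2, 6):   # underfit, matching the signal, overfit
    coeffs = fit_polynomial(train_x, train_y, degree)
    train_error[degree] = mean_squared_error(coeffs, train_x, train_y)
    test_error[degree] = mean_squared_error(coeffs, test_x, test_y)
    print(f"degree {degree}: train MSE {train_error[degree]:.3f}, "
          f"test MSE {test_error[degree]:.3f}")
```

In this setup the degree-6 model passes through every training point, so its training error is the lowest, yet the degree-2 model has the lowest error on the held-out points, which is the pattern the three panels illustrate.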

