Review
Epileptic Disord. 2024 Dec;26(6):733-752.
doi: 10.1002/epd2.20288. Epub 2024 Oct 24.

Big data research is everyone's research-Making epilepsy data science accessible to the global community: Report of the ILAE big data commission

Colin B Josephson et al. Epileptic Disord. 2024 Dec.

Abstract

Epilepsy care generates multiple sources of high-dimensional data, including clinical, imaging, electroencephalographic, genomic, and neuropsychological information, that are collected routinely to establish the diagnosis and guide management. Thanks to high-performance computing, sophisticated graphics processing units, and advanced analytics, we are now on the cusp of being able to use these data to significantly improve individualized care for people with epilepsy. Despite this, many clinicians, health care providers, and people with epilepsy are apprehensive about implementing Big Data and accompanying technologies such as artificial intelligence (AI). Practical, ethical, privacy, and climate issues represent real and enduring concerns that have yet to be completely resolved. Similarly, biases related to Big Data and AI have the potential to exacerbate local and global disparities. These concerns are highly germane to the field of epilepsy, given its high burden in developing nations and areas of socioeconomic deprivation. This educational paper from the International League Against Epilepsy's (ILAE) Big Data Commission aims to help clinicians caring for people with epilepsy become familiar with how Big Data are collected and processed and how they are applied in studies using AI, and to outline the immense potential positive impact Big Data can have on diagnosis and management.

Keywords: artificial intelligence; big data; common data models; epilepsy; ethics.

Figures

FIGURE 1
Manifold sources of clinical data now exist, with inherent trade‐offs between population size, phenotypic depth, ease of access, and complexity of database infrastructure and relationships. Reproduced from. AF, atrial fibrillation; AFGen, AF Consortium; CHD, coronary heart disease; eMERGE, Electronic Medical Records and Genomics; EPIC, European Prospective Investigation into Cancer and Nutrition; ERFC, Emerging Risk Factors Collaboration; ESC, European Society of Cardiology; HF, heart failure; MVP, Million Veterans Programme; NICOR, National Institute for Cardiovascular Outcomes Research; NIHR, National Institute for Health Research; PMI, precision medicine initiative; RPGEH, Research Programme on Genes, Environment, and Health; UCLEB, University College, London School of Hygiene and Tropical Medicine, Edinburgh, Bristol.
FIGURE 2
The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) can accommodate both administrative claims and electronic health records, allowing users to generate evidence from a wide variety of sources. It can also support collaborative research across data sources both within and outside the United States, in addition to being manageable for data owners and useful for data users. Once a database has been converted into the OMOP CDM, evidence can be generated using standardized analytics tools.
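As a rough illustration of what converting a source database into a common data model involves, the sketch below maps diagnosis records coded in two different source vocabularies onto a single shared row format. It is a minimal sketch under stated assumptions, not the OMOP tooling itself: the lookup table `SOURCE_TO_STANDARD`, the helper `to_common_model`, and the concept ID 9000001 are hypothetical placeholders, and the four fields are only assumed to mirror columns of the OMOP condition table.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical lookup: (source vocabulary, source code) -> standard concept ID.
# 9000001 is a placeholder for illustration, not a real OMOP concept ID.
SOURCE_TO_STANDARD = {
    ("ICD10", "G40.9"): 9000001,
    ("ICD9CM", "345.90"): 9000001,
}


@dataclass
class ConditionOccurrence:
    """One diagnosis row in the shared format (field names are assumptions)."""
    person_id: int
    condition_concept_id: int
    condition_start_date: date
    condition_source_value: str


def to_common_model(person_id, vocabulary, code, start_date):
    """Map one source diagnosis record to a common-format row.

    Unmapped codes receive concept ID 0 so they can be reviewed later
    instead of being silently dropped.
    """
    concept_id = SOURCE_TO_STANDARD.get((vocabulary, code), 0)
    return ConditionOccurrence(person_id, concept_id, start_date, code)


# A claims record and an EHR record use different vocabularies
# but end up with the same standard concept.
claims_row = to_common_model(1, "ICD9CM", "345.90", date(2020, 1, 5))
ehr_row = to_common_model(2, "ICD10", "G40.9", date(2021, 3, 9))
unmapped_row = to_common_model(3, "ICD10", "Z99.9", date(2022, 7, 1))
```

Keeping the original code in `condition_source_value` preserves provenance, so a mapping error can be traced back to the source record.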
FIGURE 3
An overview of typical EEG processing and analysis. (A) Recorded raw data are preprocessed to reduce artifacts (e.g., power line noise, EMG activity, eye blinks, etc.), exclude noisy channels and segments and divide the data into short epochs. (B) Epochs are averaged in the time or frequency domain. Alternatively, epochs are first converted to time–frequency reconstructions before averaging. (C) Data are analyzed in sensor (electrode) space, that is, time series of the different channels to evaluate topography and extract any features of interest. Alternatively, the data are submitted to source imaging to project the electrode time series into source space before evaluation of spatial distribution and feature extraction. (D) Features of interest are then evaluated on an individual level, for example, for clinical applications, or analyzed using group level statistics and machine learning approaches.
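Steps (A) and (B) of this pipeline can be sketched for a single channel in plain Python. This is a simplified illustration on synthetic data, not the authors' method: the function names, the fixed amplitude threshold used for artifact rejection, and the signal itself are assumptions, and real pipelines typically rely on dedicated EEG toolboxes with filtering and multichannel handling.

```python
import math


def make_epochs(signal, epoch_len):
    """Split a 1-D signal into consecutive, non-overlapping epochs.

    Any trailing samples that do not fill a whole epoch are dropped.
    """
    n_epochs = len(signal) // epoch_len
    return [signal[i * epoch_len:(i + 1) * epoch_len] for i in range(n_epochs)]


def reject_noisy(epochs, max_abs_amplitude):
    """Keep only epochs whose peak absolute amplitude is within the threshold."""
    return [e for e in epochs if max(abs(x) for x in e) <= max_abs_amplitude]


def average_epochs(epochs):
    """Average epochs sample by sample in the time domain."""
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]


# Synthetic single-channel recording: four identical 8-sample sine cycles.
samples_per_epoch = 8
clean_cycle = [math.sin(2 * math.pi * k / samples_per_epoch)
               for k in range(samples_per_epoch)]
signal = clean_cycle * 4
signal[10] = 500.0  # simulated high-amplitude artifact inside the second epoch

epochs = make_epochs(signal, samples_per_epoch)
kept = reject_noisy(epochs, max_abs_amplitude=100.0)  # threshold is an assumption
evoked = average_epochs(kept)
```

In this toy case the contaminated epoch is discarded and the average of the remaining three recovers the clean cycle.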
FIGURE 4
An overview of typical MRI processing and analysis. (A) Raw MRI data are preprocessed to reduce artifacts (e.g., field inhomogeneity) and spatially normalized to standard space. (B) Skull is then removed from the image volume, followed by tissue segmentation which allows for separation of brain tissue to GM, WM and CSF. Pial or GW surfaces can also be generated. (C) Feature extraction based on the MR images can be performed on voxel level or surface level. Common features include GW boundary blurring, gyral shape, GM thickness, sulcal depth, interhemispheric asymmetry measures, volume, etc. The choice of feature depends on specific goals of each study. (D) Features of interest can then be statistically evaluated to identify group‐level differences or perform individual‐level predictions.
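One of the features listed in (C), interhemispheric asymmetry, is often summarized as a normalized left-right difference. The sketch below uses one common formulation, the difference divided by the mean of the two sides; the caption does not specify which formula is intended, so this choice, the region names, and the thickness values are all illustrative assumptions.

```python
def asymmetry_index(left, right):
    """Normalized left-right difference: (L - R) / mean(L, R).

    Positive values mean the left-hemisphere measure is larger.
    """
    mean = (left + right) / 2.0
    if mean == 0:
        raise ValueError("asymmetry index is undefined when both sides are zero")
    return (left - right) / mean


# Synthetic cortical thickness values in mm as (left, right) pairs.
# Region names are hypothetical placeholders.
thickness_mm = {
    "region_a": (2.6, 2.4),
    "region_b": (3.0, 3.0),
}

asymmetry = {name: asymmetry_index(left, right)
             for name, (left, right) in thickness_mm.items()}
```

Normalizing by the mean makes the index comparable across regions and individuals with different absolute thickness or volume.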
FIGURE 5
Phenotype generation is critical to accurate and reliable Big Data studies employing electronic or administrative health records. A systematic approach involves first exploring the data to find relevant codes and concepts, then using these data to develop and test algorithms meant to isolate people with the disease of interest. The algorithms can then be deployed to test their accuracy using real‐world data. Finally, once refined and optimized, these algorithms can be stored in centralized repositories where they can be readily deployed for research and surveillance purposes. Reproduced from 49. CPRD, Clinical Practice Research Datalink; HES, Hospital Episode Statistics; MINAP, Myocardial Ischaemia National Audit Project; ONS, UK Office for National Statistics (mortality and social deprivation data).
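The develop-and-test step can be illustrated by applying a candidate case definition to records that also carry a reference-standard label, then computing standard accuracy measures. The rule below (at least two diagnostic codes, or one code plus an antiseizure medication) is a hypothetical example rather than an algorithm from the paper, and the records and field names are synthetic.

```python
def case_definition(record):
    """Hypothetical rule: >=2 diagnostic codes, or >=1 code plus an
    antiseizure medication (ASM) prescription."""
    return record["n_dx_codes"] >= 2 or (
        record["n_dx_codes"] >= 1 and record["on_asm"]
    )


def ratio(numerator, denominator):
    """Return the ratio, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def validate(records):
    """Compare the case definition with the reference standard."""
    tp = fp = fn = tn = 0
    for r in records:
        predicted = case_definition(r)
        actual = r["chart_confirmed"]
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Synthetic records with a chart-review reference standard.
records = [
    {"n_dx_codes": 3, "on_asm": True, "chart_confirmed": True},
    {"n_dx_codes": 1, "on_asm": True, "chart_confirmed": True},
    {"n_dx_codes": 1, "on_asm": False, "chart_confirmed": True},
    {"n_dx_codes": 2, "on_asm": False, "chart_confirmed": False},
    {"n_dx_codes": 0, "on_asm": True, "chart_confirmed": False},
    {"n_dx_codes": 0, "on_asm": False, "chart_confirmed": False},
]

metrics = validate(records)
```

Competing case definitions can be compared by swapping in a different `case_definition` and re-running `validate` on the same reference set.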
FIGURE 6
The general approach to generating machine learning and artificial intelligence models. Data can be obtained from a single dataset or from multiple datasets. The data are then extracted, normalized, and processed to permit identification of features (independent variables). These features may need to be engineered from the raw data. Feature selection and dimensionality reduction are then performed to reduce the risks of overfitting and of modeling in a sparse, high‐dimensional space. The final analytics dataset is then divided into a training set (where the model is derived) and a testing set (where the model is tested and performance metrics are derived). If the discrimination, calibration, and overall performance metrics are suboptimal, attempts can be made to refine the model by re‐engineering and re‐selecting features. Once the final model is derived and decided upon, it should undergo external validation in an independent population to ensure that performance is generalizable. If the model still performs well, further impact assessments and decision analyses are required to ensure it truly enhances care, is feasible in real‐world settings, and does not adversely affect health systems. If these measures are met, the model can finally be deployed for use in care and disease monitoring, and the algorithm can be refined on an ongoing basis in a continual learning environment as it receives novel real‐world data from electronic health record systems.
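The train/test portion of this workflow can be sketched on a toy dataset. The example is deliberately simplistic and makes several assumptions: the "model" is just the training event rate within each level of one binary feature, the split is a fixed slice rather than a random or stratified one, and the data are synthetic. It reports discrimination (area under the ROC curve), overall performance (Brier score), and a crude calibration check (mean predicted versus observed risk).

```python
from collections import defaultdict


def fit_rates(train):
    """Derive the model: event rate per feature level in the training set."""
    totals, events = defaultdict(int), defaultdict(int)
    for x, y in train:
        totals[x] += 1
        events[x] += y
    return {x: events[x] / totals[x] for x in totals}


def auc(y_true, y_score):
    """Discrimination: probability that a random case is ranked above a
    random non-case, with ties counted as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, y_score):
    """Overall performance: mean squared error of the predicted risks."""
    return sum((s - y) ** 2 for y, s in zip(y_true, y_score)) / len(y_true)


# Synthetic (feature, outcome) pairs; the first 8 are used for training.
records = [
    (1, 1), (1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 0), (0, 1),
    (1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1),
]
train, test = records[:8], records[8:]  # real studies would split at random

model = fit_rates(train)
y_test = [y for _, y in test]
risk_test = [model[x] for x, _ in test]

auc_value = auc(y_test, risk_test)
brier_value = brier(y_test, risk_test)
mean_predicted = sum(risk_test) / len(risk_test)
observed_rate = sum(y_test) / len(y_test)
```

External validation would repeat the evaluation half of this sketch, unchanged, on records from an independent population.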

References

    1. Mehta N, Pandit A. Concurrence of big data analytics and healthcare: a systematic review. Int J Med Inform. 2018;114:57–65.
    2. Scruggs SB, Watson K, Su AI, Hermjakob H, Yates JR III, Lindsey ML, et al. Harnessing the heart of big data. Circ Res. 2015;116:1115–1119.
    3. Scheffer IE, Berkovic S, Capovilla G, Connolly MB, French J, Guilhoto L, et al. ILAE classification of the epilepsies: position paper of the ILAE Commission for Classification and Terminology. Epilepsia. 2017;58:512–521.
    4. Brinkmann BH, Bower MR, Stengel KA, Worrell GA, Stead M. Large‐scale electrophysiology: acquisition, compression, encryption, and storage of big data. J Neurosci Methods. 2009;180:185–192.
    5. Bohr A, Memarzadeh K. The rise of artificial intelligence in healthcare applications. In: Bohr A, Memarzadeh K, editors. Artificial intelligence in healthcare. London, UK: Academic Press; 2020. p. 25–60.