Brain. 2026 Jan 8;149(1):290-301.
doi: 10.1093/brain/awaf172.

Diagnosing migraine from genome-wide genotype data: a machine learning analysis

Antonios Danelakis et al. Brain. 2026.

Abstract

Migraine has an assumed polygenic basis, but the genetic risk variants identified in genome-wide association studies explain only a proportion of the heritability. We aimed to develop machine learning models that capture non-additive and interactive effects to address the missing heritability. This was a cross-sectional population-based study of participants in the second and third Trøndelag Health Study. Individuals underwent genome-wide genotyping and were phenotyped based on validated modified criteria of the International Classification of Headache Disorders. Four datasets of increasing numbers of genetic variants were created using different thresholds of linkage disequilibrium and univariate genome-wide association P-values. A series of machine learning and deep learning methods were optimized and evaluated. The genotype tools PLINK and LDPred2 were used for polygenic risk scoring. Models were trained on a partition of the dataset and tested in a hold-out set. The area under the receiver operating characteristic curve was used as the primary scoring metric. Classification by machine learning was statistically compared to that of polygenic risk scoring. Finally, we explored the biological functions of the variants unique to the machine learning approach. Overall, 43 197 individuals (51% women), with a mean age of 54.6 years, were included in the modelling. A light gradient boosting machine performed best for the three smallest datasets (108, 7771 and 7840 variants), all with a hold-out test set area under curve of 0.63. A multinomial naïve Bayes model performed best in the largest dataset (140 467 variants), with a hold-out test set area under curve of 0.62. The models were statistically significantly superior to polygenic risk scoring (area under curve 0.52 to 0.59) for all the datasets (P < 0.001 to P = 0.02).
Machine learning identified many of the same genes and pathways identified in genome-wide association studies, but also several unique pathways, mainly related to signal transduction and neurological function. Interestingly, pathways related to botulinum toxins, and pathways related to the calcitonin gene-related peptide receptor also emerged. This study suggests that migraine may follow a non-additive and interactive genetic causal structure, potentially best captured by complex machine learning models. Such structure may be concealed where the data dimensionality (high number of genetic variants) is insufficiently supported by the scale of available data, leaving a misleading impression of purely additive effects. Future machine learning models using substantially larger sample sizes could harness both the additive and the interactive effects, enhancing precision and offering deeper understanding of genetic interactions underlying migraine.
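
The contrast drawn above between additive polygenic risk scoring and models that capture interactions can be illustrated with a toy example. The sketch below is not the authors' pipeline: the two-variant XOR-style risk pattern, the data and the names `auc`, `additive_scores` and `interaction_scores` are hypothetical and chosen only for illustration. The area under curve is computed as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half.

```python
from itertools import product


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: two binary variants (carrier = 1), with each of the
# four genotype combinations repeated 25 times. Risk is purely interactive:
# an individual is a case only if exactly one of the two variants is carried.
genotypes = [g for g in product([0, 1], repeat=2) for _ in range(25)]
labels = [a ^ b for a, b in genotypes]

# Additive score: a weighted sum of variants, in the spirit of a polygenic
# risk score (equal weights assumed here).
additive_scores = [1.0 * a + 1.0 * b for a, b in genotypes]

# Interaction-aware score: depends on the combination of the two variants.
interaction_scores = [float(a != b) for a, b in genotypes]

auc_additive = auc(additive_scores, labels)
auc_interaction = auc(interaction_scores, labels)
print(f"additive AUC: {auc_additive:.2f}")        # additive AUC: 0.50
print(f"interaction AUC: {auc_interaction:.2f}")  # interaction AUC: 1.00
```

In this deliberately extreme case the additive score is no better than chance, while a score that uses the variant combination separates cases from controls perfectly. Real genotype data would fall somewhere between these extremes.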

Keywords: HUNT; artificial intelligence; epistasis; genetics; gradient boosting; headache.

Conflict of interest statement

A.S. has received lecture honoraria from TEVA. A.S. is a shareholder and patent holder of Nordic Brain Tech AS and the Cerebri app.

Figures

Figure 1
Schematic overview of the study design. Among 43 197 individuals, 10 286 had migraine and 32 911 were headache-free controls. Four different datasets with an increasing number of genetic variants were used for distinguishing migraine from headache-free controls. Each dataset was split into training and test sets in the same 9:1 ratio. The training data were subsequently preprocessed, and models were trained and optimized using 10-fold cross-validation. The best model for each dataset was evaluated on the test set.
Figure 2
Performance of the best machine learning models. Receiver operating characteristic curves for the top-performing machine learning model for each of the four datasets, showing mean 10-fold cross-validated area under curve (blue line) ± 1 standard deviation (grey shaded area), and test set area under curve (orange line). (A) Dataset 1 with 108 variants. (B) Dataset 2 with 7771 variants. (C) Dataset 3 with 7840 variants. (D) Dataset 4 with 140 467 variants.
Figure 3
Impact of feature dimensionality. The hold-out test set area under curve (y-axis) is plotted against the number of variants included in the model (x-axis) for the best machine learning and polygenic risk scoring approaches. For each colour, solid lines represent training performance and dotted lines represent test performance for a given modelling approach. Performance for the intermediate datasets (19 473 to 114 179 variants) was only calculated for the best non-linear complex machine learning approach (light gradient boosting) and the best simple additive model (multinomial naïve Bayes) as part of the post hoc sensitivity analyses. Note that light gradient boosting increases in performance up to 93 237 variants before sharply dropping, indicating overfitting when the feature space exceeds a limit. Multinomial naïve Bayes, however, increases steadily before reaching a plateau beyond 57 965 variants. LightGBM = light gradient boosting machine. MNB = multinomial naïve Bayes.
Figure 4
SHAP summary plots. Plots illustrating the relative contribution of the included variants to the predictions of the best machine learning model for each dataset. The x-axes denote the number of variants; the y-axes denote the absolute SHAP value on a logarithmic scale. (A) In Dataset 1, all 108 variants contributed towards the prediction. (B) In Dataset 2, 1486 of 7771 variants contributed. (C) In Dataset 3, 1442 of 7840 variants contributed. In the two latter cases, a large majority of variants do not contribute to the prediction, suggesting that the model omits the less important variants. The models nevertheless achieve higher accuracy than polygenic risk scoring, suggesting that some non-additive effects between the contributing variants are captured. (D) In Dataset 4, all 140 467 variants contribute, but each with a small contribution. This is due to the probabilistic additive architecture of the naïve Bayes approach, which is more similar to polygenic risk scoring. SHAP = Shapley additive explanations.
Figure 5
Venn diagrams showing overlap of annotated genes and enriched pathways. (A) Overlap of annotated genes from variants identified in the genome-wide association study, the best complex model (light gradient boosting machine in Dataset 2) and the best additive machine learning model (multinomial naïve Bayes in Dataset 4). (B) Overlap of enriched pathways from genes and variants identified in the genome-wide association study, the best complex model (light gradient boosting machine in Dataset 2) and the best additive machine learning model (multinomial naïve Bayes in Dataset 4).
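
The evaluation design summarized in Figure 1 (a 9:1 split into training and hold-out test sets, followed by 10-fold cross-validation on the training data) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: stratifying the split by case status, the fixed random seed and the helper names `stratified_split` and `k_folds` are all assumptions made here. Only the class counts are taken from the figure.

```python
import random


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), sampling test_fraction of each class."""
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def k_folds(indices, k, rng):
    """Yield (fit_idx, validation_idx) pairs for k-fold cross-validation."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        fit_idx = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield fit_idx, folds[i]


rng = random.Random(0)  # assumed seed, only to make the partition reproducible

# Class counts from Figure 1: 10 286 migraine cases, 32 911 controls.
labels = [1] * 10286 + [0] * 32911

# Hold out one tenth of each class as the test set (assumed stratification).
train_idx, test_idx = stratified_split(labels, test_fraction=0.1, rng=rng)

# The hold-out indices are never passed to cross-validation.
folds = list(k_folds(train_idx, k=10, rng=rng))

print(len(train_idx), len(test_idx), len(folds))
```

Model selection and tuning would use only the ten `(fit_idx, validation_idx)` pairs, so that the hold-out test set is touched once, for the final evaluation of the selected model.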


Grants and funding