Neuroimage. 2022 Dec 1:264:119729.

doi: 10.1016/j.neuroimage.2022.119729. Epub 2022 Nov 4.

BLMM: Parallelised computing for big linear mixed models

Thomas Maullin-Sapey¹, Thomas E Nichols²

Affiliations

¹ Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, UK. Electronic address: thomas.maullin-sapey@bdi.ox.ac.uk.
² Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, UK; Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK.

PMID: 36336314
PMCID: PMC10985650
DOI: 10.1016/j.neuroimage.2022.119729

Thomas Maullin-Sapey et al. Neuroimage. 2022.

Abstract

Within neuroimaging, large-scale shared datasets are becoming increasingly commonplace, challenging existing tools in terms of both overall scale and the complexity of study designs. As sample sizes grow, researchers are presented with new opportunities to detect and account for the grouping factors and covariance structure present in large experimental designs. However, standard linear model methods cannot account for these covariance and grouping structures, and existing linear mixed model (LMM) tools neither scale to such datasets nor exploit the computational speed-ups afforded by vectorising computations over voxels. Further, nearly all existing imaging tools (fixed or mixed effect) do not account for variability in the patterns of missing data near cortical boundaries and the edge of the brain, and instead omit every voxel with any missing data. In the large-n setting, such a voxel-wise deletion strategy for missing data leads to severe shrinkage of the final analysis mask. To counter these issues, we describe the "Big" Linear Mixed Models (BLMM) toolbox, an efficient Python package for large-scale fMRI LMM analyses. BLMM is designed for use on high-performance computing clusters and uses a Fisher Scoring procedure made possible by the expressions for the LMM Fisher information matrix and score vectors derived in our previous work, Maullin-Sapey and Nichols (2021).
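The mask-shrinkage argument can be illustrated with a toy simulation. The sketch below is not BLMM code: the function names, the one-dimensional "brain", the missingness probability and the 50% threshold rule are all assumptions chosen for illustration. It compares a voxel-wise deletion mask (a voxel is kept only if every subject has data there) with a mask that keeps voxels observed in at least a given fraction of subjects.

```python
import random


def simulate_masks(n_subjects, n_voxels, n_edge, p_missing, seed=0):
    """Return one boolean mask per subject; True means the voxel has data.

    Hypothetical setup: the last `n_edge` voxels stand in for voxels near
    the edge of the brain and are missing independently with probability
    `p_missing`; all other voxels are always observed.
    """
    rng = random.Random(seed)
    n_core = n_voxels - n_edge
    return [
        [True] * n_core + [rng.random() >= p_missing for _ in range(n_edge)]
        for _ in range(n_subjects)
    ]


def observations_per_voxel(masks):
    """Count how many subjects contribute data at each voxel."""
    return [sum(column) for column in zip(*masks)]


def deletion_mask_size(masks):
    """Number of voxels kept when any missing value removes the voxel."""
    n = len(masks)
    return sum(count == n for count in observations_per_voxel(masks))


def threshold_mask_size(masks, min_fraction):
    """Number of voxels kept when at least `min_fraction` of subjects have data."""
    n = len(masks)
    return sum(count >= min_fraction * n for count in observations_per_voxel(masks))


if __name__ == "__main__":
    # With a fixed seed, the masks for a smaller n are a prefix of those for
    # a larger n, so the deletion mask can only shrink as n grows.
    for n in (10, 100, 1000):
        masks = simulate_masks(n, n_voxels=1000, n_edge=200, p_missing=0.01)
        print(n, deletion_mask_size(masks), threshold_mask_size(masks, 0.5))
```

Even with only 1% missingness per edge voxel, the chance that a given edge voxel survives deletion falls as $0.99^{n}$, so the deletion mask collapses towards the always-observed core as $n$ grows, while the thresholded mask does not.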

Conflict of interest statement

Not applicable.

Figures

**Fig. 1**
Activity diagram detailing the BLMM pipeline. The boundary of the BLMM code is indicated by the gray outline. The start and end nodes of the pipeline are represented by the black circle and nested black and white circles, respectively. Decision nodes are represented by diamonds and parallel stages of computation are represented with vertical bars. Also included are dotted lines indicating the distinct “stages” of the BLMM pipeline. Of particular note are the image-wise and voxel-wise batching stages of the pipeline, in which computation is parallelised across $B_{I}$ groups of images and $B_{v}$ groups of voxels, respectively.

**Algorithm 1**
Product Form Computation Pseudocode.

**Algorithm 2**
Full Simplified Fisher Scoring Pseudocode.

**Fig. 2**
A visual representation of the pipeline employed to generate the simulated data of Section 3.1. The first box depicts the model which was used for data generation, notably highlighting that $ϵ$ and $b$ varied across space. The second box details the smoothing process, with the $\otimes$ symbol representing convolution in this instance. The third box details the masking stage, with $⊙$ representing the Hadamard (element-wise) product.

**Fig. 3**
Observed serial computation times (SCT) for each experimental design, displayed as a function of the number of observations, $n$. The SCT are shown in kiloseconds for BLMM (dashed) and lmer (dotted).

**Fig. 4**
First row: The MNI152 2mm anatomical template, for reference. Second row: $χ^{2}$ statistics for comparison of model 1 and model 2, displayed on the square root scale; outlined in black are regions where evidence was found that inclusion of a random subject intercept significantly affected (at the $5\%$ Bonferroni-significance level) the results of the analysis. Third row: $χ^{2}$ statistics for comparison of model 2 and model 3, displayed on the square root scale; outlined regions indicate where the inclusion of a random subject slope significantly affected the results of the analysis. Fourth row: Effect estimates for the “Faces $>$ Shapes” contrast, derived from model 2; for this row, voxels demarcated are those Bonferroni-significant at the $5\%$ level. Fifth row: The fixed effects standard deviation ($σ$), derived from model 2. Sixth row: The standard deviation of the subject-level random intercept ($σ \sqrt{d}$, where $d$ is the only non-zero element of $D$ in model 2).
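The image-wise batching described in the Fig. 1 caption works when the quantities being computed can be accumulated batch by batch. The sketch below assumes those quantities are sums of products such as $X'X$ and $X'Y$. This is one reading of "product form" in the title of Algorithm 1, which is not defined on this page. The helper names are hypothetical, and the batches are processed in a sequential loop where a cluster would use separate jobs.

```python
def transpose_product(A, B):
    """Return A'B for two matrices stored as lists of rows (same row count)."""
    cols_a, cols_b = len(A[0]), len(B[0])
    out = [[0.0] * cols_b for _ in range(cols_a)]
    for row_a, row_b in zip(A, B):
        for i, a in enumerate(row_a):
            for j, b in enumerate(row_b):
                out[i][j] += a * b
    return out


def add_matrices(P, Q):
    """Element-wise sum of two equally sized matrices."""
    return [[p + q for p, q in zip(row_p, row_q)] for row_p, row_q in zip(P, Q)]


def batched_transpose_product(A, B, n_batches):
    """Accumulate A'B over at most `n_batches` contiguous groups of rows.

    Each group of rows plays the role of one batch of images; the per-batch
    products are summed to recover the full-data product.
    """
    n = len(A)
    size = -(-n // n_batches)  # ceiling division: rows per batch
    total = None
    for start in range(0, n, size):
        part = transpose_product(A[start:start + size], B[start:start + size])
        total = part if total is None else add_matrices(total, part)
    return total


# Toy design: intercept plus one covariate, six observations.
X = [[1, i] for i in range(6)]
Y = [[2 * i + 1] for i in range(6)]

print(batched_transpose_product(X, X, 3))  # [[6.0, 15.0], [15.0, 55.0]]
print(batched_transpose_product(X, Y, 3))  # [[36.0], [125.0]]
```

Because each batch contributes an independent partial sum, no job needs to hold all $n$ images in memory at once, and the results match the unbatched products.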
