Bioinformatics. 2022 Aug 10;38(16):4011-4018.

doi: 10.1093/bioinformatics/btac431.

Outlier detection for multi-network data

Pritam Dey¹, Zhengwu Zhang², David B Dunson¹

Affiliations

¹ Department of Statistical Science, Duke University, Durham, NC 27708, USA.
² Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

PMID: 35762974
PMCID: PMC9890313
DOI: 10.1093/bioinformatics/btac431

Pritam Dey et al. Bioinformatics. 2022.

Abstract

Motivation: It has become routine in neuroscience studies to measure brain networks for different individuals using neuroimaging. These networks are typically expressed as adjacency matrices, with each cell containing a summary of connectivity between a pair of brain regions. There is an emerging statistical literature describing methods for the analysis of such multi-network data, in which nodes are common across networks but the edges vary. However, there has been essentially no consideration of the important problem of outlier detection. In particular, for certain subjects, the neuroimaging data are of such poor quality that the network cannot be reliably reconstructed. For such subjects, the resulting adjacency matrix may be mostly zero or exhibit a bizarre pattern not consistent with a functioning brain. These outlying networks may serve as influential points, contaminating subsequent statistical analyses. We propose a simple Outlier DetectIon for Networks (ODIN) method relying on an influence measure under a hierarchical generalized linear model for the adjacency matrices. An efficient computational algorithm is described, and ODIN is illustrated through simulations and an application to data from the UK Biobank.

Results: ODIN was successful in identifying moderate to extreme outliers. Removing such outliers can significantly change inferences in downstream applications.

Availability and implementation: ODIN has been implemented in both Python and R and these implementations along with other code are publicly available at github.com/pritamdey/ODIN-python and github.com/pritamdey/ODIN-r, respectively.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
Brain fiber streamlines from diffusion MR imaging (top row) with corresponding binary adjacency matrices (bottom row) of some subjects from the UK Biobank dataset. The streamlines are visualized using TrackVis (Wang and Wedeen, 2007) and colored by orientation (i.e. left to right: red, anterior to posterior: green, superior to inferior: blue). In these tractography diagrams, the anterior side of the brain faces into the page. The matrices in the bottom row are the ones used by ODIN. These adjacency matrices are not directly available from UK Biobank. We preprocessed the raw data using the PSC pipeline (Zhang *et al.*, 2018) to extract these adjacency matrices. In these matrices, black indicates the presence of at least one fiber connecting the two corresponding regions and white represents the absence of such fibers. The brain network represented by the streamlines and adjacency matrix in (a) is a typical non-outlier. The networks shown in (b)–(e) are outliers of various kinds selected from among the outliers detected by ODIN. (A color version of this figure appears in the online version of this article.)

**Fig. 2.**
Run-time (in seconds) for (a) each iteration of the estimation algorithm with respect to sample size, N; (b) calculation of the influence measures with respect to sample size, N; (c) each iteration of the estimation algorithm with respect to the number of edges, $L = V(V - 1)/2$; and (d) calculation of the influence measures with respect to the number of edges, L. Run-time grows linearly in (a)–(c) and quadratically in (d).
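The edge count $L = V(V - 1)/2$ in this caption is the number of node pairs in an undirected network on V nodes without self-loops. As a minimal sketch (the function name `vectorize_upper` and the toy matrix are illustrative assumptions, not taken from the ODIN code), the strictly upper-triangular entries of a symmetric adjacency matrix can be collected into a vector of length L:

```python
import numpy as np

def vectorize_upper(adj):
    """Return the strictly upper-triangular entries of a symmetric V x V matrix."""
    V = adj.shape[0]
    rows, cols = np.triu_indices(V, k=1)  # all pairs (u, v) with u < v
    return adj[rows, cols]

# Toy symmetric binary adjacency matrix on V = 5 nodes (hypothetical data).
V = 5
rng = np.random.default_rng(0)
upper = np.triu(rng.integers(0, 2, size=(V, V)), k=1)
adj = upper + upper.T

edges = vectorize_upper(adj)
L = V * (V - 1) // 2  # number of possible edges
```

Here `edges` has exactly `L` entries, which is the quantity on the horizontal axis of panels (c) and (d).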

**Fig. 3.**
Box-plots of $IM_{1}(i)$ for outliers and non-outliers, for data generated from model (1) with ‘outliers’ simulated by flipping a fixed proportion of edges in some of the networks. The three panels correspond to 1%, 5% and 10% of flipped edges, respectively. As outliers become more severe, ODIN distinguishes them from non-outliers more easily.
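The edge-flipping contamination described in this caption can be sketched as follows. This is an illustrative assumption about how such a perturbation might be coded, not the authors' simulation script; model (1) itself is not reproduced here, and the function name `flip_edges` is hypothetical.

```python
import numpy as np

def flip_edges(adj, proportion, rng):
    """Return a copy of a symmetric binary adjacency matrix with a fixed
    proportion of its V(V-1)/2 possible edges flipped (0 <-> 1)."""
    V = adj.shape[0]
    rows, cols = np.triu_indices(V, k=1)
    L = rows.size
    n_flip = int(round(proportion * L))
    chosen = rng.choice(L, size=n_flip, replace=False)  # distinct edge slots
    out = adj.copy()
    r, c = rows[chosen], cols[chosen]
    out[r, c] = 1 - out[r, c]   # flip the upper-triangular entries
    out[c, r] = out[r, c]       # mirror to keep the matrix symmetric
    return out

# Example: flip 10% of the 190 possible edges of an empty 20-node network.
rng = np.random.default_rng(1)
base = np.zeros((20, 20), dtype=int)
flipped = flip_edges(base, 0.10, rng)
```

Sampling edge slots without replacement ensures that exactly the requested proportion of edges differs from the original network.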

**Fig. 4.**
An exploratory view of the difference between outliers and non-outliers detected by ODIN. In each panel, the blue graphs represent outliers and the orange graph represents non-outliers. (a) Distribution of the number of edges connecting two ROIs located in different hemispheres. (b) Distribution of the number of edges connecting two ROIs located in the same hemisphere. In both panels, it is clear that these distributions are significantly different between outliers and non-outliers. (A color version of this figure appears in the online version of this article.)
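The two summaries in this caption, the number of edges between hemispheres and the number within a hemisphere, can be computed from a binary adjacency matrix given a hemisphere label per ROI. The sketch below assumes a hypothetical label vector `hemisphere` (0 for left, 1 for right); the actual ROI ordering and labels used in the paper are not given in this excerpt.

```python
import numpy as np

def hemisphere_edge_counts(adj, hemisphere):
    """Count present edges that join different hemispheres (inter)
    and the same hemisphere (intra) in a symmetric binary matrix."""
    hemisphere = np.asarray(hemisphere)
    rows, cols = np.triu_indices(adj.shape[0], k=1)  # each pair counted once
    present = adj[rows, cols] > 0
    same = hemisphere[rows] == hemisphere[cols]
    inter = int(np.sum(present & ~same))
    intra = int(np.sum(present & same))
    return inter, intra

# Toy network: ROIs 0 and 1 in one hemisphere, ROIs 2 and 3 in the other.
toy = np.zeros((4, 4), dtype=int)
for u, v in [(0, 1), (0, 2), (2, 3)]:
    toy[u, v] = toy[v, u] = 1
inter, intra = hemisphere_edge_counts(toy, [0, 0, 1, 1])
```

Computing these two counts for every subject and comparing their distributions across flagged and unflagged networks gives the kind of exploratory view the figure describes.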

**Fig. 5.**
Changes in brain connectivity with increasing numeric memory scores with (left) and without (right) outliers. Only the 20 edges with the largest change in connectivity are shown, to improve visualization. The dots on the circle boundary represent ROIs and are colour coded to indicate the lobe and hemisphere to which they belong. For a more complete description of the ROI labels, see Supplementary Section S3.

**Fig. 6.**
Changes in brain connectivity with increasing symbol digit substitution scores with (left) and without (right) outliers. Only the 20 edges with the largest change in connectivity are shown, to improve visualization. The dots on the circle boundary represent ROIs and are colour coded to indicate the lobe and hemisphere to which they belong. For a more complete description of the ROI labels, see Supplementary Section S3.

**Fig. 7.**
Changes in brain connectivity with increasing fluid intelligence scores with (left) and without (right) outliers. Only the 20 edges with the largest change in connectivity are shown, to improve visualization. The dots on the circle boundary represent ROIs and are colour coded to indicate the lobe and hemisphere to which they belong. For a more complete description of the ROI labels, see Supplementary Section S3.
