Brief Bioinform. 2023 Mar 19;24(2):bbad073.
doi: 10.1093/bib/bbad073.

Integrative analysis of multi-omics and imaging data with incorporation of biological information via structural Bayesian factor analysis


Jingxuan Bao et al. Brief Bioinform.

Abstract

Motivation: With the rapid development of modern technologies, massive data are available for the systematic study of Alzheimer's disease (AD). Although many existing AD studies focus mainly on single-modality omics data, multi-omics datasets can provide a more comprehensive understanding of AD. To bridge this gap, we propose a novel structural Bayesian factor analysis framework (SBFA) that extracts the information shared by multi-omics data by aggregating genotyping data, gene expression data, neuroimaging phenotypes and prior biological network knowledge. Our approach extracts common information shared by the different modalities and encourages biologically related features to be selected, guiding future AD research in a biologically meaningful way.

Method: SBFA decomposes the mean parameters of the data into the product of a sparse factor loading matrix and a factor matrix, where the factor matrix represents the common information extracted from the multi-omics and imaging data. The framework is designed to incorporate prior biological network information. In our simulation study, SBFA achieved the best performance among the state-of-the-art factor-analysis-based integrative analysis methods compared.
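The mean-parameter decomposition described above can be illustrated with a minimal NumPy sketch. It assumes a features-by-samples layout, mean parameters `theta = W @ Z` (loadings `W`, factors `Z`), a Gaussian likelihood with fixed standard deviation for continuous features, and a Bernoulli likelihood with a logistic link for binary features such as genotype indicators. The function `mixed_log_likelihood`, the dimensions and the simulated data are hypothetical; this is not the SBFA code from the authors' repository, and it omits the priors and the inference procedure.

```python
import numpy as np

def mixed_log_likelihood(X, W, Z, is_binary, sigma=1.0):
    """Log-likelihood of X (features x samples) given mean parameters W @ Z.

    Rows flagged in `is_binary` use a Bernoulli likelihood with a logistic
    link; the remaining rows use a Gaussian likelihood with std `sigma`.
    """
    theta = W @ Z                                 # mean parameters, features x samples
    xb, tb = X[is_binary], theta[is_binary]       # binary rows
    xc, tc = X[~is_binary], theta[~is_binary]     # continuous rows
    # Bernoulli log-likelihood: x * theta - log(1 + exp(theta)), computed stably
    ll_bin = np.sum(xb * tb - np.logaddexp(0.0, tb))
    # Gaussian log-likelihood with mean theta and variance sigma**2
    ll_cont = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                     - (xc - tc) ** 2 / (2 * sigma**2))
    return ll_bin + ll_cont

# Hypothetical example: 3 binary (genotype-like) and 3 continuous features.
rng = np.random.default_rng(0)
p, n, K = 6, 50, 2
mask = rng.random((p, K)) < 0.5                   # sparsity pattern of the loadings
W = rng.normal(size=(p, K)) * mask                # sparse loading matrix, p x K
Z = rng.normal(size=(K, n))                       # factor matrix, K x n
is_binary = np.array([True, True, True, False, False, False])
theta = W @ Z
X = np.where(is_binary[:, None],
             rng.random((p, n)) < 1.0 / (1.0 + np.exp(-theta)),  # Bernoulli draws
             theta + rng.normal(size=(p, n))).astype(float)      # Gaussian draws
print(mixed_log_likelihood(X, W, Z, is_binary))
```

Sharing one `W @ Z` across both likelihood types is what lets continuous and discrete modalities contribute to the same latent factors.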

Results: We apply SBFA, together with several state-of-the-art factor analysis models, to extract the latent common information simultaneously from genotyping, gene expression and brain imaging data in the ADNI biobank database. The latent information is then used to predict the Functional Activities Questionnaire (FAQ) score, an important measure for AD diagnosis that quantifies subjects' abilities in daily life. SBFA shows the best prediction performance among the factor analysis models compared.
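Predicting a clinical score from extracted latent factors with a penalized linear model can be sketched as a generic cyclic coordinate-descent elastic net in NumPy. It assumes the common objective (1/(2n))·||y − b0 − Xb||² + alpha·l1_ratio·||b||₁ + 0.5·alpha·(1 − l1_ratio)·||b||², where `l1_ratio=1` gives the lasso and `l1_ratio=0` gives ridge. The function names, the simulated factors and the synthetic `faq` outcome are hypothetical stand-ins, not ADNI data or the authors' code; in practice a library implementation with cross-validated penalties would typically be used.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding operator S(x, t) = sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for the elastic-net objective
    (1/(2n)) * ||y - b0 - X b||^2 + alpha * l1_ratio * ||b||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||b||^2.
    l1_ratio=1 gives the lasso; l1_ratio=0 gives ridge regression.
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean               # centring handles the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    r = yc.copy()                                 # current residual yc - Xc @ b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:                  # skip constant columns
                continue
            old = b[j]
            rho = Xc[:, j] @ r / n + col_sq[j] * old   # correlation with partial residual
            b[j] = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio))
            r -= Xc[:, j] * (b[j] - old)          # keep the residual in sync
            max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    b0 = y_mean - x_mean @ b
    return b0, b

# Hypothetical latent factors (samples x K) and a synthetic FAQ-like outcome.
rng = np.random.default_rng(1)
factors = rng.normal(size=(200, 5))
faq = 3.0 + 2.0 * factors[:, 0] - 1.0 * factors[:, 2] + 0.1 * rng.normal(size=200)
b0_lasso, b_lasso = elastic_net(factors, faq, alpha=0.01, l1_ratio=1.0)
b0_ridge, b_ridge = elastic_net(factors, faq, alpha=0.01, l1_ratio=0.0)
print(round(b0_lasso, 2), np.round(b_lasso, 2))
```

With the lasso setting, coefficients of factors unrelated to the outcome are typically driven to exactly zero, which is why the three penalties can rank latent representations differently.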

Availability: Code is publicly available at https://github.com/JingxuanBao/SBFA.

Contact: qlong@upenn.edu.

Keywords: Alzheimer’s disease; biological network; multi-omics; structural Bayesian factor analysis.


Figures

Figure 1
Structural Bayesian factor analysis framework. Multiple multi-modal datasets are analyzed jointly, and their concatenation forms a single data matrix. Each dataset is assumed to follow an underlying continuous or discrete distribution governed by a mean parameter defined for every sample, and the mean parameters are linked to a factor matrix and a loading matrix. Each row of the loading matrix corresponds to one feature across all datasets and has its own shrinkage parameter for a Laplacian prior. The shrinkage parameters follow a log-normal prior specified by a mean and a precision matrix.
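One way to read the prior structure in this caption is sketched below: each row of the loading matrix gets Laplace-distributed entries with rate `lam_j`, and `log(lam)` is multivariate normal, so `lam` is log-normal. The sketch assumes, hypothetically, that the precision matrix is built from a feature graph as `tau * (L + eps * I)`, with `L` the graph Laplacian; under that choice, connected features with similar shrinkage levels receive higher prior density. The functions `graph_precision` and `log_prior` and all parameter values are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def graph_precision(adjacency, tau=1.0, eps=1e-2):
    """Hypothetical precision matrix tau * (L + eps * I), with L = D - A the
    graph Laplacian of a feature network; eps > 0 makes it positive definite."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return tau * (L + eps * np.eye(A.shape[0]))

def log_prior(W, log_lam, mu, Omega):
    """Joint log density of loadings W (features x K) and rates lam = exp(log_lam).

    Row j of W has i.i.d. Laplace entries with density (lam_j / 2) * exp(-lam_j * |w|);
    log(lam) ~ Normal(mu, inverse(Omega)), so lam is log-normal.
    """
    lam = np.exp(log_lam)
    K = W.shape[1]
    # Laplace part: sum over j, k of log(lam_j / 2) - lam_j * |w_jk|
    lp_w = K * np.sum(np.log(lam / 2.0)) - np.sum(lam[:, None] * np.abs(W))
    # Multivariate normal density of log(lam) with precision Omega
    d = log_lam - mu
    _, logdet = np.linalg.slogdet(Omega)
    lp_log_lam = 0.5 * logdet - 0.5 * d.size * np.log(2 * np.pi) - 0.5 * d @ Omega @ d
    # Jacobian of lam -> log(lam), giving the log-normal density of lam
    return lp_w + lp_log_lam - np.sum(log_lam)

# Hypothetical 4-feature chain network 1-2-3-4 and zero loadings.
A = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
Omega = graph_precision(A, tau=5.0)
W = np.zeros((4, 2))
mu = np.zeros(4)
# Same values, arranged to agree along edges vs. alternate along edges.
smooth = log_prior(W, np.array([1.0, 1.0, -1.0, -1.0]), mu, Omega)
rough = log_prior(W, np.array([1.0, -1.0, 1.0, -1.0]), mu, Omega)
print(smooth > rough)   # True
```

Because both vectors contain the same values, the Laplace and Jacobian terms cancel in the comparison, and the difference comes only from the graph-based precision term.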
Figure 2
Workflow for real data analysis. The genotyping data are encoded as two binary matrices, indicating the presence of the homozygous major allele and the presence of the homozygous alternative allele, respectively; two further matrices contain normalized quantitative traits (QTs) for gene expression and neuroimaging. Starting from the multi-modal data, each factor analysis method extracts latent factors, with the number of latent dimensions tuned either by the function implemented in the method's package or by the Bayesian information criterion (BIC). The latent factors are then used as predictors of the FAQ score in linear regression with lasso, ridge and elastic net regularization. For comparison, the raw data (the transpose of the multi-omics data), the covariates, and the raw data plus covariates are also used as predictors.
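Choosing the number of latent dimensions by BIC can be sketched generically. As a stand-in for the factor models compared in the paper, the sketch scores the best rank-k approximation of a data matrix (computed from its singular values) under an i.i.d. Gaussian noise model, counting k(n + d - k) + 1 free parameters; this likelihood and parameter count are assumptions for illustration, not the criterion used by any specific package. The function `bic_rank` and the simulated data are hypothetical.

```python
import numpy as np

def bic_rank(X, max_rank):
    """Pick k in 1..max_rank minimizing BIC for a rank-k fit of X (n x d)
    with i.i.d. Gaussian residuals; max_rank must be below min(n, d)."""
    n, d = X.shape
    N = n * d
    s = np.linalg.svd(X, compute_uv=False)        # singular values, descending
    scores = {}
    for k in range(1, max_rank + 1):
        rss = np.sum(s[k:] ** 2)                  # residual SS of the best rank-k approximation
        sigma2 = rss / N                          # maximum-likelihood noise variance
        loglik = -0.5 * N * (np.log(2 * np.pi * sigma2) + 1.0)
        n_params = k * (n + d - k) + 1            # free parameters of a rank-k matrix + variance
        scores[k] = -2.0 * loglik + n_params * np.log(N)
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Hypothetical data: two latent factors plus small Gaussian noise.
rng = np.random.default_rng(2)
n, d, true_k = 100, 30, 2
X = rng.normal(size=(n, true_k)) @ rng.normal(size=(true_k, d)) + 0.1 * rng.normal(size=(n, d))
best_k, scores = bic_rank(X, max_rank=6)
print(best_k)
```

Each extra dimension must reduce the residual variance enough to pay a penalty of roughly (n + d) log(nd), which is what keeps the selected dimension from growing with noise.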
Figure 3
Ground-truth setting for the factor loading matrix. Column labels index the latent variables, and row labels give the cumulative number of rows counted from the first row. Elements in the shaded areas are randomly generated and, with a given probability, multiplied by an additional factor; all other elements are zero.
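A block-sparse ground-truth loading matrix of the kind described in this caption can be generated in a few lines. Because the generating distribution and probability are not specified in this text, the sketch assumes, hypothetically, that shaded entries are drawn from Uniform(0.5, 1) and multiplied by -1 with probability 0.5; the function `block_sparse_loadings` and the block layout passed to it are also made up for illustration.

```python
import numpy as np

def block_sparse_loadings(n_rows, n_factors, blocks, flip_prob=0.5,
                          low=0.5, high=1.0, seed=0):
    """Zero loading matrix with random values on the given blocks.

    blocks: list of (row_start, row_end, factor) tuples, row_end exclusive.
    Block entries ~ Uniform(low, high), multiplied by -1 with probability
    flip_prob (hypothetical choices standing in for the paper's settings).
    """
    rng = np.random.default_rng(seed)
    W = np.zeros((n_rows, n_factors))
    for start, end, k in blocks:
        size = end - start
        values = rng.uniform(low, high, size=size)
        signs = np.where(rng.random(size) < flip_prob, -1.0, 1.0)
        W[start:end, k] = values * signs
    return W

# Hypothetical layout: 100 features, 3 latent factors, partly overlapping blocks.
W_true = block_sparse_loadings(100, 3, [(0, 40, 0), (30, 70, 1), (60, 100, 2)])
print(np.count_nonzero(W_true, axis=0))   # [40 40 40]
```

Overlapping row ranges let some features load on two latent factors, which is one simple way to mimic shared structure across modalities in a simulation.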
Figure e1
Estimated factor loadings for the simulated dataset. The heatmap shows the factor loadings from iCluster, JIVE, SLIDE, GBFA without graph information (GBFA(NE)), GBFA with graph information (GBFA(E)), SBFA without graph information (SBFA(NE)), SBFA with graph information (SBFA(E)) and the ground truth. The results are derived from one simulated dataset in the low-dimensional scenario. The optimal latent dimension for each model is indicated by the number of circles in the corresponding method track. Values outside the range (-1, 1) are shown in the same colors as the boundaries (-1 and 1).
