[Preprint]. 2024 Apr 26:2024.04.22.590547.
doi: 10.1101/2024.04.22.590547.

Large-scale Deep Proteomic Analysis in Alzheimer's Disease Brain Regions Across Race and Ethnicity


Fatemeh Seifar et al. bioRxiv. 2024.

Update in

  • Large-scale deep proteomic analysis in Alzheimer's disease brain regions across race and ethnicity.
    Seifar F, Fox EJ, Shantaraman A, Liu Y, Dammer EB, Modeste E, Duong DM, Yin L, Trautwig AN, Guo Q, Xu K, Ping L, Reddy JS, Allen M, Quicksall Z, Heath L, Scanlan J, Wang E, Wang M, Linden AV, Poehlman W, Chen X, Baheti S, Ho C, Nguyen T, Yepez G, Mitchell AO, Oatman SR, Wang X, Carrasquillo MM, Runnels A, Beach T, Serrano GE, Dickson DW, Lee EB, Golde TE, Prokop S, Barnes LL, Zhang B, Haroutunian V, Gearing M, Lah JJ, De Jager P, Bennett DA, Greenwood A, Ertekin-Taner N, Levey AI, Wingo A, Wingo T, Seyfried NT. Alzheimers Dement. 2024 Dec;20(12):8878-8897. doi: 10.1002/alz.14360. Epub 2024 Nov 13. PMID: 39535480. Free PMC article.

Abstract

Introduction: Alzheimer's disease (AD) is the most prevalent neurodegenerative disease, yet our understanding of it relies predominantly on studies of the non-Hispanic White (NHW) population. Here we aimed to provide comprehensive insights into the proteomic landscape of AD across diverse racial and ethnic groups.

Methods: Dorsolateral prefrontal cortex (DLPFC) and superior temporal gyrus (STG) brain tissues were obtained from multiple centers (Mayo Clinic, Emory University, Rush University, Mt. Sinai School of Medicine) and harmonized through neuropathological evaluation using Braak staging and CERAD criteria. Among 1105 DLPFC tissue samples (998 unique individuals), 333 were from African American donors, 223 from Latino American donors, 529 from NHW donors, and the rest from donors of mixed or unknown racial background. Among 280 STG tissue samples (244 unique individuals), 86 were from African American, 76 from Latino American, and 116 from NHW donors, and the rest were of mixed or unknown ethnicity. All tissues were uniformly homogenized and analyzed by tandem mass tag mass spectrometry (TMT-MS).

Results: As a quality control (QC) measure, proteins with more than 50% missing values were removed, and iterative principal component analysis was conducted to remove outlier samples within each brain region. After QC, 9,180 and 9,734 proteins remained in the DLPFC and STG proteomes, respectively, of which approximately 9,000 were shared between regions. Protein levels of microtubule-associated protein tau (MAPT) and amyloid precursor protein (APP) showed AD-related elevations in DLPFC tissue and were strongly associated with CERAD and Braak scores across racial groups. APOE4 protein levels in brain were highly concordant with the APOE genotype of the individuals.
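The first QC step (missingness filter, loading normalization, log2 transform) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input is assumed to be a proteins x samples matrix with NaN marking missing values, the function name `preprocess_tmt` is hypothetical, a protein missing in exactly 50% of samples is kept (the text does not settle this boundary case), and each sample's total is taken over the retained proteins.

```python
import numpy as np


def preprocess_tmt(abundance: np.ndarray, max_missing: float = 0.5):
    """Hypothetical Step 1 sketch: missingness filter, loading normalization, log2.

    abundance: proteins x samples matrix of raw TMT intensities, NaN = missing.
    Returns (log2 ratio matrix for retained proteins, boolean mask of retained rows).
    """
    # Fraction of samples in which each protein is missing.
    missing_frac = np.isnan(abundance).mean(axis=1)
    # Assumption: a protein missing in exactly 50% of samples is kept.
    keep = missing_frac <= max_missing
    kept = abundance[keep]
    # Divide each value by its sample's summed abundance (over retained proteins)
    # to adjust for differences in sample loading; NaN stays NaN.
    totals = np.nansum(kept, axis=0)
    ratios = kept / totals
    return np.log2(ratios), keep


if __name__ == "__main__":
    demo = np.array([
        [10.0, 20.0, np.nan, 40.0],      # missing in 1/4 samples -> kept
        [np.nan, np.nan, np.nan, 5.0],   # missing in 3/4 samples -> removed
        [30.0, 20.0, 10.0, 40.0],        # complete -> kept
    ])
    log2_ratios, mask = preprocess_tmt(demo)
    print(mask)  # [ True False  True]
    print(log2_ratios)
```

Missing values are left as NaN rather than imputed, since the text only describes filtering at this step.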

Discussion: This comprehensive, region-resolved, large-scale proteomic dataset provides a resource for understanding ethnoracial-specific protein differences in the AD brain.


Figures

Fig 1. A. Schematic illustrating the cohort characteristics and the experimental workflow for mass spectrometry (MS) of the human brain proteome across frontal and temporal brain tissue samples.
This study incorporated a total of 1105 dorsolateral prefrontal cortex (DLPFC) brain tissues from 998 individuals: 529 non-Hispanic White (NHW), 333 African American, 223 Latino American, and 20 of other or unknown background. These samples were sourced from four data distribution sites: Emory University, Mayo Clinic, Rush University, and Mount Sinai University Hospital. Additionally, 280 STG tissues from a subset of 244 individuals were included (116 NHW, 86 African American, 78 Hispanic, and others). STG samples were obtained from a racially diverse set of specimens from Mayo Clinic and Emory and were distributed across 19 batches. Tissues underwent an experimental pipeline involving protein digestion, batch randomization, TMT labeling, fractionation, and mass spectrometric measurement. A total of 72 DLPFC batches were processed: 9 from Emory, 24 from Mayo Clinic, 14 from Mount Sinai, and 25 from Rush. Batches were randomized to ensure a representative and diverse dataset. In total, 6479 raw files were generated for DLPFC samples and 1824 for STG. B. Venn diagram of the total number of protein groups quantified in DLPFC and STG samples: 11748 protein groups were identified in DLPFC and 11003 in STG, with 10738 shared. C. Venn diagram of protein groups retained in DLPFC and STG samples after quality control (QC): 9180 in DLPFC and 9734 in STG, with 9015 shared.
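The Venn counts in panels B and C follow inclusion-exclusion, so the region-specific and combined totals can be derived from the three numbers reported for each panel. The short Python check below (the helper name `venn_counts` is hypothetical) does this arithmetic and also confirms that the per-site DLPFC batch counts add up to 72.

```python
def venn_counts(n_a: int, n_b: int, n_shared: int) -> dict:
    """Split two overlapping set sizes into region-specific, shared, and union counts."""
    return {
        "a_only": n_a - n_shared,
        "b_only": n_b - n_shared,
        "shared": n_shared,
        "union": n_a + n_b - n_shared,  # inclusion-exclusion
    }


# Protein-group counts reported for DLPFC (a) and STG (b) in panels B and C.
before_qc = venn_counts(11748, 11003, 10738)
after_qc = venn_counts(9180, 9734, 9015)

# Per-site DLPFC batch counts reported in the legend.
dlpfc_batches = {"Emory": 9, "Mayo Clinic": 24, "Mount Sinai": 14, "Rush": 25}

if __name__ == "__main__":
    print(before_qc)  # {'a_only': 1010, 'b_only': 265, 'shared': 10738, 'union': 12013}
    print(after_qc)   # {'a_only': 165, 'b_only': 719, 'shared': 9015, 'union': 9899}
    print(sum(dlpfc_batches.values()))  # 72
```

These derived totals (for example, 12013 protein groups across both regions before QC and 9899 after) are computed from the reported counts, not stated in the legend itself.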
Fig 2. Quality control (QC) and batch correction for DLPFC tissue proteins.
A. The QC workflow is illustrated in three main steps. Step 1. Pre-processing for missing values: only proteins with missing data in less than 50% of the samples were retained, leaving 9180 proteins across 1105 samples. The ratio of each protein's abundance to the total protein abundance of its sample was calculated to adjust for sample loading differences, and the data were then log2-transformed. Step 2. Outlier detection and removal: iterative principal component analysis (PCA) was used to identify and eliminate sample outliers. After multiple rounds of PCA, 19 outliers were removed, leaving 9180 proteins across 1086 samples. Step 3. Batch effect regression: variance attributable to batch was removed by regression of the 9180 proteins in 1086 samples. B and C. Multidimensional scaling (MDS) plots showing variation among samples (B) before and (C) after regressing out batch effects. Before correction (B), samples form distinct clusters by site along dimensions 1 and 2 (Emory, red; Mount Sinai, blue; Rush, purple; Mayo, green), with some scattering. After correction (C), samples from all four sites cluster together (note the change in scale from B to C), showing that batch-related variance was removed and the sample distribution was harmonized across data distribution sites. D and E. Variance partition analysis using experimental factors to evaluate the percentage of explained variance in the proteomic data.
Violin plots before (D) and after (E) batch correction show the distribution of explained variance across proteins. The Y-axis represents the percentage of explained variance, and the X-axis shows the contributing factors: age, sex, race, diagnosis, batch, and residuals. Before correction (D), batch explains a notable share of the variance in the proteomic profile. After correction (E), the variance associated with batch is reduced to near zero, while age, sex, race, AD diagnosis, and other individual traits (residuals) continue to influence protein abundance patterns. Each point represents a protein, with names shown next to selected points. This underscores the efficacy of the correction procedure in eliminating batch-related variability from the proteomic data.
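Step 2 names iterative PCA but not the outlier rule. One common variant, sketched below as an assumption, projects samples onto the first two principal components, drops any sample more than 4 standard deviations from the mean on either component, and repeats until no sample is flagged. The 4-SD threshold, the number of components, the function name `iterative_pca_outliers`, and the requirement of a complete matrix (no NaN, e.g. after imputation) are hypothetical choices, not details taken from this study.

```python
import numpy as np


def iterative_pca_outliers(x, n_sd=4.0, n_pcs=2, max_rounds=10):
    """Hypothetical Step 2 sketch: return column indices of samples removed as outliers.

    x: proteins x samples matrix with no missing values (e.g. after imputation).
    n_sd, n_pcs: assumed outlier rule (|z| > n_sd on any of the first n_pcs PCs).
    """
    x = np.asarray(x, dtype=float)
    remaining = np.arange(x.shape[1])
    removed = []
    for _ in range(max_rounds):
        sub = x[:, remaining]
        # Center each protein across the remaining samples.
        centered = sub - sub.mean(axis=1, keepdims=True)
        # SVD: rows of vt are sample loadings; scaling by s gives PC scores.
        _, s, vt = np.linalg.svd(centered, full_matrices=False)
        scores = vt[:n_pcs].T * s[:n_pcs]  # samples x n_pcs
        z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
        flagged = np.any(np.abs(z) > n_sd, axis=1)
        if not flagged.any():
            break  # no outliers left: stop iterating
        removed.extend(remaining[flagged].tolist())
        remaining = remaining[~flagged]
    return np.array(sorted(removed), dtype=int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(100, 50))
    data[:, 7] += 50.0  # make sample 7 an obvious outlier
    print(iterative_pca_outliers(data))
```

Re-running PCA after each removal matters because a strong outlier can dominate the first component and mask milder outliers until it is gone.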
Fig 3. Quality control (QC) and batch correction for STG tissue proteins.
A. The data QC workflow comprises three main steps. Step 1. Handling missing values: proteins with missing data in more than 50% of the samples were removed, and sample loading differences were adjusted by ratio calculation followed by log2 transformation, yielding 9,734 proteins across 280 samples. Step 2. Identification and removal of outliers: iterative principal component analysis (PCA) was used to detect and eliminate sample outliers. After three rounds of PCA, two outliers were removed, leaving 9,734 proteins across 278 samples. Step 3. Batch effect removal: regression was applied to remove batch effects from the 9,734 proteins in 278 samples. B and C. Multidimensional scaling (MDS) plots of sample variation (B) before and (C) after regression for batch effect. Before batch regression (B), Emory (red) and Mayo (green) samples form distinct clusters with some scattering. After regression (C), Emory and Mayo samples group together more cohesively, and the dispersion seen in panel B is reduced. D and E. Variance partition analysis: violin plots (D) before and (E) after batch correction show the distribution of explained variance across proteins. The Y-axis represents the percentage of explained variance, and the X-axis includes age, sex, race, diagnosis, batch, and residuals. As in Fig 2D, batch had a large impact on the proteomic profile before correction. After correction (E), the variance associated with batch is substantially reduced, while age, sex, race, AD diagnosis, and other individual characteristics (residuals) remain influential factors shaping protein abundance patterns.
Each data point represents a protein, with names shown for the top points. This highlights the success of the regression in eliminating batch-related variability from the proteomic data.
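Step 3 regresses batch out of each protein. When batch is the only covariate, an ordinary least-squares fit reduces to subtracting each batch's mean and adding back the protein's overall mean, which is what the minimal sketch below does. It is an illustration, not the authors' procedure: their regression may also model or protect biological covariates such as age, sex, race, and diagnosis, and the function name `regress_out_batch` is hypothetical.

```python
import numpy as np


def regress_out_batch(x, batch):
    """Hypothetical Step 3 sketch: remove batch mean shifts protein by protein.

    x: proteins x samples matrix on the log2 scale (NaN allowed).
    batch: batch label for each sample (column).
    For a batch-only linear model, OLS residuals equal each value minus its
    batch mean; the protein's overall mean is added back to keep its level.
    """
    x = np.asarray(x, dtype=float)
    batch = np.asarray(batch)
    corrected = x.copy()
    grand_mean = np.nanmean(x, axis=1, keepdims=True)
    for b in np.unique(batch):
        cols = batch == b
        batch_mean = np.nanmean(x[:, cols], axis=1, keepdims=True)
        corrected[:, cols] = x[:, cols] - batch_mean + grand_mean
    return corrected


if __name__ == "__main__":
    demo = np.array([[1.0, 2.0, 11.0, 12.0]])  # one protein, two batches offset by 10
    print(regress_out_batch(demo, ["a", "a", "b", "b"]))  # [[6. 7. 6. 7.]]
```

Removing batch means this way is only safe when biology is not confounded with batch; randomizing samples across TMT batches, as described for Fig 1, is what makes this kind of correction reasonable.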
Fig 4. Variance explained by individual characteristics in DLPFC tissues.
The bar plots (A, C, E) depict the variance explained by sex, race, and Alzheimer’s disease (AD) diagnosis across all DLPFC samples. A. Top-ranking proteins associated with sex, identified through variance partitioning. B. Boxplots of the log2-normalized abundance of four selected proteins showing significant differences between males and females (p < 0.05); these proteins serve as key indicators of sex-related variation. C. Top-ranking proteins associated with race in the DLPFC dataset. D. Boxplots of the log2-normalized abundance of four selected proteins showing significant differences between African American individuals and individuals of other races (p < 0.05). E. Top-ranking proteins contributing to differences by AD diagnosis. F. Boxplots of the log2-normalized abundance of four selected proteins showing significant differences between AD patients and controls, as well as other diagnostic categories (p < 0.05).
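The legends report p < 0.05 for the boxplot comparisons without naming the test. As a hedged illustration of how a two-group difference in log2 abundance (for example, males versus females) can be tested without distributional assumptions, the sketch below swaps in a two-sided permutation test on the difference in means. It is not necessarily the test used in the study, and the function name `permutation_pvalue` is hypothetical.

```python
import numpy as np


def permutation_pvalue(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in mean log2 abundance.

    group_a, group_b: abundance values for one protein in two groups
    (e.g. males vs females). A swapped-in test, not necessarily the study's.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of samples to groups
        diff = abs(pooled[:a.size].mean() - pooled[a.size:].mean())
        # Small tolerance so ties with the observed value are not lost to rounding.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    males = [0.10, 0.35, 0.22, 0.41, 0.18]
    females = [-0.30, -0.12, -0.25, -0.05, -0.40]
    print(permutation_pvalue(males, females))
```

With many proteins tested, the resulting p-values would still need multiple-testing adjustment before calling individual proteins significant.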
Fig 5. Variance explained by individual characteristics in STG tissues.
The bar plots in panels A, C, and E show, for each protein, the fraction of total variance attributable to different sources of variation in the STG samples. A. Top-ranking proteins contributing to sex differences, identified through variance partitioning and shown as the proportion of variance attributable to sex. B. Boxplots of the log2-normalized abundance of four selected proteins showing significant differences between males and females (p < 0.05). C. Top proteins ranked by the fraction of total variance attributed to race in the STG dataset. D. Boxplots of the log2-normalized abundance of four selected proteins showing significant differences between African American individuals and individuals of other races (p < 0.05). E. Top AD-associated proteins ranked by the fraction of total variance attributed to AD diagnosis in the STG dataset. F. Boxplots of the log2-normalized abundance of four selected proteins showing significant differences between AD samples and controls, as well as other diagnostic categories (p < 0.05).
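Variance partitioning assigns each protein's variance to several factors jointly; tools such as the variancePartition R package fit a linear mixed model for this. A simpler single-factor analogue, shown below as an assumption rather than the study's method, is the between-group share of the total sum of squares (eta squared) for one categorical factor such as sex, race, or diagnosis, after which proteins can be ranked by that fraction. The function names are hypothetical.

```python
import numpy as np


def variance_fraction(values, labels):
    """Share of one protein's total variance explained by a categorical factor.

    Computes eta squared = between-group sum of squares / total sum of squares.
    Single-factor simplification; it does not adjust for other factors.
    """
    y = np.asarray(values, dtype=float)
    g = np.asarray(labels)
    grand = y.mean()
    ss_total = ((y - grand) ** 2).sum()
    if ss_total == 0:
        return 0.0  # constant protein: nothing to explain
    ss_between = sum(
        (g == level).sum() * (y[g == level].mean() - grand) ** 2
        for level in np.unique(g)
    )
    return float(ss_between / ss_total)


def rank_proteins(matrix, names, labels, top=10):
    """Rank proteins (rows of matrix) by the variance fraction for one factor."""
    fractions = [variance_fraction(row, labels) for row in np.asarray(matrix, dtype=float)]
    order = np.argsort(fractions)[::-1][:top]
    return [(names[i], fractions[i]) for i in order]


if __name__ == "__main__":
    sex = ["M", "M", "F", "F"]
    proteins = [[1.0, 3.0, 1.0, 3.0], [1.0, 1.0, 3.0, 3.0]]
    print(rank_proteins(proteins, ["P1", "P2"], sex, top=2))
```

Because each factor is fitted alone, correlated factors (for example, race and recruiting site) can each appear to explain the same variance; a joint model splits it between them instead.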
Fig 6. Correlation between proteomic tau and APP measurements and Braak and CERAD pathological scoring.
A. Box plots of the relative abundance of amyloid precursor protein (APP) in AD (pink) and control (green) DLPFC tissue samples (adjusted ANOVA p-value < 0.05). B. Raincloud plots of the relative abundance of APP (Y-axis) across CERAD scores (X-axis) in DLPFC tissues (score 1: green, score 2: orange, score 3: purple, score 4: pink). Median APP levels increase stepwise with ascending CERAD score, indicating a progressive rise in APP abundance across CERAD groups. C. Box plots of the relative abundance of microtubule-associated protein tau (MAPT) in AD (brown) and control (yellow) DLPFC tissue samples (adjusted ANOVA p-value < 0.05). D. Raincloud plots of the relative abundance of MAPT (Y-axis) across Braak stages 0 to 6 (X-axis) in DLPFC tissues (0: dark green, 1: orange, 2: purple, 3: pink, 4: light green, 5: yellow, 6: brown). MAPT levels are elevated at Braak stages 5 and 6, consistent with the expected increase in tau tangles in the frontal cortex at later Braak stages.
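Panels B and D describe a stepwise rise in median APP with CERAD score and higher MAPT at Braak stages 5 and 6. One hedged way to summarize such a trend is to report the median per stage together with Spearman's rank correlation between stage and abundance. The sketch below does this with NumPy only, using average ranks for ties. The function names are hypothetical, and Spearman's rho is a swapped-in summary, not the adjusted ANOVA reported in panels A and C.

```python
import numpy as np


def average_ranks(x):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j over the run of values tied with sorted_x[i].
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def stage_trend(abundance, stage):
    """Median abundance per stage plus Spearman's rho between stage and abundance."""
    y = np.asarray(abundance, dtype=float)
    s = np.asarray(stage, dtype=float)
    medians = {int(k): float(np.median(y[s == k])) for k in np.unique(s)}
    # Spearman's rho is the Pearson correlation of the ranks.
    rho = float(np.corrcoef(average_ranks(y), average_ranks(s))[0, 1])
    return medians, rho


if __name__ == "__main__":
    braak = [0, 0, 1, 1, 2, 2]
    mapt = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(stage_trend(mapt, braak))
```

A rank correlation suits ordinal scales such as Braak and CERAD because it only assumes that higher stages are "more", not that stages are evenly spaced.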
Fig 7. The association between APOE genotype and APOE4 proteotype across DLPFC and STG samples.
A. Boxplots of the log2-normalized abundance of APOE4 protein measured by TMT-MS across APOE genotypes in 920 unique DLPFC tissue samples, showing high APOE4 abundance among APOE ε4 carriers. B. Histogram of APOE4 log2-normalized abundance in DLPFC samples (Y-axis) by presence (red) or absence (blue) of the ε4 allele (X-axis). C. Boxplots of the log2-normalized abundance of APOE4 protein measured by TMT-MS across APOE genotypes in 244 unique STG tissue samples, showing high APOE4 abundance among APOE ε4 carriers. D. Histogram of APOE4 log2-normalized abundance in STG samples (Y-axis) by presence (red) or absence (blue) of the ε4 allele (X-axis). Although high APOE4 abundance was observed in ε4 carriers in both cortices, a few discrepancies between APOE genotype and APOE4 proteotype were found (purple). These inconsistencies may be attributed to various factors, including mis-genotyping or technical challenges in mass spectrometry measurements, such as isotope impurity and low signal-to-noise ratio in specific samples.
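Genotype-proteotype discrepancies like those marked in purple can be found by comparing each sample's ε4 carrier status with a call made from its APOE4 protein abundance. The sketch below is a minimal illustration under assumptions: genotypes are written as strings such as "E3/E4", the function name `flag_discordant` is hypothetical, and the log2-abundance cutoff is a hypothetical parameter, since the legend gives no threshold (in practice it might be chosen from the gap between the two histogram modes in panels B and D).

```python
def flag_discordant(samples, threshold):
    """Return IDs whose APOE4 protein call disagrees with e4 carrier status.

    samples: iterable of (sample_id, genotype, apoe4_log2_abundance),
             with genotype strings such as "E3/E4" (assumed format).
    threshold: hypothetical log2-abundance cutoff for calling APOE4 present.
    """
    discordant = []
    for sample_id, genotype, abundance in samples:
        carrier = "E4" in genotype.split("/")   # genotype says at least one e4 allele
        protein_positive = abundance > threshold  # proteomics says APOE4 is present
        if carrier != protein_positive:
            discordant.append(sample_id)
    return discordant


if __name__ == "__main__":
    cohort = [
        ("s1", "E3/E3", -2.0),
        ("s2", "E3/E4", 1.5),
        ("s3", "E4/E4", 2.5),
        ("s4", "E3/E3", 1.8),   # non-carrier with high APOE4 signal
        ("s5", "E2/E4", -1.9),  # carrier with low APOE4 signal
    ]
    print(flag_discordant(cohort, threshold=0.0))  # ['s4', 's5']
```

Flagged samples are candidates for re-genotyping or for checking MS quality (for example, signal-to-noise), not automatic exclusions.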
