Proc Natl Acad Sci U S A. 2025 Jan 14;122(2):e2414018122.
doi: 10.1073/pnas.2414018122. Epub 2025 Jan 7.

Exome sequencing identifies genes for socioeconomic status in 350,770 individuals

Xin-Rui Wu et al. Proc Natl Acad Sci U S A.

Abstract

Socioeconomic status (SES) is a critical factor in determining health outcomes and is influenced by genetic and environmental factors. However, our understanding of the genetic structure of SES remains incomplete. Here, we conducted a large-scale exome study of SES markers (household income, occupational status, educational attainment, and social deprivation) in 350,770 individuals. For rare coding variants, we identified 56 significant associations by gene-based collapsing tests, unveiling 7 additional SES-associated genes (NRN1, CCDC36, RHOB, EP400, NCAM1, TPTEP2-CSNK1E, and LINC02881). Exome-wide single common variant analysis revealed nine lead single-nucleotide polymorphisms (SNPs) associated with household income and 34 lead SNPs associated with EduYears, replicating previous GWAS findings. The gene-environment correlations had a substantial impact on the genetic associations with SES, as indicated by the significantly increased P values in several associations after controlling for geographic regions. Furthermore, we observed the pleiotropic effects of SES-associated genetic factors on a wide range of health outcomes, such as cognitive function, psychosocial status, and diabetes. This study highlights the contribution of coding variants to SES and their associations with health phenotypes.

Keywords: health; pleiotropy; rare coding variant; socioeconomic status; whole-exome sequencing.

Conflict of interest statement

Competing interests statement: The authors declare no competing interest.

Figures

Fig. 1.
Summary of this study. The figure summarizes the analytical flow and key findings of this study. First, we performed gene-based collapsing analysis and single-variant analysis for rare and common variants from exome data, respectively. Then, LOVO analysis, conditional analysis, and a series of biological annotations were performed to gain a more detailed characterization of SES genetic architecture. We finally evaluated the associations of SES genetic factors with a range of human health conditions and explored the potential mechanisms underpinning the SES-health gradient by using neuroimaging, blood biochemistry, and other bioindicators. Partial illustrations sourced from BioRender with publishing license (https://www.biorender.com/).
Fig. 2.
Rare variants for socioeconomic status. Manhattan plots show the results from gene-based collapsing tests for (A) Townsend deprivation index, (B) household income, (C) occupational status, and (D) educational attainment. Each gene-trait pair was tested 12 times according to two function categories (distinguished by shape) and 4 max-MAF categories (distinguished by size) and performed by the SKAT-O test in SAIGE-GENE+ (20), adjusting for age, biological sex, and top 10 PCs. The group with the smallest P value in each gene-trait pair was retained in the Manhattan plots for a concise visualization. The x-axis represents the chromosomes, and the y-axis represents the −log10(P). The red dotted line indicates the Bonferroni-corrected significance threshold of P < 8.49 × 10^−8. (E) The counts of pLOF and missense variants contained in genes significant in the gene-based collapsing tests. Function and max-MAF classes were chosen for cases with the smallest P value. The x-axis represents the number of pLOF variants, and the y-axis represents the number of likely deleterious missense variants. (F) and (G) show the rare mutations in NRN1 and CCDC36, respectively. The corresponding protein domains were determined by SMART (21). The consequence of each variant was annotated according to the canonical transcript of each gene. (H) The distribution of TDI categories for NRN1 and CCDC36 carriers and noncarriers.
Fig. 3.
Intersection of rare variations with GWAS signals in ADGRB2 and KDM5B. (A) The upper panel is the local illustration of GWAS signals around ADGRB2, and the lower panel describes the rare mutations in ADGRB2. The index common SNP was rs2050256, previously reported by Lee et al. (9). (B) The results from conditional analysis of ADGRB2. (C) The regional plot shows GWAS signals around KDM5B, with the index common signal rs10920444. (D) The results from conditional analysis of KDM5B. The P values reported in (B) and (D) were calculated from SKAT-O tests, and the effect sizes were estimated through Burden tests.
Fig. 4.
Biological annotation of 50 socioeconomic status–associated genes. (A) Results from tissue-type enrichment analysis. The x-axis represents 54 GTEx tissue types (red dot for 13 brain regions, blue for 39 nonbrain regions, and purple for 2 cell lines), and the y-axis represents the −log10(P). The black dotted line represents the significance threshold of P < 9.25 × 10^−4 (0.05/54). (B) The average expression of SES-associated genes in the human brain with BrainSpan developmental transcriptome data (27). The x-axis represents the developmental period, and the y-axis represents the average brain expression. (C) and (D) show the cell types expressed by SES genes. The darker color of the dot in (D) indicates the higher relative expression level of SES-associated genes in that cell type. (E) Synaptic location and (F) process annotated by SynGO (29). The counts of genes in each synaptic ontology term are indicated by the color darkness. Abbreviations: OPC, oligodendrocyte progenitor cells.
Fig. 5.
Associations with health-related traits. (A) Results from gene-based collapsing tests with health conditions for identified rare variants and (B) results from single-variant association tests with health conditions for identified common variants. The x-axis represents the categories of selected conditions, and the y-axis represents the −log10(P). (C) The number of significant associations of each identified variant with health conditions. (D) The heatmap shows the variants with at least one significant association with brain structure indexes, including 68 cortical regions (first 34 rows) and 16 subcortical regions (last 8 rows). A darker shade represents a smaller P value of the association.
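
The Bonferroni thresholds quoted in the captions for Fig. 2 and Fig. 4 follow the usual rule of dividing a family-wise error rate by the number of tests. The sketch below is illustrative only: the function names are hypothetical, and the family-wise alpha of 0.05 is stated in the source only for the Fig. 4 threshold (0.05/54), so applying it to the Fig. 2 threshold is an assumption.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def implied_n_tests(alpha: float, threshold: float) -> float:
    """Number of tests implied by a reported Bonferroni threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return alpha / threshold


# Fig. 4: 54 GTEx tissue types tested at a family-wise alpha of 0.05.
fig4_threshold = bonferroni_threshold(0.05, 54)
print(f"Fig. 4 threshold: {fig4_threshold:.3e}")  # prints 9.259e-04

# Fig. 2: the caption reports P < 8.49e-8. Assuming alpha = 0.05 (not stated
# in the caption), this gives the approximate number of tests corrected for.
fig2_implied_tests = implied_n_tests(0.05, 8.49e-8)
print(f"Fig. 2 implied number of tests: {fig2_implied_tests:,.0f}")
```

The computed Fig. 4 value, 9.259 × 10^−4, agrees with the caption's 9.25 × 10^−4 if the caption's figure is truncated rather than rounded.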
