[Preprint]. 2025 Jan 13:2025.01.10.632500.
doi: 10.1101/2025.01.10.632500.

Patterns of population structure and genetic variation within the Saudi Arabian population


D K Malomane et al. bioRxiv.

Abstract

The Arabian Peninsula is considered the initial site of historic human migration out of Africa. Modern-day indigenous Arabians are believed to be descendants of the groups that remained after the ancient split of the migrants into Eurasia. Here, we investigated how population history and cultural practices such as endogamy have shaped the genetic variation of Saudi Arabians. We genotyped 3,352 individuals and identified twelve genetic sub-clusters that correspond to the geographical distribution of different tribal regions and are differentiated by distinct ancestry components, based on comparisons with modern and ancient DNA references. These sub-clusters also differed in the extent of the genome covered by runs of homozygosity and in how their population sizes changed over time. Using 25,488,981 variants found in whole-genome sequencing (WGS) data from 302 individuals, we found that Saudi Arabians tend to carry proportionally more deleterious alleles than neutral alleles when compared with Africans/African Americans from gnomAD (e.g., a 13% increase in deleterious alleles annotated by AlphaMissense at 0.5–5% frequency in Saudi Arabians, compared with a 7% decrease in benign alleles; P < 0.001). Saudi sub-clusters with greater inbreeding and smaller effective population sizes also showed greater enrichment of deleterious alleles. Additionally, approximately 10% of the variants discovered in our WGS data are not observed in gnomAD, and these variants are also enriched for deleterious annotations. To accelerate the study of population-enriched deleterious alleles and their health consequences in this population, we have made available the allele frequency estimates of the 25,488,981 variants discovered in our samples. Taken together, our results suggest that the population history of Saudi Arabia affects its pattern of genetic variation, with potential consequences for population health.
They further highlight the need to sequence diverse and unique populations to provide a foundation for interpreting medical and pharmacogenomic findings from these populations.
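The enrichment comparison above can be read as a ratio of variant counts within a frequency bin (Saudi versus gnomAD), with a bootstrap to judge whether the deleterious class shifts more than the benign class. The following is a minimal sketch under stated assumptions, not the authors' pipeline: each annotation class is represented as a hypothetical list of per-variant allele frequencies for the Saudi sample and for an already size-matched reference, variants are resampled with replacement, and the names `count_in_bin`, `percent_change` and `bootstrap_p` are illustrative.

```python
import random

# Frequency bin used in the abstract's example: 0.5% <= AF < 5%.
LO, HI = 0.005, 0.05


def count_in_bin(freqs, lo=LO, hi=HI):
    """Number of variants whose allele frequency lies in [lo, hi)."""
    return sum(1 for f in freqs if lo <= f < hi)


def percent_change(target_freqs, ref_freqs, lo=LO, hi=HI):
    """Percent change in the number of in-bin variants, target vs. reference.

    +13 corresponds to a target/reference count ratio of 1.13 and
    -7 to a ratio of 0.93. Assumes the reference bin is non-empty.
    """
    ratio = count_in_bin(target_freqs, lo, hi) / count_in_bin(ref_freqs, lo, hi)
    return 100.0 * (ratio - 1.0)


def bootstrap_p(del_target, del_ref, ben_target, ben_ref, n_boot=1000, seed=1):
    """One-sided bootstrap p-value for 'deleterious change > benign change'.

    Each of the four (hypothetical) variant lists is resampled with
    replacement; the p-value is the add-one smoothed fraction of replicates
    in which the deleterious change does not exceed the benign change.
    """
    rng = random.Random(seed)

    def resample(xs):
        return rng.choices(xs, k=len(xs))

    not_greater = 0
    for _ in range(n_boot):
        d = percent_change(resample(del_target), resample(del_ref))
        b = percent_change(resample(ben_target), resample(ben_ref))
        if d <= b:
            not_greater += 1
    return (not_greater + 1) / (n_boot + 1)


# Toy data (not from the study): AF 0.01 falls in the bin, AF 0.2 does not.
del_saudi = [0.01] * 113 + [0.2] * 87
del_afr = [0.01] * 100 + [0.2] * 100
ben_saudi = [0.01] * 93 + [0.2] * 107
ben_afr = [0.01] * 100 + [0.2] * 100

print(round(percent_change(del_saudi, del_afr), 1))  # 13.0
print(round(percent_change(ben_saudi, ben_afr), 1))  # -7.0
print(bootstrap_p(del_saudi, del_afr, ben_saudi, ben_afr))
```

With only 200 toy variants per list the bootstrap p-value stays well above 0.001; the point of the sketch is the shape of the computation, not the study's result.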


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1. The genetic structure of Saudi Arabians and its relation to global populations.
(A) Two-dimensional UMAP of Saudi Arabians based on the top 10 principal components. Each individual is colored by affiliated tribal region (see (D)). WGS samples lacked self-reported or harmonized tribal affiliation and are shown in their own color. (B) PCA of Saudi Arabian clusters and HGDP populations, with Saudi Arabians shown as a single group. The inset shows clusters 1–10 colored according to the most prevalent tribal region in each cluster (see Table S1). Because clusters 11–12 have no single dominant tribal region, they were each assigned a distinct color. (C) Admixture analysis of Saudi Arabian clusters and HGDP populations for K = 4 (top) and K = 9 (bottom). ME – Middle Eastern, AFR – African, EA – East Asian, CSA – Central & South Asian, EUR – European, OC – Oceania, AMR – American. The names of Saudi clusters and HGDP populations are shown on the bottom x-axis; due to limited space, some labels for smaller HGDP populations are omitted. Grouped regional labels are shown on the top x-axis of each plot. Admixture results for the Saudi clusters alone are shown in Figure S3B. (D) Regional map of Saudi Arabia, with colors matching the regional labels in (A) and (B).
Figure 2. Ancestry compositions in Saudi Arabians estimated with aDNA data as reference.
(A) Barplots for plausible qpAdm models (p-value ≥ 0.01 and admixture weights between 0 and 1), grouped by age bracket of the source populations (top and bottom; Methods). For Pre-Pottery Neolithic – Neolithic sources (top), three clusters (12, 3, and 11) were rejected under the Armenia_MasisBlur N + MAR_Taforalt_EpiP qpAdm model at the statistical cut-off. Well-fitting (nrmsd < 0.7 and Z > 2) estimates of admixture timing, in years, are displayed below the corresponding qpAdm barplot. (B) 'Basal Eurasian' ancestry estimated from f4-statistics of the form f4(Saudi cluster, Han.DG; Ust-Ishim, African aDNA group) with varying ancient African groups. Error bars show three standard errors for each f4-statistic. The order of Saudi clusters on the y-axis is the same in every plot (c12, c3, c11, c4, c8, c7, c10, c1, c5, c2, c6, and c9), following decreasing values of f4(Saudi cluster, Han.DG; Ust-Ishim, Ethiopia 4500BP). Significant (absolute Z-score > 3) negative f4-statistic values indicate that the Saudi cluster carries excess shared drift basal to the drift shared by Han.DG and Ust-Ishim, commonly interpreted as deriving from a population basal to the out-of-Africa (OOA) event (i.e., the Basal Eurasians).
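The f4-statistic in panel (B) can be illustrated with a minimal sketch: given allele frequencies of four groups A, B, C, D at each SNP, f4(A, B; C, D) is the average over SNPs of (pA - pB)(pC - pD), and the Z-score divides it by a block jackknife standard error. In the caption's notation A would be a Saudi cluster, B Han.DG, C Ust-Ishim and D an African aDNA group. The sketch assumes equal-sized blocks of consecutive SNPs and the unweighted delete-one-block jackknife; dedicated tools such as ADMIXTOOLS additionally correct for finite sample sizes and use a weighted jackknife over genomic blocks, so this only shows the idea. The frequencies below are random toy values.

```python
import math
import random


def f4_per_snp(fa, fb, fc, fd):
    """Per-SNP contributions (pA - pB) * (pC - pD) from allele-frequency lists."""
    return [(a - b) * (c - d) for a, b, c, d in zip(fa, fb, fc, fd)]


def f4_jackknife(fa, fb, fc, fd, n_blocks=20):
    """Return (f4 estimate, jackknife standard error, Z-score).

    SNPs are assumed to be in genomic order and are split into roughly
    equal blocks of consecutive SNPs (a simplification of genomic blocks).
    """
    vals = f4_per_snp(fa, fb, fc, fd)
    n = len(vals)
    total = sum(vals)
    estimate = total / n
    size = math.ceil(n / n_blocks)
    blocks = [vals[i:i + size] for i in range(0, n, size)]
    # Delete-one-block estimates: the mean over all SNPs outside block j.
    loo = [(total - sum(b)) / (n - len(b)) for b in blocks]
    g = len(loo)
    mean_loo = sum(loo) / g
    se = math.sqrt((g - 1) / g * sum((x - mean_loo) ** 2 for x in loo))
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z


# Toy allele frequencies for four hypothetical groups (not real data).
rng = random.Random(0)
n_snps = 2000
fa = [rng.random() for _ in range(n_snps)]
fb = [rng.random() for _ in range(n_snps)]
fc = [rng.random() for _ in range(n_snps)]
fd = [rng.random() for _ in range(n_snps)]
est, se, z = f4_jackknife(fa, fb, fc, fd)
print(f"f4 = {est:.5f}, SE = {se:.5f}, Z = {z:.2f}")
```

Under the caption's criterion, a value would count as significant when abs(z) > 3; swapping A and B flips the sign of f4 but leaves the standard error unchanged.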
Figure 3. Runs of homozygosity in Saudi Arabians.
(A) Average total length and number of ROH per cluster. The number next to each symbol is the cluster's mean ME-like ancestry proportion. (B) Total length and number of ROH per individual across the Saudi Arabian cohort. In (A) and (B), symbols are colored by the geographical region associated with each cluster (Figure 1D). (C) Total length of ROH versus ancestry proportion per individual, stratified into three ROH length classes. ROH – runs of homozygosity, ME – Middle Eastern, EA – East Asian.
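The per-individual and per-cluster quantities in this figure (number of ROH, total ROH length, and total length per length class) can be summarized from a list of called ROH segments. The following is a minimal sketch, assuming segments are given as hypothetical `(individual, chrom, start_bp, end_bp)` tuples with half-open coordinates; the three class boundaries are placeholders, since the caption does not state the cut-offs used.

```python
from collections import defaultdict

# Hypothetical length-class boundaries in base pairs (half-open intervals);
# the cut-offs used in the figure are not stated in the caption.
ROH_CLASSES = [
    ("short", 0, 1_000_000),
    ("medium", 1_000_000, 5_000_000),
    ("long", 5_000_000, float("inf")),
]


def roh_class(length_bp):
    """Name of the length class containing length_bp."""
    for name, lo, hi in ROH_CLASSES:
        if lo <= length_bp < hi:
            return name
    raise ValueError(f"negative ROH length: {length_bp}")


def summarize_roh(segments):
    """Per-individual ROH count, total length, and total length per class.

    segments: iterable of (individual, chrom, start_bp, end_bp) with
    half-open coordinates, so length = end_bp - start_bp. Individuals
    without any ROH do not appear and would need zero-filled records.
    """
    summary = defaultdict(lambda: {
        "n_roh": 0,
        "total_bp": 0,
        "class_bp": {name: 0 for name, _, _ in ROH_CLASSES},
    })
    for ind, _chrom, start, end in segments:
        length = end - start
        rec = summary[ind]
        rec["n_roh"] += 1
        rec["total_bp"] += length
        rec["class_bp"][roh_class(length)] += length
    return dict(summary)


def cluster_means(summary, cluster_of):
    """Mean (number of ROH, total ROH length) per cluster."""
    groups = defaultdict(list)
    for ind, rec in summary.items():
        groups[cluster_of[ind]].append(rec)
    return {
        c: (sum(r["n_roh"] for r in recs) / len(recs),
            sum(r["total_bp"] for r in recs) / len(recs))
        for c, recs in groups.items()
    }


# Toy segments for two hypothetical individuals in one cluster.
segments = [
    ("s1", "1", 0, 500_000),
    ("s1", "2", 1_000_000, 3_000_000),
    ("s2", "1", 0, 6_000_000),
]
summary = summarize_roh(segments)
print(summary["s1"])
print(cluster_means(summary, {"s1": "c1", "s2": "c1"}))  # {'c1': (1.5, 4250000.0)}
```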
Figure 4. Population size trajectories between the Saudi Arabian sub-clusters.
Effective population sizes were computed from genealogical trees using RELATE (see Methods). The number of samples per cluster used for the estimates can be found in Table S1.
Figure 5. Distribution of minor allele frequency across functional classes.
(A) Ratio of Saudi to gnomAD-AFR variants. gnomAD-AFR was downsampled to the Saudi sample size (n = 302). (B) Ratio of Saudi cluster groupA to cluster groupB variants. Cluster groupB was downsampled to the groupA sample size (n = 124). Variant functional consequences were annotated with VEP (loss-of-function, missense, or synonymous variants), AlphaMissense (likely pathogenic, likely benign, and ambiguous), and GPN. AC and AF refer to allele count and allele frequency, respectively; AFg5 refers to allele frequency greater than 5%. Top_1p refers to variants in the top 1% of GPN scores (more deleterious) and Bottom_1p to variants in the bottom 1% of GPN scores (more neutral). AFR denotes the gnomAD-AFR sample; LOF refers to loss of function. ** and * denote frequency bins in which the difference between the most deleterious (red) and most neutral (green) classes is significant by bootstrapping at p < 0.01 and p < 0.05, respectively.
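The caption does not say how gnomAD-AFR was downsampled to n = 302. One common way to compare allele-count bins across samples of different size is hypergeometric projection: a variant with allele count AC among AN alleles is expected to show k copies in a random subsample of m alleles with hypergeometric probability, and summing these probabilities gives the expected number of variants per bin. The sketch below assumes this approach, diploid samples (m = 2 × 302 alleles), complete genotype calls, and hypothetical `(ac, an)` input pairs; it is an illustration, not the authors' procedure.

```python
from math import comb


def hypergeom_pmf(k, an, ac, m):
    """P(k alt copies among m alleles drawn without replacement from an alleles, ac of them alt)."""
    return comb(ac, k) * comb(an - ac, m - k) / comb(an, m)


def expected_bin_counts(variants, m, bins):
    """Expected number of variants per allele-count bin after downsampling.

    variants: list of (ac, an) pairs in the larger reference sample.
    m: number of alleles in the downsampled sample (2 * n for n diploid individuals).
    bins: list of half-open allele-count ranges (lo, hi), e.g. (1, 2) for singletons.
    Variants drawn with k = 0 copies are unobserved in the subsample and skipped.
    """
    counts = [0.0] * len(bins)
    for ac, an in variants:
        if m > an:
            raise ValueError("site has fewer alleles than the target sample size")
        for k in range(1, min(ac, m) + 1):
            p = hypergeom_pmf(k, an, ac, m)
            for i, (lo, hi) in enumerate(bins):
                if lo <= k < hi:
                    counts[i] += p
    return counts


m = 2 * 302  # alleles in 302 diploid individuals, assuming complete calls
ref_variants = [(1, 1000), (12, 1000), (300, 1000)]  # toy (AC, AN) pairs
bins = [(1, 2), (2, 3), (3, m + 1)]  # hypothetical bins: AC = 1, AC = 2, AC >= 3
print(expected_bin_counts(ref_variants, m, bins))
```

Under this assumption, the per-bin ratio in panel (A) would be the observed Saudi count in each bin divided by the expected reference count for the same bin.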


