Nat Genet. 2016 Sep;48(9):1071-6.
doi: 10.1038/ng.3592. Epub 2016 Jul 18.

Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery

Eric M Scott et al. Nat Genet. 2016 Sep.

Abstract

The Greater Middle East (GME) has been a central hub of human migration and population admixture. The tradition of consanguinity, variably practiced in the Persian Gulf region, North Africa, and Central Asia, has resulted in an elevated burden of recessive disease. Here we generated a whole-exome GME variome from 1,111 unrelated subjects. We detected substantial diversity and admixture in continental and subregional populations, corresponding to several ancient founder populations with little evidence of bottlenecks. Measured consanguinity rates were an order of magnitude above those in other sampled populations, and the GME population exhibited an increased burden of runs of homozygosity (ROHs) but showed no evidence for reduced burden of deleterious variation due to classically theorized 'genetic purging'. Applying this database to unsolved recessive conditions in the GME population reduced the number of potential disease-causing variants by four- to sevenfold. These results show variegated genetic architecture in GME populations and support future human genetic discoveries in Mendelian and population genetics.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Greater Middle East Variome as a hub of human genetics
a. Map of GME sub-regions. Lines define borders for admixture analysis from East Asia, Europe, Sub-Saharan Africa and the novel GME contribution (NWA: Northwest Africa, NEA: Northeast Africa, TP: Turkish Peninsula, SD: Syrian Desert, AP: Arabian Peninsula, PP: Persia and Pakistan). Pie charts: admixture proportions of 1000 Genomes Project (1000G) continental populations according to K=6 clusters.
b. Global ancestry proportions (K=6) for 1000G control populations with three distinct sources of contribution. 1000G population contributions: Africa (red), Europe (green) and East Asia (blue). GME populations from west to east: NWA (purple), AP (orange), and PP (yellow) derived from the GME.
c. TreeMix phylogeny of GME along with 1000G controls, representing population divergence patterns. Branch length is proportional to population drift. GME populations grouped around the African branch, but showed a substantial divergence. YRI: Yoruba in Ibadan, LWK: Luhya in Webuye Kenya, FIN: Finnish, GBR: Great Britain, TSI: Toscani, CHS: Southern Han Chinese, CHB: Han Chinese in Beijing, JPT: Japanese in Tokyo.
d. Wright's Fixation Index (Fst) values for all pairs of GME and 1000G European populations, showing a smaller distance between GME and European populations compared with Sub-Saharan African populations. The greatest Fst value between any two GME populations was 0.026 (i.e., a quarter of the distance between FIN and JPT).
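To make the Fst comparison in panel d concrete, the following is a minimal sketch of the textbook two-population definition, Fst = (Ht - Hs) / Ht, for a single biallelic site. It is not the estimator used in the paper, which this page does not specify; the function name `fst_two_populations` and the example allele frequencies are hypothetical.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Textbook single-site Fst for two equally weighted populations.

    p1, p2: frequency of the same allele in each population (assumed inputs).
    Ht is the expected heterozygosity of the pooled population,
    Hs is the mean expected heterozygosity within populations.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        # The site is monomorphic in both populations: no differentiation.
        return 0.0
    return (h_total - h_within) / h_total


# Illustrative frequencies only; these are not values from the study.
print(fst_two_populations(0.4, 0.6))
```

Identical frequencies give 0 and fixed differences give 1, so small values such as the 0.026 maximum reported among GME populations indicate little differentiation.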
Figure 2. Wide diversity and high inbreeding coefficients in GME substructure
a. Principal component analysis (PCA) for individuals from GME and 1000G populations. Individuals projected along PC3 and PC4 axes. Persia and Pakistan (PP), Northwest Africa (NWA) and Europe defined the limits from right, left, and top, coinciding with geography. Arabian Peninsula (AP) defined the bottom limit, and was closest to Northeast Africa (NEA) and Syrian Desert (SD).
b. GME populations had increased rates of linkage disequilibrium decay compared to 1000G European and East Asian populations. Mean variant correlations (r²) shown for each 1,000 base pair (bp) bin from 1,000–70,000 bp.
c. Inbreeding coefficient (F) distributions for GME and 1000G populations. GME populations (purple) showed elevated F values, consistent with increased rates of consanguineous marriages. Box plots show median (horizontal line), 25th percentile (45° angle), 75th percentile (90° angle), minimum and maximum observations (whiskers).
d. F distributions for family structures for GME and European American (EA) trios. Mean F values correlated with the values expected for consanguineous offspring. Unk = unknown.
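The "expected" F values that panel d refers to follow from pedigree structure. As a minimal sketch, Wright's path formula gives F as the sum over common ancestors of (1/2)^(n1 + n2 + 1), where n1 and n2 are the numbers of generations from each parent back to that ancestor. The sketch assumes the common ancestors are not themselves inbred; the function name `expected_inbreeding` and the pedigrees shown are illustrative, not taken from the study.

```python
from fractions import Fraction


def expected_inbreeding(paths):
    """Expected inbreeding coefficient F of a child from Wright's path formula.

    paths: iterable of (n1, n2) pairs, one per common ancestor of the two
    parents, where n1 and n2 count generations from each parent to that
    ancestor. Assumes the common ancestors are not inbred themselves.
    """
    return sum(
        (Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths),
        Fraction(0),
    )


# First cousins share two grandparents, each two generations above a parent.
first_cousins = expected_inbreeding([(2, 2), (2, 2)])
# Second cousins share two great-grandparents, three generations up.
second_cousins = expected_inbreeding([(3, 3), (3, 3)])

print(first_cousins, second_cousins)  # 1/16 1/64
```

Exact fractions are used so the familiar values (1/16 for offspring of first cousins, 1/64 for second cousins) come out without rounding.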
Figure 3. Distributions of short and long Runs of Homozygosity (ROH) correlate with patterns of bottlenecks and recent consanguinity
a. Sample burdens of ROH grouped by length (Short: <0.155 Mb, Medium: 0.156–1.606 Mb, Long: >1.607 Mb). GME samples (purple) showed a unique contribution of long ROH compared with other populations (*), with less in short and medium bins compared to Europe and East Asia. Total ROH in GME sub-regions overlapped with that of European and East Asian populations, likely due to greater bottlenecks in these populations.
b. Histograms of long ROH for GME, Africa, Europe, and East Asia. GME samples more frequently harbored runs >4 Mb compared to other populations. ROH >15 Mb are binned together (* peak unique to Middle East).
c. Longer GME ROH spans were enriched for rare variation, while shorter runs were enriched for more common variation. Proportion of variants binned by allele frequency (AF) for different sized ROH, binned by 0.5 Mb intervals. Probability density function calculated for each allele frequency class. Note that AFs for common alleles declined whereas AFs for rare and very rare alleles rose as ROH increased in size (Common: AF > 0.05, Rare: AF 0.01–0.05, Very Rare: AF < 0.01).
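The length classes in panel a can be applied to a list of ROH lengths as in the sketch below. The thresholds are the ones stated in the caption; how lengths falling exactly between the stated bounds are assigned, the function names, and the sample lengths are all assumptions made for illustration.

```python
# Class boundaries in megabases, taken from the caption of panel a.
SHORT_MAX_MB = 0.155
LONG_MIN_MB = 1.607


def classify_roh(length_mb: float) -> str:
    """Assign one ROH to a length class.

    Assumption: anything up to SHORT_MAX_MB is short, anything from
    LONG_MIN_MB upward is long, and everything in between is medium.
    """
    if length_mb <= SHORT_MAX_MB:
        return "short"
    if length_mb >= LONG_MIN_MB:
        return "long"
    return "medium"


def roh_burden(lengths_mb):
    """Total ROH length (Mb) per class for one sample."""
    burden = {"short": 0.0, "medium": 0.0, "long": 0.0}
    for length in lengths_mb:
        burden[classify_roh(length)] += length
    return burden


# Hypothetical ROH lengths for a single sample, not study data.
print(roh_burden([0.1, 1.0, 4.0, 5.0]))
```

Summing lengths per class, rather than counting runs, matches the notion of a per-sample burden, though the paper's exact definition is not given on this page.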
Figure 4. GME Variome facilitates the discovery of Mendelian disease genes
a–b. Comparison of rare derived allele frequencies (DAF) between GME and Exome Sequencing Project (ESP). AA: African American, EA: European-American. Hexagonal bins shaded by log number of variants within each bin. Pearson's r suggests GME DAFs were not accurately estimated by AA or EA populations.
b. The majority of variants in the rarest DAF bins were unique to the GME. AA: found only in GME and AA. EA: found only in GME and EA. All: found in GME, EA and AA. GME Unique: found only in GME.
c. Change in per-individual burden of eight variant classes as a function of increasing the number of individuals incorporated into the GME Variome cohort. As sample size increased there was a drop in the number of unique variants, along with more accurate estimation of DAFs for rare variants. Bootstraps were sampled with replacement for 100 iterations to calculate standard errors. "High impact": variants meeting predicted deleteriousness thresholds (see Methods).
d. Number of candidate variants for 20 families, meeting segregation and deleteriousness filtering criteria, using DAFs derived from Hereditary Spastic Paraplegia (HSP)-only families (top) or also incorporating the GME Variome (bottom). Single, Duo, Trio: families with one, two or three affected members. Colors: number of individuals sharing the variant. "0": no other individuals carried the allele, etc. Analysis was performed using this threshold for the number of individuals sharing alleles (0, 1, 2, 3). Note drop in number of segregating variants for any given family after the GME Variome was applied.
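Panel c mentions bootstrap standard errors from 100 resamples drawn with replacement. A minimal sketch of that general technique, applied to the mean of a per-individual count, is shown below. The statistic, the function name `bootstrap_se`, the fixed seed, and the example counts are assumptions; the paper's actual procedure is described in its Methods, not on this page.

```python
import random
import statistics


def bootstrap_se(values, n_iter=100, seed=0):
    """Bootstrap standard error of the mean.

    Draws n_iter resamples of len(values) items with replacement and
    returns the standard deviation of the resampled means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = []
    for _ in range(n_iter):
        resample = rng.choices(values, k=n)  # sampling with replacement
        means.append(statistics.fmean(resample))
    return statistics.stdev(means)


# Hypothetical per-individual variant counts, not study data.
counts = [12, 15, 9, 20, 14, 11, 17, 13]
print(bootstrap_se(counts))
```

With more individuals in the cohort, the resampled means vary less, which is one way to see why estimates stabilize as sample size grows.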
