Nature. 2025 Mar;639(8054):404-410.
doi: 10.1038/s41586-024-08516-4. Epub 2025 Feb 12.

Genetic architecture in Greenland is shaped by demography, structure and selection

Frederik Filip Stæger et al. Nature. 2025 Mar.

Abstract

Greenlandic Inuit and other indigenous populations are underrepresented in genetic research [1,2], leading to inequity in healthcare opportunities. To address this, we performed analyses of sequenced or imputed genomes of 5,996 Greenlanders with extensive phenotypes. We quantified their historical population bottleneck and how it has shaped their genetic architecture to have fewer, but more common, variable sites. Consequently, we find twice as many high-impact genome-wide associations to metabolic traits in Greenland compared with Europe. We infer that the high-impact variants arose after the population split from Native Americans and thus are Arctic-specific, and show that some of them are common due to not only genetic drift but also selection. We also find that European-derived polygenic scores for metabolic traits are only half as accurate in Greenlanders as in Europeans, and that adding Arctic-specific variants improves the overall accuracy to the same level as in Europeans. Similarly, lack of representation in public genetic databases makes genetic clinical screening harder in Greenlandic Inuit, but inclusion of Greenlandic data remedies this by reducing the number of non-causal candidate variants by sixfold. Finally, we identify pronounced genetic fine structure that explains differences in prevalence of monogenic diseases in Greenland and, together with recent changes in mobility, leads to a predicted future reduction in risk for certain recessive diseases. These results illustrate how including data from Greenlanders can greatly reduce inequity in genomic-based healthcare.

Conflict of interest statement

Competing interests: M.E.J., I.M. and T.H. hold shares in Novo Nordisk. N.G., K.H. and M.S.R. are now employed at Novo Nordisk. The remaining authors declare no competing interests.

Figures

Fig. 1. Genetic architecture comparisons.
a, Sample locations in Greenland, inferred admixture proportions and historic effective population size estimated from an ARG of the Greenlandic (masked, n = 150) Inuit using Relate. b, Number of variants (both SNPs and insertions/deletions) as a function of allele frequency (AF); for example, at AF = 10%, we show the number of variants with AF ≥ 10%. The variant counts are grouped by whether the variant is new, found only in dbSNP, in gnomAD with AF < 0.1% or in gnomAD (v.3.0) with AF ≥ 0.1%. Note the logarithmic axis. c, Number of SNPs not found in any African, European or Asian populations in gnomAD for both 448 1KG people from the Americas and 448 Greenlanders. Note the different y axis scale on the two barplots. d, Number of segregating SNPs as a function of how many genomes from a given population were sequenced. e, Proportion of polymorphic SNPs as a function of minimum MAF; for example, at MAF = 5%, we show the proportion of polymorphic SNPs with MAF ≥ 5%. f, Average per-participant cumulative sum of derived alleles, including fixed derived alleles. All lines are extended slightly beyond 100% DAF to show that they all end up at a similar level. g, Proportion (95% CI) of constrained genes where the most common predicted deleterious SNP contributes less than 50% of the gene burden (gene burden informative, Greenland (unadmixed) = 1.4% (1.2–1.7%); British (+CEU) = 11.7% (11.1–12.4%) and Han Chinese = 12.7% (12.0–13.3%)) and where the gene burden is dominated by a single common variant (one variant dominates, Greenland (unadmixed) = 20.0% (19.2–20.8%), British (+CEU) = 13.8% (13.1–14.5%) and Han Chinese = 12.5% (11.9–13.2%)) as illustrated by schematics above the corresponding barplots.
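The thresholded curves in panels b and e can be read as "share of variants at or above a minimum frequency". The following is a minimal sketch of that reading, not the authors' pipeline; the function name `proportion_at_or_above` and the toy MAF values are hypothetical and chosen only for illustration.

```python
def proportion_at_or_above(mafs, thresholds):
    """For each threshold t, return the share of polymorphic SNPs with MAF >= t.

    Assumption: a SNP counts as polymorphic when its MAF is greater than 0.
    """
    polymorphic = [m for m in mafs if m > 0]
    if not polymorphic:
        raise ValueError("no polymorphic SNPs in input")
    return {
        t: sum(m >= t for m in polymorphic) / len(polymorphic)
        for t in thresholds
    }


# Hypothetical toy data: six SNPs, one of them monomorphic (MAF = 0).
toy_mafs = [0.0, 0.01, 0.03, 0.05, 0.20, 0.40]
curve = proportion_at_or_above(toy_mafs, [0.01, 0.05, 0.10])
print(curve)
```

A population with fewer but more common variable sites would show a curve that stays higher as the threshold increases.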
Fig. 2. Consequences of genetic architecture for disease mapping.
a, Mean number of non-causal pLoF variants (± s.e.m.) remaining after removing variants present at MAF > 0.1% in any population in the reference panel (gnomAD v.3.0.0). b, Same as a, but using different MAF thresholds in the reference panel. c, Mean number of tag-SNPs (R² > 0.8) as a function of distance from focal SNP. d, Number of imputed variants with an INFO score greater than 0.8 and MAF above threshold given on the x axis. Imputation was performed with either the merged reference panel of Greenlandic WGS plus 1KG (n = 448 + 3,202) or only the 1KG reference panel (n = 3,202). e,f, Comparison of largest GWAS in Greenland and Europe on 13 metabolic traits. e, Genome-wide associations explaining more than 1% variance in the largest GWAS across 13 metabolic traits in both Europe and Greenland (95% CI). Gene names are given below bars, with phenotype associated with the variant listed below gene name. For diabetes, we used liability-scale variance explained. Asterisk, the causal gene in this region is uncertain. Chol, total cholesterol; Gluc2h, glucose (2 h); GlucR, glucose (random); HbA1c, haemoglobin A1c; HDL, high-density lipoprotein; Trig, triglycerides. f, Mean incremental R² (± s.e.m.) of European-derived PGS predicting the corresponding 13 metabolic traits normalized to UK Biobank for all Greenland participants, only unadmixed Greenlandic participants or Danish participants. The two bars to the right are the mean incremental R² after adding the Arctic-specific variants. g, Variance explained for lead SNP in genome-wide significant associations on 175 plasma proteins (Olink) in Greenland and UK Biobank ordered by variance explained and grouped by the model yielding the lowest P value; n = 3,707. The inset shows a zoom-in of the first 20 GWAS hits.
Fig. 3. Genetic fine structure.
a, Unsupervised genetic clustering grouped by region. Mean estimated ancestry (K = 8) proportions for all samples in each region are shown in barplots. Map of Greenland with regions coloured (exaggerated inland for visibility) and sample locations indicated with black dots. b, People with birth and sample location in the same region (n = 1,921) visualized on the first two principal components coloured either by birth town region or inferred ancestry proportions. The enlarged areas highlight the ancestry assignment for people between clusters. c, Number of parent–offspring and full sibling relations inferred from genetic data per 100,000 possible relationships between each pair of regions. Grey lines represent sibling relationships, with line width indicating the inferred number of sibling relationships per 100,000 possible relationship pairs. Coloured lines represent parent–offspring relationships, where the colour indicates from which region the parent was sampled and the line width indicates the inferred number of parent–offspring relationships per 100,000 possible relationship pairs. d, Regional differences in AFs for five highly penetrant Arctic-specific recessive variants (coefficient of variation (AF²), SI = 74%, ADCY3 = 251%, PCCB = 164%, TBC1D4 = 42% and ATP8B1 = 149%). e, Expected frequency of homozygous participants for each variant with and without the current fine structure. f, Estimated variant age along with 95% CI of the eight Arctic-specific variants and the variant in FADS2. Coloured percentages are Indigenous allele frequencies of the different regions. Vertical dashed lines are split times between the populations. Map shows a schematic illustration of the migration routes for the ancestral population that gave rise to the Greenlandic Inuit. P values for directional selection are unadjusted and bold P values indicate significance after FDR(BH), see also Supplementary Table 9.
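One way to see why fine structure raises the expected number of homozygotes in panel e is the classical Wahlund argument. The sketch below assumes Hardy–Weinberg proportions within each region and random mating across regions in the unstructured case; this is an illustrative assumption, not necessarily the calculation used in the paper. The function name, regional frequencies and region sizes are hypothetical.

```python
def expected_homozygotes(region_freqs, region_sizes):
    """Return (with_structure, without_structure) expected homozygote frequencies.

    Assumptions (illustrative only):
    - with structure: Hardy-Weinberg within each region, so each region
      contributes p_i ** 2, weighted by its share of the population;
    - without structure: one randomly mating population with the
      size-weighted mean allele frequency.
    """
    total = sum(region_sizes)
    weights = [size / total for size in region_sizes]
    with_structure = sum(w * p * p for w, p in zip(weights, region_freqs))
    mean_freq = sum(w * p for w, p in zip(weights, region_freqs))
    without_structure = mean_freq ** 2
    return with_structure, without_structure


# Hypothetical allele frequencies in three equally sized regions.
toy_freqs = [0.02, 0.10, 0.30]
toy_sizes = [1000, 1000, 1000]
with_structure, without_structure = expected_homozygotes(toy_freqs, toy_sizes)
print(with_structure, without_structure)
```

Under these assumptions the difference between the two values equals the weighted variance of the regional allele frequencies, so variants with large regional differences in AF gain the most homozygotes from structure.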
Extended Data Fig. 1. Additional SNPs per sequenced individual.
Number of new SNPs added per additionally sequenced individual for Greenlanders compared to Nigerian, British(+CEU) and Han Chinese samples from 1KG.
Extended Data Fig. 2. Allele frequency distributions.
Proportion of polymorphic SNPs for all 1KG populations with at least 85 individuals, projected to 85 individuals. The proportion of SNPs is shown as a function of the allele frequency; that is, at MAF = 5%, the proportion of polymorphic SNPs with MAF ≥ 5% is shown. a, All 1KG populations. b–f, Populations in the African, South Asian, East Asian, European or Admixed American 1KG superpopulation, respectively.
Extended Data Fig. 3. Allele frequency distributions in predicted functional groups.
a–e, Proportion of polymorphic SNPs grouped by predicted functional category for the Nigerian, Han Chinese, British(+CEU), Greenlandic and masked Greenlandic samples, respectively. pLoF (HC) denotes predicted loss-of-function SNPs rated high confidence by LOFTEE.
Extended Data Fig. 4. Predicted Gene burden distribution in constrained genes.
The gene burden frequency, Burden_freq, is the proportion of individuals carrying one or more predicted deleterious variants in a gene, and ‘common’ is the proportion of the gene burden that is attributed to the most common predicted deleterious variant in the gene. The predicted deleterious variants were here defined as missense or LoF variants with an allele frequency lower than 0.01% in African individuals (1KG populations: LWK, ESN, YRI, MSL and GWD). Bars show the proportion (95% CI) of constrained genes. a, The first two panels are the same as Fig. 1g, the third panel shows the proportion of constrained genes where the most common predicted deleterious variant contributes between 50% and 90% of the gene burden, and the fourth panel shows the proportion of genes with a gene burden frequency ≤ 1%. b, Similar to a, but changing the common-threshold to 80% instead of 90%.
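The two quantities defined in this caption can be illustrated with a small sketch. It assumes that the share attributed to the most common variant is its carrier count divided by the number of carriers of any predicted deleterious variant in the gene; the caption does not spell out this detail, so treat it as an assumption. The function name, variant labels and carrier sets are hypothetical.

```python
def gene_burden(carriers_by_variant, n_individuals):
    """Return (burden_freq, common) for one gene.

    burden_freq: proportion of individuals carrying one or more predicted
        deleterious variants in the gene.
    common: share of those carriers who carry the most common variant
        (an assumed reading of the caption's definition).
    """
    all_carriers = set().union(*carriers_by_variant.values())
    burden_freq = len(all_carriers) / n_individuals
    if not all_carriers:
        return 0.0, 0.0
    top_count = max(len(carriers) for carriers in carriers_by_variant.values())
    return burden_freq, top_count / len(all_carriers)


# Hypothetical gene with two variants in a sample of 20 individuals.
toy_carriers = {"variant_1": {1, 2, 3, 4}, "variant_2": {4, 5}}
burden_freq, common = gene_burden(toy_carriers, n_individuals=20)
print(burden_freq, common)
```

In this toy gene a quarter of individuals carry a predicted deleterious variant and one variant accounts for 80% of that burden, which would fall in the 50% to 90% category of panel a.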
Extended Data Fig. 5. Number of non-causal variants in clinical screening setting.
a, Mean number of non-causal pLoF variants remaining after removing variants above the given frequency in gnomAD. The dashed blue line shows the Greenlandic population after removing variants above the given frequency in either gnomAD or the Greenlandic reference. b, Same as a, but only for the Greenlandic population, at two fixed gnomAD MAF thresholds and a varying MAF threshold (x axis) in the Greenlandic reference.
Extended Data Fig. 6. Polygenic score predictive performance.
Incremental R² of polygenic score (PGS) prediction of phenotypes for UK Biobank (non-British Europeans), Denmark (Inter99), Greenland and Greenland unadmixed. For most phenotypes, one or more Arctic-specific variants could be added to the PGS and improved the prediction. The Arctic-specific variants and their effects are listed in Supplementary Table 4. To test whether the incremental R² is improved with the added Arctic-specific variants, we performed a two-sided paired t-test on the incremental R² value in Greenland with and without the Arctic-specific variants in the PGS. The improvement was significant both when the test was done on all traits (p-value = 0.04067) and when it was done only on the traits with Arctic-specific variants (p-value = 0.03647). *Traits were rank-based inverse normal transformed separately for each sex and sex was also included as a covariate in the model.
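The paired t-test described here compares, trait by trait, the incremental R² with and without the Arctic-specific variants. Below is a minimal standard-library sketch of the test statistic; the R² values are hypothetical placeholders, not results from the paper, and converting the statistic to a two-sided p-value would additionally require the t distribution (for example from a statistics package).

```python
import math
import statistics


def paired_t_statistic(with_variants, without_variants):
    """Return (t, degrees_of_freedom) for a paired t-test on matched values."""
    if len(with_variants) != len(without_variants):
        raise ValueError("inputs must be paired, one value per trait")
    diffs = [a - b for a, b in zip(with_variants, without_variants)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical incremental R^2 values for three traits.
r2_without = [0.10, 0.20, 0.30]
r2_with = [0.11, 0.22, 0.33]
t_stat, dof = paired_t_statistic(r2_with, r2_without)
print(t_stat, dof)
```

Pairing by trait removes between-trait differences in baseline R², so the test addresses only whether adding the variants shifts R² within each trait.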
Extended Data Fig. 7. Inferred admixture proportion of non-European ancestry.
a, Estimated individual admixture proportions from HaploNet Admix on masked haplotypes excluding any European ancestry. b, Mean admixture proportions of all samples in sample location.
Extended Data Fig. 8. HaploNet principal components of non-European ancestry.
Principal component analysis from HaploNet PCA. a, All samples visualised with individual ancestry proportions as pie charts. b, Samples with <50% European ancestry and matching birth town and sample location, coloured by birth town region. c, Same samples as in a, but only for individuals with birth town information and where sample location and birth town were in the same region.
Extended Data Fig. 9. TBC1D4 haplotypes in Greenland and 1KG-JPT.
Haplotype plot of the genomic region around the variant in TBC1D4 for Greenlandic heterozygous carriers (n = 116) and the Japanese individual (JPT) from 1KG carrying the variant. For the Greenlandic individuals, their haplotypes without the TBC1D4 variant are shown on top and their haplotypes with the variant are shown on the bottom. Grey means reference allele, blue means alternative allele, and orange/red highlights the TBC1D4 variant. Since the variant is tri-allelic, the Japanese individual did not have phasing information for that variant. Note that the Japanese individual matches the haplotypes without the TBC1D4 variant and not the haplotype with the variant, which suggests that it is a recurrent mutation.
Extended Data Fig. 10. Clues-inferred allele frequency trajectories.
Posterior mean of allele frequency trajectories inferred by Clues on the five variants tested (in red) and 20 random DAF-matched variants (in grey).

References

    1. Sirugo, G., Williams, S. M. & Tishkoff, S. A. The missing diversity in human genetic studies. Cell 177, 26–31 (2019).
    2. Fatumo, S. et al. A roadmap to increase diversity in genomic studies. Nat. Med. 28, 243–250 (2022).
    3. Hindorff, L. A. et al. Prioritizing diversity in human genomics research. Nat. Rev. Genet. 19, 175–185 (2017).
    4. Manrai, A. K. et al. Genetic misdiagnoses and the potential for health disparities. N. Engl. J. Med. 375, 655–665 (2016).
    5. Patel, A. P. et al. A multi-ancestry polygenic risk score improves risk prediction for coronary artery disease. Nat. Med. 29, 1793–1803 (2023).