PLoS Genet. 2019 Nov 25;15(11):e1008480.
doi: 10.1371/journal.pgen.1008480. eCollection 2019 Nov.

Increased ultra-rare variant load in an isolated Scottish population impacts exonic and regulatory regions

Mihail Halachev et al. PLoS Genet. 2019.

Abstract

Human population isolates provide a snapshot of the impact of historical demographic processes on population genetics. Such data facilitate studies of the functional impact of rare sequence variants on biomedical phenotypes, as strong genetic drift can result in higher frequencies of variants that are otherwise rare. We present the first whole genome sequencing (WGS) study of the VIKING cohort, a representative collection of samples from the isolated Shetland population in northern Scotland, and explore how its genetic characteristics compare to a mainland Scottish population. Our analyses reveal the strong contributions made by the founder effect and genetic drift in shaping genomic variation in the VIKING cohort. About one tenth of all high-quality variants discovered are unique to the VIKING cohort or are seen at frequencies at least tenfold higher than in more cosmopolitan control populations. Multiple lines of evidence also suggest relaxation of purifying selection during the evolutionary history of the Shetland isolate. We demonstrate enrichment of ultra-rare VIKING variants in exonic regions and, for the first time, we also show that ultra-rare variants are enriched within regulatory regions, particularly promoters, suggesting that gene expression patterns may diverge relatively rapidly in human isolates.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Significant differences in variant load in coding and related regions for ultra-rare (upper panel) and very rare (lower panel) variants.
Circle dots represent the ratio of the median number of variants in a VIKING individual to the median number of variants in an LBC individual; whiskers are 95% CI based on 10,000 randomly selected LBC subsets (n = 269, with replacement). Significance: at least 95% of the 10,000 subsets have p-value ≤ 8 × 10⁻⁴ (Bonferroni corrected) and no overlap between the 95% CI for the LBC and the VIKING median values (for full results see S4 Fig). The higher variance in the 5’UTR and lower variance in ncRNA regions could be explained by their relatively small sizes (9.3 Mb and 7.3 Mb, respectively).
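The resampling described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `bootstrap_median_ratio`, the percentile method for the interval, the use of the full LBC cohort for the point estimate, and the synthetic per-individual counts are all hypothetical choices made for the example.

```python
import random
import statistics


def bootstrap_median_ratio(viking_counts, lbc_counts, subset_size=269,
                           n_subsets=10_000, seed=0):
    """Ratio of median per-individual variant counts (VIKING / LBC).

    Returns (point_ratio, lower, upper), where lower/upper are the 2.5th and
    97.5th percentiles of the ratio over LBC subsets drawn with replacement.
    The percentile interval is an assumption of this sketch.
    """
    rng = random.Random(seed)
    viking_median = statistics.median(viking_counts)
    ratios = []
    for _ in range(n_subsets):
        # Draw an LBC subset of the same size as VIKING, with replacement.
        subset = rng.choices(lbc_counts, k=subset_size)
        ratios.append(viking_median / statistics.median(subset))
    ratios.sort()
    lower = ratios[int(0.025 * (n_subsets - 1))]
    upper = ratios[int(0.975 * (n_subsets - 1))]
    point_ratio = viking_median / statistics.median(lbc_counts)
    return point_ratio, lower, upper


if __name__ == "__main__":
    # Synthetic counts for illustration only; not data from the study.
    gen = random.Random(1)
    viking = [gen.gauss(1200, 80) for _ in range(269)]
    lbc = [gen.gauss(1000, 80) for _ in range(1000)]
    print(bootstrap_median_ratio(viking, lbc, n_subsets=1000))
```

A ratio whose interval lies entirely above 1 would correspond to a higher variant load in VIKING for that region; the significance criterion in the caption additionally requires a Bonferroni-corrected p-value threshold that this sketch does not compute.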
Fig 2. Ultra-rare SNP variant loads in functionally annotated non-coding regions.
Circle dots represent the ratio of the median number of variants in a VIKING individual to the median number of variants in an LBC individual; whiskers are 95% CI based on 10,000 randomly selected LBC subsets (n = 269, with replacement). Significance: at least 95% of the 10,000 subsets have p ≤ 2 × 10⁻⁴ (Bonferroni corrected) and no overlap between the 95% CI for the LBC and the VIKING median values. The red vertical lines represent the median genome-wide load for ultra-rare SNPs and its 95% CI. The higher variance in the estimates for Insulator regions could be explained by their relatively small size (17.4 Mb). Gm12878: B-lymphoblastoid cells, H1hesc: embryonic stem cells, Hepg2: hepatocellular carcinoma cells, Hmec: mammary epithelial cells, Hsmm: skeletal muscle myoblasts, Huvec: umbilical vein endothelial cells, K562: erythrocytic leukemia cells, Nhek: normal epidermal keratinocytes, Nhlf: normal lung fibroblasts, union: an aggregated comparison between the two cohorts for this chromatin state, obtained by considering the union of the state’s regions annotated in any of the 9 cell types.
Fig 3. Distribution of ultra-rare SNPs in functional regions.
Fraction of ultra-rare variants (FUV) = number of ultra-rare variants / (number of ultra-rare variants + number of known variants). Values for regulatory regions are computed as the average over the 9 cell types. Non-coding = mappable genome − 5’UTR − exon − intron − 3’UTR − ncRNA. Coloured horizontal and vertical lines represent the genome-wide averages for the two cohorts. Dashed black lines represent the distribution shifts between LBC and VIKING for each of the considered genomic regions. A strictly vertical shift would indicate a proportional increase in the numbers of ultra-rare and known variants from LBC to VIKING, whereas a strictly horizontal shift (no change in the ultra-rare variant density between the two cohorts) would represent a decrease in the number of known variants in VIKING.
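The two definitions in this caption are simple arithmetic, and a small Python sketch may make them concrete. The function names and all numbers below are hypothetical and chosen only for illustration; they are not values from the study.

```python
def fraction_ultra_rare(n_ultra_rare, n_known):
    """FUV = ultra-rare / (ultra-rare + known), as defined in the caption."""
    total = n_ultra_rare + n_known
    if total == 0:
        raise ValueError("no variants in region")
    return n_ultra_rare / total


def mean_fuv_over_cell_types(counts_by_cell_type):
    """Average FUV over cell types, as the caption describes for regulatory regions.

    counts_by_cell_type maps a cell-type name to (n_ultra_rare, n_known).
    """
    values = [fraction_ultra_rare(u, k) for u, k in counts_by_cell_type.values()]
    return sum(values) / len(values)


def non_coding_size(mappable, utr5, exon, intron, utr3, ncrna):
    """Non-coding = mappable genome minus the five annotated categories."""
    return mappable - utr5 - exon - intron - utr3 - ncrna


if __name__ == "__main__":
    # Hypothetical counts for two cell types, for illustration only.
    promoters = {"Gm12878": (30, 970), "K562": (50, 950)}
    print(fraction_ultra_rare(10, 90))
    print(mean_fuv_over_cell_types(promoters))
```

Because FUV is a ratio, it can rise either through more ultra-rare variants or through fewer known ones, which is why the caption distinguishes vertical from horizontal shifts.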
Fig 4. Allelic shift bias (ASB) suggests loss of constraint at VIKING exonic and promoter regions.
MAF shifts for very rare SNPs (MAF_NFE ≤ 1%) between non-functional intergenic regions (NFIG), considered as baseline, and non-synonymous SNPs in exonic regions, SNPs with CADD score ≥ 10 in promoter regions and intronic SNPs, for each of the cohorts. These MAF differences are calculated using 1000 randomly selected LBC subsets of 269 individuals each (matching the VIKING size; with replacement) and considering only variants shared between the VIKING cohort and the currently evaluated LBC subset, for which we computed the cohorts’ mean MAF in exonic, promoter, intronic and non-functional intergenic regions (see S10 Fig). Black horizontal lines represent mean values. The differences in MAF shifts in the two cohorts are statistically significant for all three comparisons (p < 2.2 × 10⁻¹⁶, one-sided Wilcoxon rank sum test).
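As a rough illustration of the quantity plotted here, the sketch below computes one MAF shift for one cohort: the mean MAF of shared variants in a functional category minus the mean MAF of shared variants in the NFIG baseline. The sign convention (category minus baseline), the function name `maf_shift`, the data layout, and the toy values are assumptions of this example; the caption does not specify them, and the Wilcoxon test is not reproduced.

```python
from statistics import mean


def maf_shift(maf_by_variant, region_of, shared, region, baseline="NFIG"):
    """Mean MAF in `region` minus mean MAF in `baseline`, over shared variants.

    maf_by_variant: variant id -> minor allele frequency in one cohort
    region_of:      variant id -> region label (e.g. "exonic", "NFIG")
    shared:         ids of variants present in both cohorts
    """
    region_mafs = [maf_by_variant[v] for v in shared if region_of[v] == region]
    base_mafs = [maf_by_variant[v] for v in shared if region_of[v] == baseline]
    if not region_mafs or not base_mafs:
        raise ValueError("need at least one shared variant in each region")
    return mean(region_mafs) - mean(base_mafs)


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    maf = {"v1": 0.002, "v2": 0.004, "v3": 0.001, "v4": 0.003, "v5": 0.009}
    regions = {"v1": "exonic", "v2": "exonic", "v3": "NFIG",
               "v4": "NFIG", "v5": "exonic"}
    shared_ids = {"v1", "v2", "v3", "v4"}  # v5 is not shared, so it is ignored
    print(maf_shift(maf, regions, shared_ids, "exonic"))
```

In the procedure the caption describes, a value like this would be computed once per LBC subset for each cohort, giving two distributions of shifts per functional category that are then compared.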
