[Preprint]. 2025 May 16:2025.05.14.25327536.
doi: 10.1101/2025.05.14.25327536.

A multi-ancestry genetic reference for the Quebec population

Peyton McClelland et al. medRxiv.

Abstract

While international efforts have characterized genetic variation in millions of individuals, the interplay of environmental, social, cultural, and genetic factors is poorly understood for most populations worldwide. The province of Quebec in Canada has been the site of numerous genetic studies, often focusing on individual Mendelian diseases in founder sub-populations. Here, we profiled and analyzed genome-wide genotyped variation in 29,337 Quebec residents from CARTaGENE (CaG), a large population-based cohort with rich phenotype and environmental data. We also sequenced the whole genomes of 2,173 CaG participants, including 163 and 132 individuals with grandparents born in Haiti and Morocco, respectively. We used this genetic information to gain insight into Quebec's demography and to help interpret the potential significance of variants identified in clinically important genes. We built an imputation panel by phasing the CaG whole-genome sequence data and showed, using genome-wide association studies (GWAS), how it improves the discovery of phenotype-genotype associations in this population. We provide allele frequency information and GWAS results through dedicated and publicly available websites. The genetic data, paired with phenotypic and environmental information, are also available for research use upon scientific and ethical review.

Keywords: CARTaGENE; Quebec; SPG7; founder population; genotype imputation.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1. Genetic structure in the CaG population-based cohort of Quebec.
(A) Principal components 1 and 2, labelled and colored by HDBSCAN clusters, as described in Methods. Word clouds of the countries of birth are provided for each cluster. Global population structure within the cohort is shaped by a combination of domestic structure and recent migration. (B) UMAP of the top 10 principal components of CaG, labelled and colored by the same HDBSCAN clusters as in A. Clusters reflect recent migration history, admixture, and domestic genetic structure within Quebec. Some clusters, such as 10 and 12, are not evident in the 2D UMAP but form in the higher-dimensional UMAP used in HDBSCAN. Word clouds for other variables are available in Supplementary Fig. 1. (C-D) Estimation of admixture proportions among CaG participants whose four grandparents were born in Morocco (QMO, C) or Haiti (QHA, D). ADMIXTURE estimation was run independently for each group for varying numbers K of components (Methods). Reported Ks were chosen by cross-validation analysis (Supplementary Fig. 4–5 for all Ks). The plots show CaG individuals with four grandparents born in the same country, and three additional populations from the 1000 Genomes Project as reference in (D): YRI: Yoruba in Nigeria, CHB: Han Chinese in Beijing, China, ACB: African Caribbean in Barbados.
Figure 2. Analysis of rare pathogenic variants in SPG7.
Structure of the SPG7 gene, highlighting its three main domains and the location of nine variants found in the CaG cohort. For each variant, amino acid/nucleotide change, allele count (AC), frequency in Quebec residents of French-Canadian ancestry (QFC), and estimated mutation age (TMRCA) are displayed. The heatmaps show the carrier frequency of three SPG7 variants computed with ISGen in 24 historical regions of Quebec. The scale is given in number of carriers per 1000 individuals. Each variant has a different frequency distribution across Quebec. The highest-frequency regions for each variant are highlighted: c.988-1G>A (Saguenay and Charlevoix), p.Gly349Ser (Côte-de-Beaupré and Portneuf) and p.Asp765Asn (Bas-Saint-Laurent and Côte-du-Sud).
Figure 3. Genome-wide statistically significant loci identified using different genotype imputation approaches in individuals of genetic European ancestry.
A statistically significant independent locus was defined as the ±500-kb region around the variant with the lowest statistically significant P-value, referred to as the lead variant. Any overlapping loci for the same trait were merged into a single locus, keeping the one lead variant with the lowest P-value. Thus, by definition, each locus had only one lead variant. (A) The overlap between genome-wide statistically significant loci using three imputation approaches. Letters A, B, C, D, E, and F label the corresponding loci subsets. The number below the label shows the number of loci within each subset. The number in brackets corresponds to the number of lead variants not found in the alternative reference panel. (B) The X axis shows the median of paired differences between imputation qualities at lead variants in CaG and TOPMed imputation results. The Y axis shows the median alternate allele frequency (AF) of lead variants based on CaG imputation results. The horizontal and vertical error bars show 95% confidence intervals after 1,000,000 permutations stratified by AF. To reduce clutter, only statistically significant (significance threshold 0.025) two-tailed permutation P-values for the median of paired differences in imputation qualities are displayed below the subset labels. (C) Each panel shows a fold change in the proportion of lead variants inside different subsets of significant loci, from left to right: lead variants for which the AF in QFC individuals exceeds the AF in gnomAD NFE, lead variants for which the AF in QFC individuals is 2 times higher than in gnomAD NFE, and lead variants for which the AF in QFC individuals is 4 times higher than in gnomAD NFE. The P-values were computed as the proportion of 1,000,000 permuted samples that exceeded the observed fold change, with a statistical significance threshold of 0.05.
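The locus definition in the Figure 3 caption (a ±500-kb window around each lead variant, with overlapping windows for the same trait merged and the lowest P-value kept as the single lead) can be sketched as below. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `define_loci`, the `(chrom, pos, p)` input layout, the greedy selection of leads, and the genome-wide threshold of 5e-8 are all assumptions, since the caption does not specify them.

```python
from dataclasses import dataclass

WINDOW = 500_000  # half-width in base pairs: the ±500 kb from the caption
ALPHA = 5e-8      # ASSUMED genome-wide threshold; not stated in the caption


@dataclass
class Locus:
    chrom: str
    start: int
    end: int
    lead_pos: int
    lead_p: float


def define_loci(variants, alpha=ALPHA, window=WINDOW):
    """Group significant variants of ONE trait into loci.

    variants: iterable of (chrom, pos, p) tuples (hypothetical layout).
    """
    # Keep significant variants only, strongest association first.
    sig = sorted((v for v in variants if v[2] < alpha), key=lambda v: v[2])

    loci = []
    for chrom, pos, p in sig:
        # A variant already inside the window of a stronger lead is not a new lead.
        if any(l.chrom == chrom and l.start <= pos <= l.end for l in loci):
            continue
        loci.append(Locus(chrom, max(0, pos - window), pos + window, pos, p))

    # Merge overlapping windows, keeping the lead with the lowest P-value.
    loci.sort(key=lambda l: (l.chrom, l.start))
    merged = []
    for l in loci:
        last = merged[-1] if merged else None
        if last is not None and last.chrom == l.chrom and l.start <= last.end:
            last.end = max(last.end, l.end)
            if l.lead_p < last.lead_p:
                last.lead_pos, last.lead_p = l.lead_pos, l.lead_p
        else:
            merged.append(l)
    return merged


# Toy summary statistics for a single trait (made-up values for illustration).
toy = [
    ("1", 1_000_000, 1e-10),
    ("1", 1_200_000, 1e-9),
    ("1", 1_800_000, 1e-12),
    ("1", 5_000_000, 1e-8),
    ("2", 1_000_000, 1e-3),  # not significant, ignored
]
toy_loci = define_loci(toy)
```

In this toy input, the windows around positions 1,000,000 and 1,800,000 on chromosome 1 overlap, so they collapse into one locus whose lead is the variant with the lower P-value, while the variant at 5,000,000 forms its own locus. Each resulting locus therefore has exactly one lead variant, as the caption states.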

