Natl Sci Rev. 2024 Sep 14;11(10):nwae326.
doi: 10.1093/nsr/nwae326. eCollection 2024 Oct.

Expanding the Russian allele frequency reference via cross-laboratory data integration: insights from 7452 exome samples

Yury A Barbitoff et al. Natl Sci Rev. 2024.

Abstract

Population allele frequency is crucially important for accurate interpretation of known and novel variants in medical genetics. Recently, several large allele frequency databases, such as the Genome Aggregation Database (gnomAD), have been created to serve as a global reference for such studies. However, frequencies of many rare alleles vary dramatically between populations, and population-specific allele frequency is often more informative than the global one. Many countries and regions, including Russia, remain poorly studied from the genetic perspective. Here, we report the first successful attempt to integrate genetic information between major medical genetic laboratories in Russia. We construct RUSeq, an open, large-scale reference set of genetic variants, by analyzing 7452 exome samples collected in two major Russian cities, Moscow and St. Petersburg. An ∼10-fold increase in sample size compared to previous studies allowed us to characterize extensive genetic diversity within the admixed Russian population with contributions from several major ancestral groups. We highlight 51 known pathogenic variants that are overrepresented in Russia compared to other European countries. We also identify several dozen high-impact variants that are present in healthy donors despite being annotated as pathogenic in ClinVar and falling within genes associated with autosomal dominant disorders. The constructed database of genetic variant frequencies in Russia has been made available to the medical genetics community through a variant browser at http://ruseq.ru.
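
As an illustration of what "overrepresented" can mean in practice, the sketch below computes an allele frequency from allele counts and a one-sided Fisher's exact test for enrichment of a variant in one cohort relative to another. The abstract does not state which statistical test the authors used, so the choice of test, the function names, and the counts are all assumptions made for illustration only.

```python
from math import comb


def allele_frequency(allele_count, allele_number):
    """Fraction of sampled allele copies that carry the alternate allele."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


def enrichment_p_value(ac_a, an_a, ac_b, an_b):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Returns the probability of seeing at least ac_a alternate alleles in
    cohort A if the ac_a + ac_b alternate alleles were spread at random
    over the an_a + an_b sampled allele copies.
    """
    total = an_a + an_b
    total_alt = ac_a + ac_b
    upper = min(total_alt, an_a)
    tail = sum(
        comb(total_alt, k) * comb(total - total_alt, an_a - k)
        for k in range(ac_a, upper + 1)
    )
    return tail / comb(total, an_a)


# Hypothetical counts, not taken from the paper.
ac_local, an_local = 12, 2000      # alternate alleles / allele copies, cohort A
ac_ref, an_ref = 5, 10000          # alternate alleles / allele copies, cohort B

af_local = allele_frequency(ac_local, an_local)
af_ref = allele_frequency(ac_ref, an_ref)
p_value = enrichment_p_value(ac_local, an_local, ac_ref, an_ref)
print(f"AF A = {af_local:.4f}, AF B = {af_ref:.4f}, p = {p_value:.3g}")
```

A real analysis would also need to correct for multiple testing across variants and account for differences in sequencing coverage between cohorts, which this sketch ignores.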

Keywords: Russia; allele frequency; medical genetics; whole exome sequencing.

Figures

Figure 1.
Genetic diversity of individuals in the RUSeq data. Shown are the results of principal component analysis of the genotype data. Each point corresponds to an individual sample. On the left, separate subplots show the results for each participating lab. On the right, dots are colored according to the results of k-means clustering with different numbers of clusters (k). The middle plot shows the final clustering results used in subsequent analyses. REU, RSO, and RAS correspond to the three clusters (‘heel’, ‘ankle’, and ‘toes’).
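
The workflow described in this caption (principal component analysis of genotypes followed by k-means clustering of the samples) can be sketched as follows. This is a minimal NumPy illustration on simulated genotypes, not the authors' pipeline: the caption does not name the software used for this figure, and the simulated data, function names, and parameter values are assumptions.

```python
import numpy as np


def pca_scores(genotypes, n_components=2):
    """Project samples onto the top principal components.

    genotypes: (n_samples, n_variants) array of alternate-allele counts (0/1/2).
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # center each variant column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, then nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Simulated data (hypothetical): three groups with different allele frequencies.
rng = np.random.default_rng(42)
freqs = rng.uniform(0.05, 0.95, size=(3, 200))   # per-group allele frequencies
groups = np.repeat(np.arange(3), 30)             # 30 samples per group
genotypes = rng.binomial(2, freqs[groups])       # shape (90, 200)

scores = pca_scores(genotypes, n_components=2)
labels, centers = kmeans(scores, k=3)
print(scores.shape, np.bincount(labels, minlength=3))
```

Repeating the clustering for several values of k, as in the right-hand panels of the figure, amounts to calling `kmeans` with different `k` on the same scores.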
Figure 2.
The relationship between the admixed Russian population and major ancestral groups from the Human Genome Diversity Project (HGDP) and 1000 Genomes Project (1KGP). (a) Scatterplot showing samples from HGDP (top) or RUSeq (bottom) plotted in the principal component space built using genotypes of the 402 selected HGDP individuals with approximately even geographical representation (see Methods for details). For plots showing RUSeq individuals, positions of HGDP individuals are represented with gray dots. (b) A heatmap showing pairwise f2 and FST values (estimated using the admixtools2 package) between indicated pairs of populations. (c) Barplots showing the results of an ADMIXTURE analysis of unrelated individuals from RUSeq, HGDP, and 1KGP (K = 5). The following abbreviations are used for ancestry groups in 1KGP/HGDP: afr—African, amr—American, fin (f)—Finnish, eas—East Asian, mid (m)—Middle Eastern, nfe—non-Finnish European, sas—South Asian, oth—other. reu (RUSeq European), rso (RUSeq Southern), ras (RUSeq Asian) correspond to the three clusters of individuals in the RUSeq data.
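
For readers unfamiliar with FST, the sketch below shows one common estimator (the Hudson estimator in the ratio-of-sums form) computed from per-variant allele frequencies in two populations. The caption states that the values were estimated with admixtools2; the formula, function name, and inputs here are assumptions for illustration and may not match that package's implementation.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson FST across variants, as a ratio of summed numerators and denominators.

    p1, p2: per-variant alternate-allele frequencies in populations 1 and 2.
    n1, n2: number of sampled allele copies in each population (assumed
            constant across variants; each must be greater than 1).
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each population needs at least two allele copies")
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for sampling noise.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ.
        den += a * (1 - b) + b * (1 - a)
    if den == 0:
        raise ValueError("no variant is polymorphic between the populations")
    return num / den


# Hypothetical frequencies for three variants in two populations.
fst = hudson_fst([0.10, 0.50, 0.80], [0.15, 0.40, 0.85], n1=200, n2=300)
print(f"FST = {fst:.4f}")
```

The per-variant numerator is closely related to the f2 statistic, which is why the two measures are often reported together for the same population pairs.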
Figure 3.
Local mapping of RUSeq individual clusters to reference human populations. (a–c) Projection of individuals from RUSeq and selected populations from the 1000 Genomes project (1KGP) and the Human Genome Diversity Project (HGDP) into a principal component space built using (a) European individuals from 1KGP and HGDP, (b) individuals of European, Middle Eastern, and South Asian ancestry in HGDP, and (c) individuals of European, South Asian, and East Asian ancestry in the HGDP. Principal component analysis of the baseline individuals and projection of RUSeq individuals into the PC spaces was performed using smartpca based on a set of 4419 unlinked high-quality autosomal SNPs. On each subplot, individuals belonging to the indicated population are highlighted. HGDP individuals from subpopulations residing on the territory of Russia (Russian, Adygei (within the Southern European group), Yakut and Hezhen/Nanai (within the East Asian group)) are marked with triangles. The following abbreviations are used for ancestry groups in 1KGP/HGDP: afr—African, amr—American, ceu—Central and Western European, eas—East Asian, fin—Finnish, mid—Middle Eastern, nfe—non-Finnish European, sas—South Asian, rus—Russian individuals, seu—Southern European. The abbreviations reu, rso, ras correspond to the three clusters of individuals in the RUSeq data.
