Review
Am J Hum Genet. 2022 Sep 1;109(9):1582-1590.
doi: 10.1016/j.ajhg.2022.07.008.

Social and scientific motivations to move beyond groups in allele frequencies: The TOPMed experience

Sarah C Nelson et al. Am J Hum Genet.

Abstract

For the genomics community, allele frequencies within defined groups (or "strata") are useful across multiple research and clinical contexts. Benefits include allowing researchers to identify populations for replication or "look up" studies, enabling researchers to compare population-specific frequencies to validate findings, and facilitating assessment of variant pathogenicity in clinical contexts. However, there are potential concerns with stratified allele frequencies. These include potential re-identification (determining whether or not an individual participated in a given research study based on allele frequencies and individual-level genetic data), harm from associating stigmatizing variants with specific groups, potential reification of race as a biological rather than a socio-political category, and whether presenting stratified frequencies (and the downstream applications that this presentation enables) is consistent with participants' informed consents. The NHLBI Trans-Omics for Precision Medicine (TOPMed) program considered the scientific and social implications of different approaches for adding stratified frequencies to the TOPMed BRAVO (Browse All Variants Online) variant server. We recommend a novel approach of presenting ancestry-specific allele frequencies using a statistical method based upon local genetic ancestry inference. Notably, this approach does not require grouping individuals by either predominant global ancestry or race/ethnicity and, therefore, mitigates re-identification and other concerns as the mixture distribution of ancestral allele frequencies varies across the genome. Here we describe our considerations and approach, which can assist other genomics research programs facing similar issues of how to define and present stratified frequencies in publicly available variant databases.

Keywords: allele frequencies; anti-racism; genetic ancestry; stratification.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1
Visualization of approaches for computing stratified allele frequencies. This figure visualizes potential types of allele frequency stratification and their challenges by demonstrating admixture. Local ancestry patterns were simulated at random to generate admixed genomes, and each interval is colored by its sampled ancestry. Approach 1 for computing stratified allele frequencies relies on groupings of individuals based on self-reported race/ethnicity (top right, independent of inferred admixture patterns) or on identification and grouping of individuals whose genomes are mostly from a specific inferred ancestry (bottom right). Individuals may be excluded from grouping approaches due to missing race/ethnicity or high admixture. Approach 2 uses all individuals and relies on local ancestry inferences to compute ancestry-specific allele frequencies across the genomes. The use of the plural terms to describe continental ancestries (e.g., “European ancestries”) emphasizes the fact that any selected ancestry is a reflection of a somewhat arbitrary reference population, encompassing a set of finer-scaled ancestries.
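
The idea behind Approach 2 can be illustrated with a minimal sketch. It assumes phased haplotypes for which a local ancestry label has already been inferred at the variant's position, and it uses a simple per-ancestry allele count. This is an illustrative simplification, not the statistical method recommended in the article; the function name, ancestry labels, and toy data are all hypothetical.

```python
from collections import defaultdict


def ancestry_specific_frequencies(alleles, local_ancestry):
    """Estimate alternate-allele frequency per local ancestry at one variant.

    alleles        -- one 0/1 value per haplotype (1 = alternate allele)
    local_ancestry -- one ancestry label per haplotype, as inferred at this
                      variant's position (assumed to be given)
    """
    if len(alleles) != len(local_ancestry):
        raise ValueError("alleles and local_ancestry must have equal length")

    alt_counts = defaultdict(int)    # alternate alleles seen per ancestry
    hap_counts = defaultdict(int)    # haplotypes carrying each ancestry here

    # Every haplotype contributes to the ancestry it carries at this site,
    # so no individual is dropped for being highly admixed.
    for allele, ancestry in zip(alleles, local_ancestry):
        hap_counts[ancestry] += 1
        alt_counts[ancestry] += allele

    return {anc: alt_counts[anc] / hap_counts[anc] for anc in hap_counts}


# Toy data: 8 haplotypes (4 individuals) with hypothetical labels "A" and "B".
alleles = [1, 0, 0, 1, 1, 0, 0, 0]
local_ancestry = ["A", "A", "B", "B", "A", "B", "B", "A"]
freqs = ancestry_specific_frequencies(alleles, local_ancestry)
print(freqs)
```

Because the ancestry labels are attached to genomic segments rather than to people, the set of haplotypes contributing to each ancestry's estimate changes from one position to the next, which is the property the caption attributes to Approach 2.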
