Review
Nat Genet. 2025 Jul;57(7):1588-1597.
doi: 10.1038/s41588-025-02212-3. Epub 2025 Jun 23.

Using large-scale population-based data to improve disease risk assessment of clinical variants

Iain S Forrest et al. Nat Genet. 2025 Jul.

Abstract

Understanding the disease risk of genetic variants is fundamental to precision medicine. Estimates of penetrance (the probability of disease for individuals with a variant allele) rely on disease-specific cohorts, clinical testing and emerging electronic health record (EHR)-linked biobanks. These data sources, while valuable, each have limitations in quality, representativeness and analyzability. Here, we provide a historical account of the currently accepted pathogenicity classification system and of the data available in ClinVar, a public archive that aggregates variant interpretations but lacks the detailed data needed for accurate penetrance assessment, and we highlight how this system oversimplifies disease risk. We propose an integrative Bayesian framework that unifies pathogenicity and penetrance, leveraging both functional and real-world evidence to refine risk predictions. In addition, we advocate for enhancing ClinVar with the inclusion of high-priority phenotypes, age-stratified data and population-based cohorts linked to EHRs. We suggest developing a community repository of population-based penetrance estimates to support the clinical application of genetic data.
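
As a purely illustrative sketch of how population-based counts could update a penetrance estimate in a Bayesian way, the Python snippet below applies a conjugate Beta-Binomial update. It is not the framework proposed by the authors, which the abstract does not specify; the function name, the uniform prior and the carrier counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta distribution over penetrance, P(disease | carrier)."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Mean of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)


def update_penetrance(prior_alpha: float, prior_beta: float,
                      affected_carriers: int, total_carriers: int) -> BetaPosterior:
    """Conjugate update: Beta prior combined with binomial carrier counts.

    Hypothetical helper for illustration only; not taken from the article.
    """
    if not 0 <= affected_carriers <= total_carriers:
        raise ValueError("affected_carriers must be between 0 and total_carriers")
    unaffected = total_carriers - affected_carriers
    return BetaPosterior(prior_alpha + affected_carriers, prior_beta + unaffected)


# Made-up counts: 3 affected among 48 carriers in a biobank, uniform Beta(1, 1) prior.
post = update_penetrance(1.0, 1.0, affected_carriers=3, total_carriers=48)
print(f"posterior mean penetrance: {post.mean:.3f}")  # prints 0.080
```

In this toy setting, the prior parameters are where functional or clinical evidence could enter, while the counts stand in for real-world observations from an EHR-linked cohort.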

Conflict of interest statement

Competing interests: R.D. reported being a scientific cofounder, consultant and equity holder for Pensieve Health (pending) and a consultant for Variant Bio and Character Bio. J.M.E. reported being a cofounder, board member and executive of the nonprofit Center for Genomic Interpretation, with part of its mission overlapping with the interests of this work, specifically the mission to encourage careful stewardship of clinical genetics. J.M.E. is also the founder of and a consultant for Grandview Consulting LLC, not related to this work. K.-L.H. is a founder of Open Box Science, not related to this work. W.K.C. is on the Board of Directors of Prime Medicine and Rallybio, not related to this work. The other authors declare no competing interests.

Figures

Fig. 1. Development of a variant interpretation framework from pathogenicity to population-based disease risk using a real-world example of BRCA2 c.68–7T > A.
A. Variant interpretation based on two frameworks: pre-population-based classification, whereby variants are traditionally assigned a pathogenicity category based on the presence of disease in families or clinically ascertained cohorts enriched for disease (top), versus post-population-based classification, in which the association of variants with disease is observed in large-scale cohorts or biobanks (bottom). As an example, the BRCA2 c.68–7T > A variant was initially categorized as P in a pre-population-based classification due to its apparent segregation with breast cancer in several members of a family; however, subsequent analyses in a post-population-based classification revealed a high prevalence in healthy individuals from two cohorts with no association with disease. This led to its eventual reclassification as a VUS and then as a B variant (only the B reclassification is shown for simplicity). Revised reports were sent to physicians caring for patients with the variant to remove them from intensive screening, clinical trials, or treatment intended for individuals with P variants.
B. Two separate but complementary axes, pathogenicity categories versus a population-based spectrum of disease risk, demonstrate the dynamic and evolving nature of variant interpretation. A hypothetical distribution of variants is depicted as points across the three pathogenicity categories of B (green), VUS (yellow), and P (red). As an example, BRCA2 c.68–7T > A was downgraded from P to B classification based on the absence of disease risk observed in population-based analyses.
Created with BioRender.com. P or LP, pathogenic or likely pathogenic; VUS, variant of uncertain significance; B, benign; ExAC, Exome Aggregation Consortium; 1kGP, 1000 Genomes Project.
