G3 (Bethesda). 2025 Jun 4;15(6):jkaf088.
doi: 10.1093/g3journal/jkaf088.

Allele age estimators designed for whole-genome datasets show only a moderate reduction in performance when applied to whole-exome datasets

Alyssa Pivirotto et al. G3 (Bethesda).

Abstract

As personalized genomics becomes more affordable, larger numbers of rare variants are being discovered, leading to important initiatives to identify their functional impacts in relation to disease phenotypes. One way to characterize these variants is to estimate the time at which each mutation entered the population. However, allele age estimators such as those implemented in the programs Relate, Genealogical Estimator of Variant Age (GEVA), and Runtc were developed based on the assumption that datasets include the entire genome. We examined the performance of each of these estimators on simulated exome data under a neutral constant population size model, as well as under population expansion and background selection models. We found that each provides usable estimates of allele age from whole-exome datasets. Relate performs the best of the 3 estimators, with Pearson coefficients of 0.83 and 0.73 (with respect to true simulated values for neutral constant and expansion population models, respectively), with a 12% and 20% decrease in correlation between whole-genome and whole-exome estimations. Of the 3 estimators, Relate is best able to parallelize to yield quick results with few resources; however, Relate currently scales only to thousands of samples, so it cannot handle the hundreds of thousands of samples currently being released. While more work is needed to expand the capabilities of current methods of estimating allele age, these methods show only a modest decrease in performance when estimating the age of mutations from whole-exome data.

Keywords: WES; allele age; exome; personalized genomics; whole-exome sequencing.

Conflict of interest statement

Conflicts of interest: The author(s) declare no conflict of interest.

Figures

Fig. 1.
Estimator comparison on WES and WGS data simulated under a simple model. The estimated age of mutations was compared with true age values from variants simulated under a simple constant population size model. Values are reported estimates from WES datasets for Relate a), GEVA b), and time of coalescence c) and from the WGS datasets for Relate d), GEVA e), and time of coalescence f). Results are plotted on a log scale with the dotted black line representing perfect recapitulation of true age values. Points are colored by density calculated by a Gaussian density gradient. Pearson's r, Spearman's ρ, RMSLE, and Bias are reported for each comparison.
Fig. 2.
Estimator comparison on WES and WGS data simulated under a complex model. The estimated age of mutations was compared with true age values from variants simulated under a complex population expansion model. Values are reported estimates from WES datasets for Relate a), GEVA b), and time of coalescence c), and from the WGS datasets for Relate d), GEVA e), and time of coalescence f). Results are plotted on a log scale with the dotted black line representing perfect recapitulation of true age values. Points are colored by density calculated by a Gaussian density gradient. Pearson's r, Spearman's ρ, RMSLE, and Bias are reported for each comparison.
Fig. 3.
Correlation between true and estimated allele age increases with sample size. Allele ages are estimated from samples of 100; 500; 1,000; 2,500; 5,000; 7,500; 10,000; and 15,000 genomes from simulations of the constant population size (simple) and population expansion (complex) models. The true age of mutations is compared with the age estimated with Relate for each of the 3 sampled sets of mutations.
Fig. 4.
Relate allele age estimates compared with true values under background selection. The allele age of mutations from simulations of sampled genomes under a model which incorporates background selection is estimated using Relate. a) Whole-exome, simple model. b) Whole-exome, complex model. c) Whole-genome, simple model. d) Whole-genome, complex model.
Fig. 5.
Error in estimates across spectrum of frequency values for the simple model. For each estimator, sites generated under the simple model are binned by 1% and an average RMSLE for that bin was normalized by the average true mutation age of that bin. a) RMSE across all frequency bins. b) RMSLE for frequencies below 1% for allele counts from 1 to 72.
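
The figure captions report Pearson's r, Spearman's ρ, RMSLE, and Bias for each comparison of estimated against true allele age. The following is a minimal sketch of how such summary statistics could be computed, not the authors' code. It rests on several assumptions: RMSLE is taken in its common `log(1 + x)` form, bias is taken as the mean signed difference (estimate minus truth), and the function names and example ages are hypothetical. The source does not state which exact definitions or scale were used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def rmsle(true_age, est_age):
    """Root mean squared log error, assuming the log(1 + x) convention."""
    squared = [
        (math.log1p(e) - math.log1p(t)) ** 2 for t, e in zip(true_age, est_age)
    ]
    return math.sqrt(sum(squared) / len(squared))


def mean_bias(true_age, est_age):
    """Mean signed error (estimate minus truth); an assumed definition."""
    return sum(e - t for t, e in zip(true_age, est_age)) / len(true_age)


def summarize(true_age, est_age):
    """Collect the four statistics named in the figure captions."""
    return {
        "pearson_r": pearson_r(true_age, est_age),
        "spearman_rho": spearman_rho(true_age, est_age),
        "rmsle": rmsle(true_age, est_age),
        "bias": mean_bias(true_age, est_age),
    }


if __name__ == "__main__":
    # Hypothetical allele ages in generations, for illustration only.
    true_age = [10.0, 100.0, 1000.0, 10000.0]
    est_age = [20.0, 80.0, 1500.0, 9000.0]
    for name, value in summarize(true_age, est_age).items():
        print(f"{name}: {value:.3f}")
```

Because allele ages span several orders of magnitude, a rank-based statistic such as Spearman's ρ and a log-scale error such as RMSLE are less dominated by the oldest variants than Pearson's r or a raw mean difference computed on untransformed ages.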
