Front Genet. 2019 Dec 19;10:1277.
doi: 10.3389/fgene.2019.01277. eCollection 2019.

Exonic Variants in Aging-Related Genes Are Predictive of Phenotypic Aging Status

Megan E Breitbach et al. Front Genet. 2019.

Abstract

Background: Recent studies investigating longevity have revealed very few convincing genetic associations with increased lifespan. This is, in part, due to the complexity of biological aging, as well as the limited power of genome-wide association studies, which assay common single nucleotide polymorphisms (SNPs) and require several thousand subjects to achieve statistical significance. To overcome such barriers, we performed comprehensive DNA sequencing of a panel of 20 genes previously associated with phenotypic aging in a cohort of 200 individuals, half of whom were clinically defined by an "early aging" phenotype, and half of whom were clinically defined by a "late aging" phenotype based on age (65-75 years) and the ability to walk up a flight of stairs or walk for 15 min without resting. A validation cohort of 511 late agers was used to verify our results.

Results: We found early agers were not enriched for more total variants in these 20 aging-related genes than late agers. Using machine learning methods, we identified the most predictive model of aging status, both in our discovery and validation cohorts, to be a random forest model incorporating damaging exon variants [Combined Annotation-Dependent Depletion (CADD) > 15]. The most heavily weighted variants in the model were within poly(ADP-ribose) polymerase 1 (PARP1) and excision repair cross complementation group 5 (ERCC5), both of which are involved in a canonical aging pathway, DNA damage repair.

Conclusion: Overall, this study implemented a framework to apply machine learning to identify sequencing variants associated with complex phenotypes such as aging. While the small sample size making up our cohort inhibits our ability to make definitive conclusions about the ability of these genes to accurately predict aging, this study offers a unique method for exploring polygenic associations with complex phenotypes.
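The variant-selection step described in the Results can be illustrated with a minimal Python sketch. It only shows the "damaging exon variant" filter (CADD > 15) and one possible way to turn per-sample variant calls into model features. The record fields, the "exon"/"intron" region labels, the 0/1 carrier encoding, and all scores and sample IDs are hypothetical toy values, not data or code from the study.

```python
from typing import NamedTuple

CADD_THRESHOLD = 15.0  # cutoff stated in the abstract: CADD > 15 (strictly greater)

class Variant(NamedTuple):
    variant_id: str   # hypothetical identifier
    gene: str
    region: str       # hypothetical annotation label, e.g. "exon" or "intron"
    cadd: float       # CADD score (toy values, assumed to be on the scaled score)

def damaging_exon_variants(variants):
    """Keep exonic variants whose CADD score strictly exceeds the threshold."""
    return [v for v in variants if v.region == "exon" and v.cadd > CADD_THRESHOLD]

def feature_matrix(genotypes, selected):
    """Build one row per sample with 0/1 carrier indicators per selected variant.

    genotypes maps a sample id to the set of variant ids that sample carries.
    The 0/1 encoding is an assumption made for this sketch.
    """
    columns = [v.variant_id for v in selected]
    return {
        sample: [1 if vid in carried else 0 for vid in columns]
        for sample, carried in genotypes.items()
    }

# Toy input: gene names come from the abstract, everything else is made up.
variants = [
    Variant("v1", "PARP1", "exon", 23.4),
    Variant("v2", "PARP1", "intron", 18.0),  # high score but not exonic
    Variant("v3", "ERCC5", "exon", 15.0),    # not strictly greater than 15
    Variant("v4", "ERCC5", "exon", 27.1),
]
genotypes = {"s1": {"v1", "v3"}, "s2": {"v4"}, "s3": set()}

selected = damaging_exon_variants(variants)
matrix = feature_matrix(genotypes, selected)
```

Rows of `matrix` could then be passed, together with early/late labels, to any classifier implementation; the abstract names a random forest but does not specify the software or settings used.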

Keywords: aging; bioinformatics; genetics; machine learning; sequencing.

Figures

Figure 1
Logistic regression and variant burden reveal a lack of association with early aging. (A) Quantile-quantile plot of logistic regression p-values. (B) Box plot of the total number of variants in the discovery early ager group (red), the discovery late ager group (blue), and the validation late ager group (purple). (C) Diagram of the predictive modeling analysis study design.
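For readers unfamiliar with panel (A), a quantile-quantile plot compares sorted observed p-values with the values expected if no variant were associated. The sketch below computes those point pairs on the -log10 scale. The expected quantile formula (i - 0.5)/n is one common convention and an assumption here; the caption does not say which convention the authors used.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, sorted ascending.

    Expected values assume p-values are uniform on (0, 1] under the null,
    using the (i - 0.5) / n plotting positions (an assumed convention).
    All p-values must be strictly positive.
    """
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))

# Toy check: p-values placed exactly at the null quantiles fall on the diagonal.
null_like = [(i - 0.5) / 4 for i in range(1, 5)]
points = qq_points(null_like)
```

Points that stay close to the diagonal, as in this toy input, correspond to the "lack of association" reading in the caption; systematic upward departure at the right end would suggest more small p-values than expected by chance.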
Figure 2
Different subsets of variants defined as top predictive models using random forest and support vector machine (SVM) learning methods. (A) Boxplots of the random forest model areas under the curve (AUCs) for the all-variant, high Combined Annotation-Dependent Depletion (CADD) exon, and control subsets of the variant data. P-values between groups were determined by performing a Kruskal-Wallis test. **** = p < 0.0001, *** = p < 0.001. (B) Boxplots of the SVM model AUCs for the all-variant, transcription factor binding site (TFBS), and control subsets of the variant data. P-values between groups were determined by performing a Kruskal-Wallis test. (C) Receiver-operating characteristic (ROC) curve of the mean high CADD exon random forest model with confidence intervals. The red line represents the null AUC (0.5). (D) ROC curve of the mean TFBS SVM model with confidence intervals. The red line represents the null AUC (0.5).
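The AUC values compared in this figure, and the null value of 0.5, can be made concrete with a small sketch. The function below uses the pairwise definition of AUC: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. Treating early agers as the positive class (label 1) and the example scores are assumptions for illustration only.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for the positive class, 0 for the negative class.
    Tied scores contribute 0.5, so an uninformative model scores 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data: three positives, three negatives, one positive ranked below a negative.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)            # 8 of 9 pairs are ordered correctly
null_auc = roc_auc(labels, [0.5] * 6)    # constant scores carry no information
```

The quadratic pair loop is chosen for readability; rank-based formulas give the same value more efficiently on large cohorts.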
Figure 3
The random forest high Combined Annotation-Dependent Depletion (CADD) exon model is predictive of late aging status in the validation cohort and outperforms smoking as a predictor of aging. (A) Boxplots of the fraction of misclassified patient samples based on the random forest high CADD exon model (magenta) and the control random forest model (shuffled dataset) (teal). (B) Receiver-operating characteristic curve of the mean area under the curve (AUC) resulting from the random forest high CADD exon model (black) with confidence intervals in the discovery cohort and the AUC resulting from smoking status as a sole predictor of early versus late aging (green).
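The comparison in panel (A) rests on two simple quantities: a misclassified fraction and a shuffled control. The sketch below shows one way to compute both. It assumes the control was built by permuting the early/late labels, which the caption does not state explicitly ("shuffled dataset"), and the predictions shown are invented toy values.

```python
import random

def shuffled_labels(labels, seed=0):
    """Return a permuted copy of the labels (assumed form of the control).

    Shuffling breaks any link between variants and phenotype while
    keeping the number of samples in each class unchanged.
    """
    rng = random.Random(seed)
    permuted = list(labels)
    rng.shuffle(permuted)
    return permuted

def misclassified_fraction(true_labels, predicted_labels):
    """Share of samples whose predicted label differs from the true label."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    errors = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return errors / len(true_labels)

# Toy discovery labels: 1 = early ager, 0 = late ager (coding is an assumption).
labels = [1] * 5 + [0] * 5
control = shuffled_labels(labels, seed=42)

# Toy validation set made up only of late agers, as in the study design.
validation_truth = [0, 0, 0, 0]
validation_pred = [0, 1, 0, 0]
fraction = misclassified_fraction(validation_truth, validation_pred)
```

Because the validation cohort contains only late agers, the misclassified fraction there reduces to the share of samples predicted as early agers.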
Figure 4
Random forest high Combined Annotation-Dependent Depletion exon predictive variants are within 9 of the 20 genes and mostly non-synonymous. (A) Scatter plot of the Gini score for each of the predictive variants based on corresponding gene. (B) Bar plot of the variant consequence type within the predictors with corresponding empirical p-values.
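The "Gini score" in panel (A) is not defined in the caption. In random forests, variable importance is commonly reported as the decrease in Gini impurity produced by splitting on a feature, and the sketch below assumes that meaning. It computes the impurity decrease for a single split on carrier status of one variant; a forest would aggregate such decreases over many trees. Labels and carrier flags are toy values.

```python
def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_decrease(labels, carrier_flags):
    """Impurity decrease from splitting samples into carriers and non-carriers."""
    carriers = [y for y, c in zip(labels, carrier_flags) if c]
    others = [y for y, c in zip(labels, carrier_flags) if not c]
    n = len(labels)
    weighted_children = (
        len(carriers) / n * gini_impurity(carriers)
        + len(others) / n * gini_impurity(others)
    )
    return gini_impurity(labels) - weighted_children

# Toy data: 1 = early ager, 0 = late ager; two early agers carry the variant.
labels = [1, 1, 1, 0, 0, 0]
carrier_flags = [1, 1, 0, 0, 0, 0]
decrease = gini_decrease(labels, carrier_flags)
```

Here the parent impurity is 0.5, the carrier group is pure, and the non-carrier group has impurity 0.375, giving a decrease of 0.25; a variant with no relation to the labels would yield a decrease near zero.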
