Sci Rep. 2015 Dec 4;5:17788.
doi: 10.1038/srep17788.

A novel strategy for forensic age prediction by DNA methylation and support vector regression model

Cheng Xu et al. Sci Rep. 2015.

Abstract

Large deviations arising from the choice of prediction model and from gender and population differences have limited the use of DNA methylation markers for age estimation. Here we identified 2,957 novel age-associated DNA methylation sites (P < 0.01 and R² > 0.5) in the blood of eight pairs of Chinese Han female monozygotic twins. Nine of these novel sites (false discovery rate < 0.01), along with three previously reported sites, were further validated by Sequenom MassARRAY in 49 unrelated female volunteers aged 20-80 years. The PCR products covered a total of 95 CpGs, 11 of which were used to build the age prediction models. We compared four models: multivariate linear regression, multivariate nonlinear regression, back propagation neural network and support vector regression (SVR). SVR was identified as the most robust model, with the lowest mean absolute deviation from chronological age (2.8 years) and an average accuracy of 4.7 years when predicting from only six of the 11 loci, as well as a lower cross-validated error than the linear regression model. Our novel strategy provides an accurate measurement that is highly useful for estimating individual age in forensic practice and for tracking the aging process in other related applications.
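The models in the abstract are ranked by mean absolute deviation (MAD), the average absolute difference between predicted and chronological age. A minimal sketch of that metric follows; the function name and the four example ages are illustrative assumptions, not data from the study.

```python
def mean_absolute_deviation(predicted_ages, true_ages):
    """Average absolute gap, in years, between predicted and true ages."""
    if len(predicted_ages) != len(true_ages):
        raise ValueError("predicted_ages and true_ages must have equal length")
    total = sum(abs(p - t) for p, t in zip(predicted_ages, true_ages))
    return total / len(true_ages)


# Hypothetical ages for illustration only (not taken from the paper).
true_ages = [25, 40, 61, 73]
predicted_ages = [27.5, 38.0, 64.0, 70.5]

mad = mean_absolute_deviation(predicted_ages, true_ages)
print(f"MAD = {mad:.2f} years")
```

A lower MAD means the predictions sit closer to chronological age on average, which is the sense in which SVR is reported as the best of the four models.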


Figures

Figure 1. Age-associated DNA methylation sites in human blood of twins, as detected by the Illumina HumanMethylation450 BeadChip.
(a) A heatmap of 2,965 age-associated methylation markers selected from eight pairs of female twins under the criteria of P < 0.01 and R² > 0.5. The age-associated markers clustered into positively (n = 1,476, the top block) and negatively (n = 1,489, the bottom block) correlated markers. The methylation values of each probe were normalized across the 16 female samples and are shown on a scale from blue (low) to yellow (high). The ages of the 16 females are shown at the top of the heatmap. (b) Scatter plots of methylation value versus age for 11 strongly age-associated DNA methylation sites under the more stringent criterion of FDR < 0.01. Of these, six CpG sites were positively associated with age and five CpG sites were negatively associated with age.
Figure 2. Validation of age-associated methylation sites by using Sequenom MassARRAY in 50 healthy females.
(a) The reliability of the Sequenom MassARRAY output data. The methylation value for the product of each primer in each sample was assigned a confidence score ranging from low (0) to high (5). A score > 1.9 indicates that the methylation level can be accepted. Our Sequenom MassARRAY data are of high quality, since 95% of values were accepted. (b) Scatter plots of the methylation level as a function of age for 11 CpG sites that were selected from the Sequenom MassARRAY result at |R| < 0.5. The ID of each CpG site and its R value are shown in the top right corner of each sub-figure.
Figure 3. Age prediction using four models.
(a) Multivariate linear regression model. (b) Multivariate nonlinear regression model. (c) Back propagation neural network model. (d) Support vector regression model. Using 11 CpG sites selected from Sequenom MassARRAY results in 49 females, the mean absolute deviation for each method was 6.4, 4.1, 3.9, and 2 years, respectively.
Figure 4. SVR is superior to linear regression in age prediction.
(a) The minimal MAD of predicted age as a function of the number of sites used as independent variables. The 11 CpG sites selected from the Sequenom MassARRAY dataset were combined into sets of one to 11 independent variables. The SVR model was fit on all but one sample, and the minimal MAD of the predicted age was recorded for each number of independent variables. (b) Predicted versus observed age of all 49 subjects, using the SVR model with six markers. An MAD of 2.8 years was observed, which is slightly higher than that obtained with 11 markers. (c) Predicted versus observed age using multivariate linear regression with three DNA methylation markers obtained from a recent study. The original BeadChip data for these three sites were extracted to predict age with a multivariate linear regression model, and an MAD of 6.27 years was obtained. (d) Predicted versus observed age using SVR with three DNA methylation markers obtained from a recent study. An MAD of 4.23 years was obtained, which is better than the MAD obtained with multivariate linear regression (panel c), and better than the MAD obtained with multivariate linear regression based on pyrosequencing data in the published study (5.4 years).
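
The procedure in panel (a), fitting on all but one sample and keeping the minimal MAD for each number of markers, can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' pipeline: to stay within the Python standard library it swaps SVR for ordinary least squares, and the three markers and eight samples are synthetic (marker 0 is constructed to be exactly linear in age). All function names are hypothetical.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(rows, ages):
    """Least-squares fit of age on markers (with intercept), normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ages)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def loo_mad(rows, ages):
    """Leave-one-out MAD: fit on all but one sample, predict the held-out one."""
    errors = []
    for i in range(len(ages)):
        coef = fit_linear(rows[:i] + rows[i + 1:], ages[:i] + ages[i + 1:])
        errors.append(abs(predict(coef, rows[i]) - ages[i]))
    return sum(errors) / len(errors)


def best_subset_per_size(data, ages):
    """Map each subset size to (minimal leave-one-out MAD, marker indices)."""
    n_markers = len(data[0])
    best = {}
    for size in range(1, n_markers + 1):
        candidates = []
        for subset in combinations(range(n_markers), size):
            rows = [[sample[j] for j in subset] for sample in data]
            candidates.append((loo_mad(rows, ages), subset))
        best[size] = min(candidates)
    return best


# Synthetic methylation levels (assumption): rows are samples, columns markers.
AGES = [22, 30, 38, 45, 53, 61, 70, 78]
DATA = [
    [0.310, 0.80, 0.50], [0.350, 0.74, 0.42], [0.390, 0.71, 0.55],
    [0.425, 0.62, 0.47], [0.465, 0.60, 0.52], [0.505, 0.51, 0.44],
    [0.550, 0.47, 0.49], [0.590, 0.40, 0.53],
]

best = best_subset_per_size(DATA, AGES)
for size, (mad, subset) in best.items():
    print(size, subset, round(mad, 3))
```

An exhaustive search over all subsets of 11 sites means 2,047 candidate marker sets, each refit once per held-out sample, which is tractable at this scale but grows exponentially with the number of sites.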
