[Preprint]. 2024 Jun 3:2024.06.01.596951.
doi: 10.1101/2024.06.01.596951.

Comparing statistical learning methods for complex trait prediction from gene expression


Noah Klimkowski Arango et al. bioRxiv.


Abstract

Accurate prediction of complex traits is an important task in quantitative genetics and has become increasingly relevant for personalized medicine. Genotypes have traditionally been used for trait prediction with a variety of methods, such as mixed models, Bayesian methods, penalized regressions, dimension reduction, and machine learning. Recent studies have shown that gene expression levels can produce higher prediction accuracy than genotypes. However, these studies used only a few prediction methods, so a comprehensive assessment of methods is needed to fully evaluate the potential of gene expression as a predictor of complex trait phenotypes. Here, we used data from the Drosophila Genetic Reference Panel (DGRP) to compare the ability of several existing statistical learning methods to predict starvation resistance from gene expression, in each sex separately. The methods considered differ in their assumptions about the distribution of gene effect sizes (ranging from models that assume every gene affects the trait to sparser models) and in their ability to capture gene-gene interactions. We also used functional annotation, namely Gene Ontology (GO) terms, as an external source of biological information to inform the prediction models. The results show that prediction accuracy differs between methods, although the differences are generally not large. Methods performing variable selection gave higher accuracy in females, while methods assuming a more polygenic architecture performed better in males. Incorporating GO annotations further improved prediction accuracy for a few GO terms of biological significance. This biological significance extended to the genes underlying highly predictive GO terms, with different genes emerging in each sex. Notably, the Insulin-like Receptor (InR) was prevalent across methods and sexes.
Our results confirmed the potential of transcriptomic prediction and highlighted the importance of selecting appropriate methods and strategies in order to achieve accurate predictions.
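The contrast drawn above, between models that let every gene contribute and sparser models that select genes, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' pipeline: ridge regression stands in for a dense model such as TBLUP (assuming TBLUP is a BLUP-type model over all genes; a BLUP with one variance component shared by all gene effects is equivalent to ridge regression with penalty equal to the residual-to-effect variance ratio), the lasso stands in for a variable-selection method (it is not BayesC), and the simulated data, sample sizes, and penalty values are hypothetical. Accuracy is the correlation between observed and predicted phenotypes in held-out lines.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical simulation: n lines, p genes, only k genes truly affect the trait.
n, p, k = 200, 300, 10
X = rng.normal(size=(n, p))          # expression matrix (lines x genes)
beta = np.zeros(p)
beta[:k] = rng.normal(size=k)        # sparse true gene effects
g = X @ beta
g /= g.std()                         # genetic value scaled to unit variance
y = g + rng.normal(size=n)           # phenotype with heritability of about 0.5
train, test = np.arange(150), np.arange(150, n)


def standardize(X_tr, X_te):
    """Scale genes with training-set mean and SD only (no test-set leakage)."""
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    sd[sd == 0] = 1.0
    return (X_tr - mu) / sd, (X_te - mu) / sd


def ridge_fit(X, y, lam):
    """Dense model: every gene receives a shrunken, nonzero effect."""
    b = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ (y - y.mean()))
    return y.mean(), b


def lasso_fit(X, y, lam, n_iter=100):
    """Sparse model: cyclic coordinate descent for
    (1/2n)||y - a - Xb||^2 + lam * ||b||_1, assuming centered X columns."""
    n_obs, n_genes = X.shape
    a = y.mean()
    b = np.zeros(n_genes)
    resid = y - a
    col_sq = (X ** 2).sum(axis=0) / n_obs
    for _ in range(n_iter):
        for j in range(n_genes):
            resid += X[:, j] * b[j]                  # remove gene j's contribution
            rho = X[:, j] @ resid / n_obs
            # Soft-thresholding sets weak genes exactly to zero.
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * b[j]                  # add the updated contribution back
    return a, b


Xtr, Xte = standardize(X[train], X[test])
a_r, b_ridge = ridge_fit(Xtr, y[train], lam=100.0)
a_l, b_lasso = lasso_fit(Xtr, y[train], lam=0.1)

# Prediction accuracy: correlation between observed and predicted phenotypes.
r_ridge = np.corrcoef(y[test], a_r + Xte @ b_ridge)[0, 1]
r_lasso = np.corrcoef(y[test], a_l + Xte @ b_lasso)[0, 1]
print(f"ridge: r = {r_ridge:.2f}, nonzero effects = {np.count_nonzero(b_ridge)}")
print(f"lasso: r = {r_lasso:.2f}, nonzero effects = {np.count_nonzero(b_lasso)}")
```

With a sparse simulated architecture like this one, the selecting model would be expected to have the edge; simulating many small effects instead tends to favor the dense model. The toy example says nothing about the DGRP data themselves.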


Figures

Fig 1.
Prediction accuracy of 25 replicates in A) females and B) males for all standard methods. Methods are colored by family. The mean correlation coefficient is denoted by diamonds. Outliers are denoted by circles.
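Fig 1 summarizes accuracy over 25 replicates. One plausible way to produce such a summary is sketched below. The random 80/20 train/test split per replicate, the ridge predictor, and the simulated data are assumptions (the caption does not state how replicates were formed), and outliers are flagged with the common 1.5 × IQR boxplot rule, which may differ from the plotting settings used for the figure.

```python
import numpy as np


def ridge_predict(X_tr, y_tr, X_te, lam=10.0):
    """Stand-in predictor (ridge on standardized genes); any method can be swapped in."""
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    sd[sd == 0] = 1.0
    Z_tr, Z_te = (X_tr - mu) / sd, (X_te - mu) / sd
    b = np.linalg.solve(Z_tr.T @ Z_tr + lam * np.eye(Z_tr.shape[1]),
                        Z_tr.T @ (y_tr - y_tr.mean()))
    return y_tr.mean() + Z_te @ b


def replicate_accuracy(X, y, predict, n_rep=25, test_frac=0.2, seed=0):
    """Prediction accuracy r in each of n_rep random train/test splits."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_frac * len(y)))
    rs = []
    for _ in range(n_rep):
        idx = rng.permutation(len(y))
        te, tr = idx[:n_test], idx[n_test:]
        y_hat = predict(X[tr], y[tr], X[te])
        rs.append(np.corrcoef(y[te], y_hat)[0, 1])
    return np.array(rs)


def boxplot_summary(rs):
    """Mean (the diamond) and points beyond 1.5 x IQR from the quartiles (the circles)."""
    q1, q3 = np.percentile(rs, [25, 75])
    iqr = q3 - q1
    outliers = rs[(rs < q1 - 1.5 * iqr) | (rs > q3 + 1.5 * iqr)]
    return rs.mean(), outliers


# Hypothetical data: 200 lines x 100 genes with a polygenic signal (heritability ~0.5).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 100))
y = X @ rng.normal(scale=0.1, size=100) + rng.normal(size=200)

rs = replicate_accuracy(X, y, ridge_predict)
mean_r, outliers = boxplot_summary(rs)
print(f"mean r over {len(rs)} replicates = {mean_r:.3f}; outliers: {outliers}")
```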
Fig 2.
Prediction accuracy using GO-BayesC in females (A) and males (B). Prediction accuracy using GO-TBLUP in females (C) and males (D). Each dot represents the mean correlation between true and predicted phenotypes (r) across 25 replicates for a GO term. The solid line indicates the mean r from the respective standard method (i.e., BayesC and TBLUP). The dashed black line represents the 99th percentile of terms ranked by prediction accuracy.
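A GO-informed model uses the genes annotated to a GO term as predictors. The sketch below takes the simplest reading of that idea: for each term it fits ridge regression on only that term's genes, averages r over replicate splits, marks the 99th percentile of per-term accuracy (the dashed line), and compares the top terms with an all-gene model (the solid line). Everything here is hypothetical: the term names, annotations, and data are simulated, only 5 replicates are used for speed, and GO-BayesC and GO-TBLUP may be structured differently (for example, by modeling a term's genes alongside the remaining genes), which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: 200 lines x 400 genes; only genes 0-9 affect the trait.
n, p = 200, 400
X = rng.normal(size=(n, p))
y = X[:, :10] @ rng.normal(scale=0.3, size=10) + rng.normal(size=n)

# Hypothetical annotation: 200 made-up terms, each listing 20 gene indices.
terms = {f"term_{i:03d}": rng.choice(p, size=20, replace=False) for i in range(200)}
# Plant one term containing all 10 causal genes plus 10 random others.
terms["term_000"] = np.concatenate(
    [np.arange(10), rng.choice(np.arange(10, p), size=10, replace=False)])


def ridge_r(X_tr, y_tr, X_te, y_te, lam=10.0):
    """Ridge on the supplied genes; returns the test-set correlation r."""
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    Z_tr, Z_te = (X_tr - mu) / sd, (X_te - mu) / sd
    b = np.linalg.solve(Z_tr.T @ Z_tr + lam * np.eye(Z_tr.shape[1]),
                        Z_tr.T @ (y_tr - y_tr.mean()))
    return np.corrcoef(y_te, y_tr.mean() + Z_te @ b)[0, 1]


n_rep = 5  # the figure averages 25 replicates; 5 keeps the sketch fast
splits = [rng.permutation(n) for _ in range(n_rep)]


def mean_r(genes):
    """Mean r over the shared splits (first 40 lines of each split form the test set)."""
    rs = []
    for s in splits:
        te, tr = s[:40], s[40:]
        rs.append(ridge_r(X[np.ix_(tr, genes)], y[tr], X[np.ix_(te, genes)], y[te]))
    return float(np.mean(rs))


baseline = mean_r(np.arange(p))                       # all genes: the solid line
term_r = {t: mean_r(genes) for t, genes in terms.items()}
threshold = np.percentile(list(term_r.values()), 99)  # the dashed line
top = sorted((t for t, r in term_r.items() if r > threshold),
             key=term_r.get, reverse=True)

print(f"all-gene r = {baseline:.3f}, 99th percentile of term r = {threshold:.3f}")
for t in top:
    flag = "above" if term_r[t] > baseline else "below"
    print(f"{t}: r = {term_r[t]:.3f} ({flag} the all-gene model)")
```

With 200 terms, the 99th percentile leaves exactly the two most predictive terms above the line, which mirrors how a small set of highly predictive terms stands out in each panel.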
Fig 3.
Plot of prediction accuracy for all GO terms using GO-BayesC (x-axis) against GO-TBLUP (y-axis) for A) females and B) males. The black line represents the line of best fit for each panel.
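The line of best fit in Fig 3 is presumably an ordinary least-squares line through the per-term accuracies; that is an assumption, since the caption does not name the fitting method. The sketch below, on simulated accuracies only, fits such a line with `np.polyfit`, checks it against the closed-form slope cov(x, y)/var(x), and reports the share of terms for which GO-BayesC is the more accurate method.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-term accuracies (one value per GO term) for the two methods.
r_bayesc = rng.uniform(0.0, 0.6, size=500)
r_tblup = 0.8 * r_bayesc + 0.02 + rng.normal(scale=0.02, size=500)

# Ordinary least-squares line: r_tblup ~ slope * r_bayesc + intercept.
slope, intercept = np.polyfit(r_bayesc, r_tblup, deg=1)

# The same line from the closed form: slope = cov(x, y) / var(x).
x_c = r_bayesc - r_bayesc.mean()
slope_cf = (x_c @ (r_tblup - r_tblup.mean())) / (x_c @ x_c)
intercept_cf = r_tblup.mean() - slope_cf * r_bayesc.mean()

agreement = np.corrcoef(r_bayesc, r_tblup)[0, 1]
share_bayesc_higher = np.mean(r_bayesc > r_tblup)
print(f"fit: r_TBLUP = {slope:.2f} * r_BayesC + {intercept:.3f}; corr = {agreement:.2f}")
print(f"terms where GO-BayesC is more accurate: {share_bayesc_higher:.0%}")
```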
