[Preprint]. 2024 Jun 6:2024.02.22.581566.
doi: 10.1101/2024.02.22.581566.

Predicting the direction of phenotypic difference

David Gokhman et al. bioRxiv.

Abstract

Predicting phenotypes from genomic data is a key goal in genetics, but for most complex phenotypes, predictions are hampered by incomplete genotype-to-phenotype mapping. Here, we describe a more attainable approach than quantitative predictions, which is aimed at qualitatively predicting phenotypic differences. Despite incomplete genotype-to-phenotype mapping, we show that it is relatively easy to determine which of two individuals has a greater phenotypic value. This question is central in many scenarios, e.g., comparing disease risk between individuals, the yield of crop strains, or the anatomy of extinct vs. extant species. To evaluate prediction accuracy, i.e., the probability that the individual with the greater predicted phenotype indeed has a greater phenotypic value, we developed an estimator of the ratio between known and unknown effects on the phenotype. We evaluated prediction accuracy using human data from tens of thousands of individuals from either the same family or the same population, as well as data from different species. We found that, in many cases, even when only a small fraction of the loci affecting a phenotype is known, the individual with the greater phenotypic value can be identified with over 90% accuracy. Our approach also circumvents some of the limitations in transferring genetic association results across populations. Overall, we introduce an approach that enables accurate predictions of key information on phenotypes, namely the direction of phenotypic difference, and suggest that more phenotypic information can be extracted from genomic data than previously appreciated.

Figures

Figure 1:
Schematic of the approach to predict the direction of phenotypic difference. (a) We start with a phenotyped individual and an unphenotyped individual. We consider the known and unknown effects contributing to (or associated with) the phenotype of interest. Known genetic effects on the phenotypic difference are in blue (measured in units of the phenotype); unknown genetic and non-genetic effects are in yellow. Cases where the contribution is identical between the two individuals (and therefore do not affect the phenotypic difference) are in gray. (b) Only the known divergent effects are used to predict the phenotypic difference between the individuals. The sum of the known effects can be thought of as the final position of a random walk with step sizes and directions corresponding to the effect sizes. (c) The direction of the total sum of the known effects is used to predict the direction of phenotypic difference between the phenotyped and unphenotyped individuals. If the sum of the known effects between the individuals is positive, we predict that the phenotypic value of the unphenotyped individual is larger than that of the phenotyped individual (and the opposite prediction if the sum is negative). (d) Modeling prediction accuracy using random walks. The curves represent random walks where each step is an effect size. The blue curve shows the known effects of a specific random walk, and the sign (positive or negative) of the blue point at the end of the walk is the predicted direction of phenotypic difference. The yellow curves show potential random walks of the unknown effects (genetic and environmental). In this example, effect sizes were drawn from a standard normal distribution. For a correct prediction of the direction of the phenotypic difference, the sum of the known effects (blue point) and the true phenotypic difference (yellow dot) need to be on the same side of the x-axis (both below or both above).
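The prediction rule in panels (b) and (c) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `predict_direction` and `prediction_is_correct` are hypothetical, and the inputs are assumed to be per-locus effect differences (unphenotyped minus phenotyped individual) in units of the phenotype.

```python
import math


def predict_direction(known_effect_differences):
    """Return +1, -1, or 0 from the sign of the summed known effects.

    +1 means the unphenotyped individual is predicted to have the larger
    phenotypic value, -1 the smaller one, and 0 means no prediction
    (the known effects cancel exactly).
    """
    total = math.fsum(known_effect_differences)
    if total > 0:
        return 1
    if total < 0:
        return -1
    return 0


def prediction_is_correct(known_effect_differences, unknown_effect_differences):
    """Check the prediction against the true difference (known + unknown).

    Mirrors panel (d): the prediction is correct when the sum of the known
    effects and the full phenotypic difference share the same sign.
    """
    known_sum = math.fsum(known_effect_differences)
    true_difference = known_sum + math.fsum(unknown_effect_differences)
    return known_sum * true_difference > 0


# Hypothetical effect differences, chosen only for illustration.
known = [0.8, -0.3, 0.4]
print(predict_direction(known))
print(prediction_is_correct(known, [-0.5, 0.2, -0.1]))
```

In real use the unknown effects are, by definition, unavailable; `prediction_is_correct` is only meaningful in simulations or when both individuals have been phenotyped for validation.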
Figure 2:
Evaluating prediction accuracy using the known-to-unknown ratio (κ). (a) Simulated prediction accuracies for various κ values (grouped into equally spaced bins), for different proportions of the known vs. unknown effects (10%, 50%, and 90% of effects known). Effect sizes were drawn from a normal distribution. In gray is the theoretical expectation from Eq. 4. (b) The distribution of κ values for the case where the known effects are randomly sampled. The vertical line denotes the κ value required for a prediction accuracy of P > 0.95 (κ = 0.62). (c) The distribution of κ values for the case where the known effects are those with the largest effect sizes. The vertical line denotes the κ value required for a prediction accuracy of P > 0.95. In all panels, 10,000 effect sizes were drawn from a standard normal distribution to represent the known and unknown effects on the phenotype.
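The two sampling scenarios in panels (b) and (c) can be explored with a small Monte Carlo sketch. It is an illustration under stated assumptions rather than a reproduction of the figure: it estimates overall sign-agreement accuracy only and does not compute κ, because the estimator and Eq. 4 are not reproduced in this section. The function name `simulate_accuracy` and the reduced number of effects (200 instead of 10,000) are choices made here to keep the run short.

```python
import random


def simulate_accuracy(fraction_known, pick_largest, n_effects=200,
                      n_trials=2000, seed=0):
    """Estimate how often the sign of the known sum matches the true sign.

    Effects are drawn from a standard normal distribution. The known set is
    either a random subset or the effects with the largest absolute size.
    """
    rng = random.Random(seed)
    n_known = max(1, round(n_effects * fraction_known))
    correct = 0
    for _ in range(n_trials):
        effects = [rng.gauss(0.0, 1.0) for _ in range(n_effects)]
        if pick_largest:
            # Largest-magnitude effects first, so the slice below picks them.
            effects.sort(key=abs, reverse=True)
        # For i.i.d. draws, the first n_known unsorted values are already
        # a random subset, so no extra shuffling is needed.
        known_sum = sum(effects[:n_known])
        true_difference = sum(effects)
        if (known_sum > 0) == (true_difference > 0):
            correct += 1
    return correct / n_trials


if __name__ == "__main__":
    for fraction in (0.1, 0.5, 0.9):
        acc_random = simulate_accuracy(fraction, pick_largest=False)
        acc_largest = simulate_accuracy(fraction, pick_largest=True)
        print(f"{fraction:.0%} known: random={acc_random:.3f}, "
              f"largest={acc_largest:.3f}")
```

Under these assumptions, knowing the largest effects should give higher accuracy than knowing the same number of randomly chosen effects, since the largest effects account for a disproportionate share of the total variance.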
Figure 3:
Predictions of the direction of phenotypic difference in humans. (a)–(c) The relationship between the known-to-unknown ratio (κ) and the proportion of correct predictions in different phenotypes. The theoretical expectation (Eq. 4) is shown in gray. (a) Pairwise comparisons of siblings from the UK Biobank for six phenotypes. (b) Pairwise comparisons of individuals from the European group (self-identified White British with Northwestern European genetic ancestry) from the UK Biobank for the same six phenotypes. (c) Pairwise height comparisons of individuals from the same population (either European, East Asian or African, as defined in Fig. S6), using GWAS generated from a European-ancestry group in Yengo et al. (15). (d)–(f) The distribution of κ values for all pairwise comparisons. Each panel corresponds to the panel above it.
Figure 4:
The effect of directional selection on predicting the direction of phenotypic difference. (a) Prediction accuracy under directional selection, modeled as a biased random walk. The random walks in this schematic are biased toward the positive direction, with larger effects having a stronger bias. Biased random walks increase prediction accuracy. (b) Prediction accuracy for different κ values and different levels of bias, with 50% randomly selected known effects out of 10,000 overall. (c) Prediction accuracy across species. Each point represents the proportion of correct predictions. The number of phenotypes is noted above each data point. For sticklebacks, between 14 and 27 phenotypic predictions were made for four different freshwater populations. For mice, predictions were made for two phenotypes in 16 developmental stages.
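The claim in panel (a) that biased random walks increase prediction accuracy can be checked with a short simulation. The bias model below is an assumption made for illustration, not the one used in the paper: each effect is positive with probability 0.5 + bias × min(|effect|, 1), so larger effects are more strongly biased. The names `draw_effect` and `accuracy_under_bias` are hypothetical, and 200 effects are used instead of 10,000 to keep the run short.

```python
import random


def draw_effect(rng, bias):
    """Draw one effect whose sign is biased toward positive.

    Assumed model: magnitude is |N(0, 1)|, and the probability of a
    positive sign grows with magnitude, capped at 0.5 + bias.
    """
    magnitude = abs(rng.gauss(0.0, 1.0))
    p_positive = 0.5 + bias * min(magnitude, 1.0)
    sign = 1.0 if rng.random() < p_positive else -1.0
    return sign * magnitude


def accuracy_under_bias(bias, n_effects=200, fraction_known=0.5,
                        n_trials=2000, seed=1):
    """Estimate sign-agreement accuracy with a random known subset."""
    if not 0.0 <= bias <= 0.5:
        raise ValueError("bias must lie in [0, 0.5]")
    rng = random.Random(seed)
    n_known = max(1, round(n_effects * fraction_known))
    correct = 0
    for _ in range(n_trials):
        effects = [draw_effect(rng, bias) for _ in range(n_effects)]
        known_sum = sum(effects[:n_known])   # i.i.d. draws: a random subset
        true_difference = sum(effects)
        if (known_sum > 0) == (true_difference > 0):
            correct += 1
    return correct / n_trials


if __name__ == "__main__":
    for bias in (0.0, 0.05, 0.1):
        print(f"bias={bias:.2f}: accuracy={accuracy_under_bias(bias):.3f}")
```

With a shared directional bias, both the known and the unknown effects tend to point the same way, which is why the sign of the known sum becomes a better predictor of the true difference in this sketch.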

