Genetics. 2017 Oct;207(2):489-501.
doi: 10.1534/genetics.117.300198. Epub 2017 Aug 24.

Incorporating Gene Annotation into Genomic Prediction of Complex Phenotypes

Ning Gao et al. Genetics. 2017 Oct.

Abstract

Today, genomic prediction (GP) is an established technology in plant and animal breeding programs. Current standard methods rest purely on statistical considerations and do not use the abundant biological knowledge that is readily available from public databases. Two major questions have to be answered before biological prior information can be used routinely in GP approaches: which types of information can be used, and at which points they can be incorporated into prediction methods. In this study, we propose a novel strategy to incorporate gene annotation into GP of complex phenotypes by defining haploblocks according to gene positions. Haplotype effects are then modeled as categorical or as numerical allele dosage variables. The underlying concept of this approach is to build the statistical model on variables representing the biologically functional units. We evaluate the new methods with data from a heterogeneous stock mouse population, the Drosophila Genetic Reference Panel (DGRP), and a rice breeding population from the Rice Diversity Panel. Our results show that using gene annotation to define haploblocks often yields a comparable, and for some traits a higher, predictive ability compared to SNP-based models or to haplotype models that do not use gene annotation information. Modeling gene interaction effects can further improve predictive ability. We also illustrate that adding a second, separate relatedness matrix built from markers that have not been mapped to any gene in many cases does not lead to a relevant additional increase in predictive ability when the first matrix is based on haploblocks defined with gene annotation data, suggesting that intergenic markers provide only redundant information for the data sets considered. Gene annotation information therefore appears to be a suitable indicator of the importance of DNA segments.
Finally, we discuss the effects of gene annotation quality, marker density, and linkage disequilibrium on the performance of the new methods. To our knowledge, this is the first work that incorporates epistatic interaction or gene annotation into haplotype-based prediction approaches.
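As a rough illustration of the idea described in the abstract, the following sketch groups SNPs into haploblocks by gene position and derives both a numerical allele dosage coding and a categorical coding per block. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy SNP positions, the gene intervals, and the phased haplotypes are all hypothetical, and the abstract does not specify how the coded variables enter the relatedness matrices.

```python
# Hypothetical toy data: SNP positions, gene intervals, and phased haplotypes.
snp_positions = [100, 150, 220, 400, 450, 900]
genes = {"geneA": (90, 250), "geneB": (380, 500)}  # name -> (start, end)
# One entry per individual: two phased haplotypes of 0/1 alleles.
haplotypes = [
    ((0, 1, 0, 1, 1, 0), (0, 1, 0, 0, 0, 1)),
    ((0, 1, 0, 1, 1, 0), (1, 0, 1, 1, 1, 0)),
    ((1, 0, 1, 0, 0, 1), (1, 0, 1, 0, 0, 0)),
]


def gene_haploblocks(snp_positions, genes):
    """Map each gene to the indices of the SNPs inside its interval."""
    blocks = {}
    for name, (start, end) in genes.items():
        idx = [i for i, pos in enumerate(snp_positions) if start <= pos <= end]
        if idx:  # genes without any SNP define no haploblock
            blocks[name] = idx
    return blocks


def haplotype_alleles(haplotypes, snp_idx):
    """Return, per individual, the two haplotype alleles of one block as strings."""
    return [
        tuple("".join(str(h[i]) for i in snp_idx) for h in pair)
        for pair in haplotypes
    ]


def dosage_coding(block_alleles):
    """Numerical coding: copies (0, 1, or 2) of each haplotype allele."""
    alleles = sorted({a for pair in block_alleles for a in pair})
    return alleles, [[pair.count(a) for a in alleles] for pair in block_alleles]


def categorical_coding(block_alleles):
    """Categorical coding: one unordered diplotype label per individual."""
    return ["/".join(sorted(pair)) for pair in block_alleles]


blocks = gene_haploblocks(snp_positions, genes)
mapped = {i for idx in blocks.values() for i in idx}
unmapped_snps = [i for i in range(len(snp_positions)) if i not in mapped]

dosage = {}
categories = {}
for gene, idx in blocks.items():
    alleles = haplotype_alleles(haplotypes, idx)
    dosage[gene] = dosage_coding(alleles)
    categories[gene] = categorical_coding(alleles)
```

In this toy example the SNP at position 900 falls in no gene, so it ends up in `unmapped_snps`; such markers correspond to the intergenic markers that the abstract places in a second, separate relatedness matrix.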

Keywords: GenPred; Shared Data Resources; categorical model; gene annotation; genomic selection; haplotype.

Figures

Figure 1
Comparison of the predictive ability of different models. Rows are models and columns are traits from the three data sets. For each trait, relative predictive ability is calculated with GBLUP as the reference (the mean accuracy of each model divided by that of GBLUP). For the DGRP, only traits for which gene-annotation-based models give extra predictive accuracy are presented. Trait “E2” of male lines in the DGRP data was also removed because of its extremely low predictive ability. W6W–W10W, body weight at 6 to 8 and 10 weeks; GSL, growth slope between 6 and 10 weeks of age; BMI, body mass index; BL, body length; %B220+, percentage of B220 cells; %CD3+, percentage of CD3 cells; %CD4+, percentage of CD4 cells; %CD8+, percentage of CD8 cells; %CD4+/CD3+, percentage of CD4 and CD3 cells; %CD8+/CD3+, percentage of CD8 and CD3 cells; CD4+/CD8+, ratio of CD4 to CD8 cells; CD4Intensity, CD4inCD3XGeoMean; CD8Intensity, CD8inCD3YGeoMean. F, female; M, male; DS, dry season; WS, wet season; PH, plant height; FLW, flower time; YLD, grain yield.
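The normalization described in the Figure 1 caption can be sketched as follows. This is only an illustration of dividing each model's mean accuracy by that of GBLUP: the accuracy values and the model name `gene_haploblock_model` are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def relative_predictive_ability(accuracies, reference="GBLUP"):
    """Divide each model's mean accuracy by the reference model's mean accuracy."""
    ref = mean(accuracies[reference])
    return {model: mean(values) / ref for model, values in accuracies.items()}


# Hypothetical per-replicate accuracies for one trait (not taken from the paper).
accuracies = {
    "GBLUP": [0.50, 0.54, 0.52],
    "gene_haploblock_model": [0.55, 0.57, 0.56],
}

relative = relative_predictive_ability(accuracies)
```

By construction the reference model has a relative predictive ability of 1, so values above 1 indicate a model that outperforms GBLUP for that trait.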
Figure 2
Error variance vs. predictive ability. Description of traits and models: see text and Figure 1.
