Evol Bioinform Online. 2008 Jun 28:4:225-35.
doi: 10.4137/ebo.s756.

Estimation of genetic effects and genotype-phenotype maps


Arnaud Le Rouzic et al. Evol Bioinform Online. 2008.

Abstract

Determining the genetic architecture of complex traits is a necessary step to understand phenotypic changes in natural, experimental and domestic populations. However, this is still a major challenge for modern genetics, since the estimation of genetic effects tends to be complicated by genetic interactions, which lead to changes in the effect of allelic substitutions depending on the genetic background. Recent progress in statistical tools aiming to describe and quantify genetic effects meaningfully improves the efficiency and the availability of genotype-to-phenotype mapping methods. In this contribution, we facilitate the practical use of the recently published 'NOIA' quantitative framework by providing an implementation of linear and multilinear regressions, change of reference operation and genotype-to-phenotype mapping in a package ('noia') for the software R, and we discuss theoretical and practical benefits evolutionary and quantitative geneticists may find in using proper modeling strategies to quantify the effects of genes.


Figures

Figure 1. Illustration of data formatting.
Part a provides an example of a data set in which the genotypes of individuals are fully known (or, alternatively, totally unknown and considered as missing data); 1 and 3 stand for the homozygotes (e.g. ‘AA’ and ‘aa’) and 2 for the heterozygote. Part b illustrates a second kind of data set in which the genotypes are defined by their probabilities. In this example, part b is the exact equivalent of part a (so the probability of each ‘known’ genotype is always 1), but in practice, especially when the data result from a Haley-Knott regression, the probabilities, computed from the genotypes at flanking markers, may be intermediate. Missing values (‘NA’) are allowed in type a data sets, and are replaced by genotypic probabilities equal to the genotypic frequencies in the rest of the population (here, close to 0.25, 0.5, and 0.25 since the population is an F2). The Z matrix used for the regression (equation 5) is computed from a ‘type b’ data set, meaning that if ‘type a’ data is provided, it is turned into ‘type b’ before the genetic regression.
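The conversion from ‘type a’ to ‘type b’ data described in this caption can be sketched for a single locus as follows. This is a minimal illustration in Python, not the code of the ‘noia’ R package: the function name `to_probabilities`, the use of `None` for ‘NA’, and the example genotypes are all assumptions made for the example.

```python
from collections import Counter

GENOTYPES = (1, 2, 3)  # 1 and 3: homozygotes, 2: heterozygote


def to_probabilities(column):
    """Turn one locus of 'type a' codes (1, 2, 3 or None) into 'type b' rows.

    Hypothetical helper: each row holds the probabilities of genotypes 1, 2, 3.
    """
    known = [g for g in column if g is not None]
    if not known:
        raise ValueError("at least one genotype must be known")
    counts = Counter(known)
    # Genotypic frequencies among the individuals with a known genotype
    freqs = tuple(counts[g] / len(known) for g in GENOTYPES)
    rows = []
    for g in column:
        if g is None:
            # Missing value: use the frequencies in the rest of the population
            rows.append(freqs)
        else:
            # Known genotype: probability 1 for the observed class
            rows.append(tuple(1.0 if g == k else 0.0 for k in GENOTYPES))
    return rows


# Made-up example locus with one missing genotype
locus = [1, 2, 2, 3, None, 2, 1, 3]
for row in to_probabilities(locus):
    print(row)
```

In this toy sample the missing individual receives the observed frequencies of the seven known genotypes; in a large F2 these would approach 0.25, 0.5, and 0.25, as the caption notes.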
Figure 2. Accuracy of GP map predictions.
The estimates of genotypic values, as well as their 95% confidence intervals, are shown for two different two-locus genotype-phenotype (GP) maps (a: no epistasis, b: multilinear epistasis). Results are derived from simulated F2 populations of size N = 200 (the script is provided in the Appendix). Predictions are satisfactory, except if the model cannot handle the complexity of the map (marginal effect model on an epistatic map). Confidence intervals are smaller when the genotypic value is estimated from a frequent genotype in the population (the most frequent genotype in an F2 being 22), and when the model has fewer degrees of freedom (such as in one-locus models). 95% confidence intervals are estimated from the standard error (SE), with a half-width of 1.96 × SE.
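The confidence interval rule stated at the end of this caption can be written out as a short sketch. The function name `confidence_interval` and the numerical values are illustrative assumptions, not taken from the article or its Appendix.

```python
def confidence_interval(estimate, se, z=1.96):
    """Return the (lower, upper) bounds of estimate +/- z * SE.

    z = 1.96 corresponds to a 95% interval under a normal approximation.
    """
    half_width = z * se
    return estimate - half_width, estimate + half_width


# Made-up genotypic value estimate and standard error
low, high = confidence_interval(2.0, 0.1)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```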
Figure 3. Impact of the quality of the data set on the results.
The effect of the population size and the proportion of missing data on the quality of the results is illustrated by the standard deviation of the 2-locus GP map estimates. The amplitude of the uncertainty changes with the genotype considered, since the more frequent a genotype is in the F2 population, the better the estimate of its genotypic value. The results for the ‘best’ genotype (i.e. the fully heterozygous (‘htz’) genotype 22) and one of the ‘worst’ ones (fully homozygous (‘hmz’) 11) are displayed. a: improvement in the precision of the GP map when the size of the population under study is increased. b: effects of substituting (randomly) genotypic information (2 loci, N = 500) by missing data. In this example (Var(e) = 1, additive GP map), fairly good estimates of the genotypic values in a 2-locus GP map require N > 400, and these estimates appear to be quite robust to missing data. The corresponding script is available in the Appendix.
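Why the fully heterozygous genotype is estimated more precisely than a fully homozygous one can be illustrated with a rough calculation. The sketch below assumes unlinked loci in an ideal F2 (per-locus frequencies 0.25, 0.5, 0.25) and approximates the standard error of a genotypic value by that of a simple cell mean, sqrt(Var(e) / (N × frequency)). This approximation and the function names are assumptions for illustration; they are not the simulation script from the Appendix.

```python
import math
from itertools import product

# Expected single-locus genotype frequencies in an F2 (1, 3: homozygotes)
F2_LOCUS_FREQ = {1: 0.25, 2: 0.5, 3: 0.25}


def f2_genotype_frequencies(n_loci):
    """Expected multilocus genotype frequencies, assuming unlinked loci."""
    freqs = {}
    for combo in product(F2_LOCUS_FREQ, repeat=n_loci):
        label = "".join(str(g) for g in combo)
        freqs[label] = math.prod(F2_LOCUS_FREQ[g] for g in combo)
    return freqs


def approx_se(n, freq, var_e=1.0):
    """Rough SE of a genotype mean estimated from about n * freq individuals."""
    return math.sqrt(var_e / (n * freq))


freqs = f2_genotype_frequencies(2)
for n in (100, 400, 1000):
    print(n, round(approx_se(n, freqs["22"]), 3), round(approx_se(n, freqs["11"]), 3))
```

Under these assumptions, genotype 22 is four times as frequent as genotype 11, so its approximate standard error is half as large at any population size.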
Figure 4. Computational resource requirements.
The complexity of the models increases with the number of loci. a) presents the time necessary for the linear regression, with full and marginal-effect models. The test has been performed on a single AMD Athlon 4000+ processor, with the standard R software for Linux (32-bit) and its profiling module (Rprof). Multilinear regression (not shown) is always slower than the corresponding linear regression, since this linear regression is first performed to estimate the starting values. b) Increase of the S matrix size with the number of loci. The S matrix is the largest element in the model, and its size is proportional to the memory necessary to run the program. With a modern desktop PC, it is possible to run regressions up to 10 loci, which is probably beyond the number of genes that can be located in a regular experimental procedure.
Figure 5. Illustration of the consequences of reducing the complexity of GP maps.
An F2 population (size N = 500, Var(e) = 1) has been simulated from an arbitrary 2-locus, 2-allele (a and A at the first locus, b and B at the other one) GP map (panel a). The inference of the GP map from this population with different regression options is displayed in panels b to f (see the Appendix for the corresponding R script). b) full model (9 parameters), explains 77.7% of the total phenotypic variance; c) multilinear model (6 parameters, 74.3%); d) no dominance (i.e. only additive and additive-by-additive interactions) (4 parameters, 55.9%); e) no epistasis (5 parameters, 70.8%); f) additive effects only (3 parameters, 54.9%). The full model always performs better (results identical to the actual GP map except sampling effect). The relative performance of the other models obviously depends on the shape of the actual GP map. If the decomposition is orthogonal, a model selection procedure can be performed to make a rational choice among all possible models.
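The parameter counts quoted in this caption can be reproduced with a small sketch. The formulas below are assumptions chosen to match the two-locus counts given above (9, 6, 4, 5, and 3). Their extension to other numbers of loci is a guess that has not been checked against the ‘noia’ package, and the function name `parameter_counts` is hypothetical.

```python
from math import comb


def parameter_counts(n_loci):
    """Number of parameters (including the reference point) per model.

    Assumed formulas, verified only against the two-locus counts in the caption.
    """
    return {
        # One parameter per multilocus genotype (3 genotypes per locus)
        "full": 3 ** n_loci,
        # Marginal effects plus one epistasis coefficient per pair of loci
        "multilinear": 1 + 2 * n_loci + comb(n_loci, 2),
        # Each locus contributes either nothing or an additive term
        "no dominance": 2 ** n_loci,
        # Reference plus additive and dominance effect at each locus
        "no epistasis": 1 + 2 * n_loci,
        # Reference plus one additive effect per locus
        "additive only": 1 + n_loci,
    }


for model, count in parameter_counts(2).items():
    print(f"{model}: {count}")
```

Under these assumptions the full model grows as a power of 3 with the number of loci, whereas the marginal-effect models grow only linearly.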

