BMC Genomics. 2014 Jun 17;15(1):478.
doi: 10.1186/1471-2164-15-478.

A new approach for efficient genotype imputation using information from relatives

Mehdi Sargolzaei et al. BMC Genomics. 2014.

Abstract

Background: Genotype imputation can help reduce genotyping costs particularly for implementation of genomic selection. In applications entailing large populations, recovering the genotypes of untyped loci using information from reference individuals that were genotyped with a higher density panel is computationally challenging. Popular imputation methods are based upon the Hidden Markov model and have computational constraints due to an intensive sampling process. A fast, deterministic approach, which makes use of both family and population information, is presented here. All individuals are related and, therefore, share haplotypes which may differ in length and frequency based on their relationships. The method starts with family imputation if pedigree information is available, and then exploits close relationships by searching for long haplotype matches in the reference group using overlapping sliding windows. The search continues as the window size is shrunk in each chromosome sweep in order to capture more distant relationships.

Results: In cattle data sets, the proposed method gave imputation accuracy higher than or similar to that of Beagle and Impute2 when all available information was used. When close relatives of target individuals were present in the reference group, the method was more accurate than the other two methods even when the pedigree was not used. Rare variants were also imputed with higher accuracy. Finally, computing requirements were considerably lower than those of Beagle and Impute2. The presented method took 28 minutes to impute from 6 k to 50 k genotypes for 2,000 individuals with a reference size of 64,429 individuals.

Conclusions: The proposed method efficiently makes use of information from close and distant relatives for accurate genotype imputation. In addition to its high imputation accuracy, the method is fast, owing to its deterministic nature and, therefore, it can easily be used in large data sets where the use of other methods is impractical.
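
The window search described in the Background can be illustrated with a toy example. The sketch below is not the published implementation: the function name `impute_haplotype`, the `MISSING` code, the window sizes, the 50% overlap, and the rule "fill an untyped locus only when all matching reference haplotypes agree" are assumptions chosen for illustration, and family imputation from pedigree is omitted. It only shows the general idea of sweeping a chromosome with overlapping windows and shrinking the window on each sweep so that shorter shared haplotypes can still be found.

```python
from typing import List, Sequence

MISSING = -1  # assumed code for an untyped allele in the target haplotype


def impute_haplotype(
    target: Sequence[int],
    reference: Sequence[Sequence[int]],
    window_sizes: Sequence[int] = (8, 4, 2),
    min_typed: int = 2,
) -> List[int]:
    """Fill untyped loci of one target haplotype from matching reference haplotypes.

    Each pass over the chromosome uses a smaller window than the previous one,
    so long matches (close relatives) are tried before short ones.
    """
    result = list(target)
    n = len(result)
    for size in window_sizes:
        step = max(1, size // 2)  # consecutive windows overlap by about half
        for start in range(0, max(1, n - size + 1), step):
            end = min(n, start + size)
            # Only loci observed in the original target are used for matching.
            typed = [i for i in range(start, end) if target[i] != MISSING]
            if len(typed) < min_typed:
                continue
            matches = [
                ref for ref in reference
                if all(ref[i] == target[i] for i in typed)
            ]
            if not matches:
                continue
            for i in range(start, end):
                if result[i] == MISSING:
                    alleles = {ref[i] for ref in matches}
                    if len(alleles) == 1:  # fill only when the matches agree
                        result[i] = alleles.pop()
    return result


if __name__ == "__main__":
    reference = [
        [0, 1, 0, 1, 1, 1, 0, 1],
        [1, 0, 1, 0, 0, 0, 1, 0],
    ]
    # Recombinant target: first half resembles reference[0], second half reference[1].
    target = [0, MISSING, 0, 1, 0, MISSING, 1, 0]
    print(impute_haplotype(target, reference))
```

In this example no reference haplotype matches the target over the full 8-locus window, so nothing is filled on the first sweep; the 4-locus windows on the second sweep then find a separate match for each half of the chromosome.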

Figures

Figure 1
Overall allelic r² for FImpute, Beagle and Impute2 across different imputation scenarios. There were 2000 and 500 young target individuals for imputation from 3 k/6 k to 50 k and from 50 k to 300 k, respectively. In scenarios A and F, reference groups with different sizes were randomly chosen after excluding parents and grandparents. The reference group in scenarios B and C included only parents and grandparents, in scenarios D and E it included all genotyped males and in scenarios G and H it included all genotyped individuals. Pedigree information was considered in scenarios C, E and H and was disregarded in scenarios B, D and G.
Figure 2
Rare allele imputation: allelic r² in different MAF bins for FImpute, Beagle and Impute2. There were 2000 and 500 young target individuals for imputation from 3 k/6 k to 50 k and from 50 k to 300 k, respectively. In scenarios A and F, reference groups with different sizes were randomly chosen after excluding parents and grandparents. The reference group in scenarios B and C included only parents and grandparents, in scenarios D and E it included all genotyped males and in scenarios G and H it included all genotyped individuals. Pedigree information was considered in scenarios C, E and H and was disregarded in scenarios B, D and G.
Figure 3
CPU time for Beagle, Impute2 and FImpute over different reference sizes. No pedigree information was used and genotyped parents and grandparents were excluded.
Figure 4
Tracing genotyped individuals for family imputation. A1, … A12 represent ancestors of individual P, and S and D are its sire and dam. An asterisk indicates that the individual is genotyped. The dotted line shows the traced path for animal P.
