Bioinformatics. 2010 Jun 15;26(12):i199-207.
doi: 10.1093/bioinformatics/btq187.

Efficient genome ancestry inference in complex pedigrees with inbreeding

Eric Yi Liu et al. Bioinformatics. 2010.

Abstract

Motivation: High-density SNP data from model animal resources provide opportunities for fine-resolution studies of genetic variation. These genetic resources are generated through a variety of breeding schemes that involve multiple generations of matings derived from a set of founder animals. In this article, we investigate the problem of inferring the most probable ancestry of the resulting genotypes, given a set of founder genotypes. Because of the computational difficulty, existing methods either handle only small pedigrees or disregard the pedigree structure. However, large pedigrees of model animal resources often contain repetitive substructures that can be exploited to accelerate computation.

Results: We present an accurate and efficient method for inferring genome ancestry that accepts complex pedigrees with inbreeding. Inbreeding is commonly used to generate genetically diverse and reproducible animals. It is often carried out for many generations and can account for most of the computational complexity in real-world model animal pedigrees. Our method builds a hidden Markov model that derives the ancestry probabilities through the inbreeding process without explicitly modeling every generation. For model animal resources such as the Collaborative Cross (CC), the ancestry inference is accurate and fast, independent of the number of generations. Experiments on both simulated and real CC data demonstrate that our method offers accuracy comparable to that of methods that build an explicit model of the entire pedigree, with much better scalability with respect to pedigree size.
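
To make the idea of HMM-based ancestry inference concrete, the following is a minimal sketch of Viterbi decoding over founder labels along one chromosome. It is not the method described in the abstract: it ignores the pedigree and the inbreeding process entirely, and it assumes a haploid chromosome, biallelic markers coded 0/1, a fixed per-interval switch probability `theta` and a genotyping error rate `eps`. The function name and all parameters are hypothetical and chosen only for illustration.

```python
import math

def viterbi_ancestry(founders, observed, theta=0.01, eps=0.01):
    """Return the most probable founder index at each marker.

    founders: list of k >= 2 founder haplotypes (lists of 0/1 alleles).
    observed: one haplotype of the same length (list of 0/1 alleles).
    theta:    assumed probability of switching founder between markers.
    eps:      assumed probability that an observed allele is miscalled.
    """
    k, n = len(founders), len(observed)
    log_stay = math.log(1.0 - theta)
    log_switch = math.log(theta / (k - 1))  # switch uniformly to another founder

    def log_emit(f, i):
        match = founders[f][i] == observed[i]
        return math.log(1.0 - eps) if match else math.log(eps)

    # Uniform prior over founders at the first marker.
    score = [math.log(1.0 / k) + log_emit(f, 0) for f in range(k)]
    back = []  # back[i-1][f] = best predecessor of state f at marker i
    for i in range(1, n):
        new_score, pointers = [], []
        for f in range(k):
            best_prev = max(
                range(k),
                key=lambda g: score[g] + (log_stay if g == f else log_switch),
            )
            trans = log_stay if best_prev == f else log_switch
            new_score.append(score[best_prev] + trans + log_emit(f, i))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(k), key=lambda f: score[f])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Two toy founders; the observed haplotype switches ancestry after marker 3.
print(viterbi_ancestry([[0] * 6, [1] * 6], [0, 0, 0, 1, 1, 1]))
# -> [0, 0, 0, 1, 1, 1]
```

In this toy model the transition probabilities are constant. The abstract's contribution, by contrast, is an HMM whose ancestry probabilities account for many generations of inbreeding without modeling each generation explicitly.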

Figures

Fig. 1.
(a) Lattice of binary inheritance indicators representing the inheritance pattern of an inbreeding process at a single site. (b) An equivalent quaternary indicator representation.
Fig. 2.
Comparison of predicted probabilities and observed probabilities from 10 000 000 simulations. The data points in the figures are observed probabilities from simulations. The curves are derived from our formulas. (a) Predicted and simulated PEE0 for θ=0.01, 0.001, 0.0001. (b) Predicted and simulated PEN1=PNE1 for θ=0.001, 0.0001. (c) Predicted and simulated PEE2 for θ=0.001, 0.0001. We do not plot the case of θ=0.01 in (b) and (c) because the values are much larger than those for the other two θ values.
Fig. 3.
(a) CC breeding scheme: an example derivation of chromosomes by recombining chromosomes from eight ordered founders. G1 and G2I0 are two generations of crosses. G2I1 to G2I are multiple generations of inbreeding. (b) The inheritance indicators used to represent the inheritance flow at a SNP site.
Fig. 4.
(a) Comparison of error rates of GAIN, MERLIN and HAPPY on a simulated dataset with no noise. (b) Comparison on a simulated dataset with 1% noise.
Fig. 5.
(a) Proportion of probabilities assigned to wrong ancestry by GAIN and HAPPY on a simulated dataset with no noise. (b) Proportion of probabilities assigned to wrong ancestry by GAIN and HAPPY on a simulated dataset with 1% noise.
Fig. 6.
(a) The difference in best ancestry estimated by GAIN and HAPPY. (b) The average JSD between results from GAIN and HAPPY on chromosomes 1 to 19 of 96 real CC mice.
Fig. 7.
(a) Ancestry inference on chromosome 7 of a G2I6 mouse by GAIN. (b) Ancestry inference on chromosome 7 of the same mouse by HAPPY. (c) The pedigree inconsistency in (b), i.e. the aggregated probability assigned to ancestry that violates pedigree knowledge. (d) A region in chromosome 1 from another G2I6 mouse where propagated error is the main cause of divergence.
Fig. 8.
Average running time of the three methods on a dataset containing 6644 markers. The experiment was conducted on an Intel desktop with a 2.66 GHz CPU and 8 GB of memory.
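
The caption of Fig. 6(b) reports an average JSD between the results of two methods. Assuming JSD denotes the Jensen-Shannon divergence between the ancestry probability distributions that the two methods assign at a marker (this page does not expand the acronym), a minimal sketch of the computation might look like the following. The function name `jsd` and the base-2 logarithm are illustrative choices, not taken from the source.

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    p, q: sequences of probabilities over the same ancestry states,
          each assumed to sum to 1.
    """
    # Mixture distribution halfway between p and q.
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Identical distributions give 0; disjoint distributions give 1 in base 2.
print(jsd([1.0, 0.0], [1.0, 0.0]))  # -> 0.0
print(jsd([1.0, 0.0], [0.0, 1.0]))  # -> 1.0
```

Averaging this quantity over all markers on a chromosome would give one number per chromosome, which is one plausible reading of the figure.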
