Bioinformatics. 2010 Jun 15;26(12):i199-207.

Efficient genome ancestry inference in complex pedigrees with inbreeding

Eric Yi Liu¹, Qi Zhang, Leonard McMillan, Fernando Pardo-Manuel de Villena, Wei Wang

Affiliation

¹ Department of Computer Science, University of North Carolina at Chapel Hill, USA.

PMID: 20529906
PMCID: PMC2881372
DOI: 10.1093/bioinformatics/btq187

Abstract

Motivation: High-density SNP data of model animal resources provides opportunities for fine-resolution genetic variation studies. These genetic resources are generated through a variety of breeding schemes that involve multiple generations of matings derived from a set of founder animals. In this article, we investigate the problem of inferring the most probable ancestry of resulting genotypes, given a set of founder genotypes. Due to computational difficulty, existing methods either handle only small pedigree data or disregard the pedigree structure. However, large pedigrees of model animal resources often contain repetitive substructures that can be utilized in accelerating computation.

Results: We present an accurate and efficient method for inferring genome ancestry that accepts complex pedigrees with inbreeding. Inbreeding is commonly used to generate genetically diverse and reproducible animals. It is often carried out for many generations and can account for most of the computational complexity in real-world model animal pedigrees. Our method builds a hidden Markov model that derives the ancestry probabilities through the inbreeding process without explicitly modeling every generation. For model animal resources such as the Collaborative Cross (CC), the ancestry inference is accurate and fast regardless of the number of generations. Experiments on both simulated and real CC data demonstrate that our method offers accuracy comparable to that of methods that build an explicit model of the entire pedigree, with much better scalability with respect to pedigree size.
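To make the idea of HMM-based ancestry inference concrete, the following is a minimal sketch of forward-backward posterior decoding over founder labels along one haploid chromosome. It is not the model described in the abstract: the transition probabilities here are a generic "stay or switch" rule with a hypothetical per-marker `switch` rate, whereas the paper derives them from the inbreeding process. The function name `posterior_ancestry`, the `noise` parameter, and the toy founder haplotypes are all assumptions made for illustration.

```python
def posterior_ancestry(founders, obs, switch=0.05, noise=0.01):
    """Posterior probability of each founder at each marker (toy HMM).

    founders: list of K founder haplotypes (lists of 0/1 alleles), K >= 2
    obs:      observed alleles of the descendant chromosome
    switch:   assumed probability of changing founder between markers
    noise:    assumed genotyping error rate; must be > 0
    """
    K, M = len(founders), len(obs)

    def emit(k, m):
        # Observed allele matches founder k's allele unless there is noise.
        return 1.0 - noise if founders[k][m] == obs[m] else noise

    def trans(i, j):
        # Stay on the same founder, or switch uniformly to another one.
        return 1.0 - switch if i == j else switch / (K - 1)

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    # Forward pass, rescaled at every marker to avoid underflow.
    fwd = [normalize([emit(k, 0) / K for k in range(K)])]
    for m in range(1, M):
        prev = fwd[-1]
        fwd.append(normalize([
            emit(j, m) * sum(prev[i] * trans(i, j) for i in range(K))
            for j in range(K)
        ]))

    # Backward pass with the same rescaling.
    bwd = [[1.0] * K for _ in range(M)]
    for m in range(M - 2, -1, -1):
        bwd[m] = normalize([
            sum(trans(i, j) * emit(j, m + 1) * bwd[m + 1][j] for j in range(K))
            for i in range(K)
        ])

    # Combine both passes into per-marker posteriors.
    return [normalize([fwd[m][k] * bwd[m][k] for k in range(K)])
            for m in range(M)]


# Toy data: the chromosome copies founder 0 for four markers, then founder 1.
toy_founders = [[0] * 8, [1] * 8, [0, 1, 0, 1, 0, 1, 0, 1]]
toy_obs = [0, 0, 0, 0, 1, 1, 1, 1]
toy_post = posterior_ancestry(toy_founders, toy_obs)
best = [max(range(3), key=lambda k: row[k]) for row in toy_post]
```

The cost of this sketch grows with the number of founder states squared times the number of markers. Handling real diploid genotypes and pedigree constraints would require a larger state space, which is the scalability problem the abstract refers to.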


Figures

**Fig. 1.**
(a) Lattice of binary inheritance indicators representing the inheritance pattern of an inbreeding process at a single site. (b) An equivalent quaternary indicator representation.

**Fig. 2.**
Comparison of predicted probabilities and observed probabilities from 10 000 000 simulations. The data points in the figures are observed probabilities from simulations. The curves are derived from our formulas. (a) Predicted and simulated P_EE0 for θ=0.01, 0.001, 0.0001. (b) Predicted and simulated P_EN1=P_NE1 for θ=0.001, 0.0001. (c) Predicted and simulated P_EE2 for θ=0.001, 0.0001. We do not plot the case of θ=0.01 in (b) and (c) because the values are much larger than those for the other two θ values.

**Fig. 3.**
(a) CC breeding scheme: an example derivation of chromosomes by recombining chromosomes from eight ordered founders. G1 and G2I₀ are two generations of crosses. G2I₁ to G2I_∞ are multiple generations of inbreeding. (b) The inheritance indicators used to represent the inheritance flow at a SNP site.

**Fig. 4.**
(a) Comparison of error rates of GAIN, MERLIN and HAPPY on a simulated dataset with no noise. (b) Comparison on a simulated dataset with 1% noise.

**Fig. 5.**
(a) Proportion of probabilities assigned to wrong ancestry by GAIN and HAPPY on a simulated dataset with no noise. (b) Proportion of probabilities assigned to wrong ancestry by GAIN and HAPPY on a simulated dataset with 1% noise.

**Fig. 6.**
(a) The difference in best ancestry estimated by GAIN and HAPPY. (b) The average JSD between results from GAIN and HAPPY on chromosomes 1 to 19 of 96 real CC mice.

**Fig. 7.**
(a) Ancestry inference on chromosome 7 of a G2I₆ mouse by GAIN. (b) Ancestry inference on chromosome 7 of the same mouse by HAPPY. (c) The pedigree inconsistency in (b), i.e. the aggregated probability assigned to ancestry that violates pedigree knowledge. (d) A region in chromosome 1 from another G2I₆ mouse where propagated error is the main cause of divergence.

**Fig. 8.**
Average running time of the three methods on a dataset containing 6644 markers. The experiment was conducted on an Intel desktop with a 2.66 GHz CPU and 8 GB of memory.
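The comparison metrics named in the captions of Fig. 5 and Fig. 6 can be sketched in a few lines. This is an illustration only: it assumes that JSD stands for the Jensen-Shannon divergence, that logarithms are base 2 (so values lie between 0 and 1), and that both metrics are simple averages over markers. The function names `jsd`, `mean_jsd` and `wrong_ancestry_mass` are hypothetical and do not come from the paper.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence between two ancestry distributions."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def mean_jsd(post_a, post_b):
    """Average per-marker JSD between two methods' posterior tables."""
    return sum(jsd(p, q) for p, q in zip(post_a, post_b)) / len(post_a)


def wrong_ancestry_mass(post, truth):
    """Average probability assigned to any founder other than the true one.

    post:  per-marker posterior rows (one probability per founder)
    truth: per-marker index of the true founder (known in simulations)
    """
    return sum(1.0 - row[t] for row, t in zip(post, truth)) / len(post)


# Two hypothetical methods that agree at marker 0 and disagree at marker 1.
method_a = [[0.5, 0.5], [1.0, 0.0]]
method_b = [[0.5, 0.5], [0.0, 1.0]]
divergence = mean_jsd(method_a, method_b)
```

A divergence of 0 at a marker means the two methods assign identical ancestry probabilities there, and 1 means they place all probability on different founders.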




Grants and funding

GM076468/GM/NIGMS NIH HHS/United States
