BMC Bioinformatics. 2007 Oct 25;8:411. doi: 10.1186/1471-2105-8-411.

Estimating genealogies from linked marker data: a Bayesian approach

Dario Gasbarra¹, Matti Pirinen, Mikko J Sillanpää, Elja Arjas

PMID: 17961219
PMCID: PMC2233650
DOI: 10.1186/1471-2105-8-411

Dario Gasbarra et al. BMC Bioinformatics. 2007.
Affiliation

¹ Department of Mathematics and Statistics, University of Helsinki, Finland. dag@rni.helsinki.fi

Abstract

Background: Answers to several fundamental questions in statistical genetics would ideally require knowledge of the ancestral pedigree and of the gene flow therein. A few examples of such questions are haplotype estimation, relatedness and relationship estimation, gene mapping by combining pedigree and linkage disequilibrium information, and estimation of population structure.

Results: We present a probabilistic method for genealogy reconstruction. Starting with a group of genotyped individuals from some population isolate, we explore the state space of their possible ancestral histories under our Bayesian model by using Markov chain Monte Carlo (MCMC) sampling techniques. The main contribution of our work is the development of sampling algorithms in the resulting vast state space with highly dependent variables. The main drawback is the computational complexity that limits the time horizon within which explicit reconstructions can be carried out in practice.

Conclusion: The estimates of identity-by-descent (IBD) and haplotype distributions are tested in several settings using simulated data. The results appear promising for further development of the method.
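The kind of MCMC exploration mentioned in the Results can be illustrated with a generic Metropolis sampler. The sketch below is not the authors' algorithm: the toy state space (integers 0 to 9 arranged on a ring), the `log_target` function and the one-step proposal are all assumptions, chosen only to show the accept/reject step on a discrete state space.

```python
import math
import random
from collections import Counter
from typing import List

N_STATES = 10  # hypothetical toy state space: integers 0..9 on a ring


def log_target(x: int) -> float:
    """Unnormalised log-probability of state x (illustrative, peaked at 3)."""
    return -0.5 * (x - 3) ** 2


def metropolis(n_iter: int, start: int = 0, seed: int = 1) -> List[int]:
    """Run a Metropolis chain with a symmetric one-step proposal."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_iter):
        # Propose a neighbouring state; wrapping keeps the proposal symmetric.
        proposal = (x + rng.choice((-1, 1))) % N_STATES
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)  # a rejected move repeats the current state
    return samples


samples = metropolis(20_000)
counts = Counter(samples)
```

In a genealogy sampler the state would be a whole pedigree with its gene flow, and the proposals would be local modifications of that structure, which is where the strong dependence between variables makes the design of moves difficult.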

Figures

**Figure 1**
**Pedigree of the first example**. 439 individuals and 10 generations of which the youngest one consisted of the children of 13 nuclear families. Squares denote males, circles denote females. Reprinted from [10].

**Figure 2**
**Haplotyping**. The development of the sum of switch distances of the haplotype pairs of the youngest generation over 1,000,000 iterations, both with and without using the recombination model. The line at 318 is the expected value under random haplotype assignment and the line at 77 is the value obtained with PHASE (v.2.1).
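The quantity tracked in Figure 2 can be computed per individual as in the sketch below. It assumes the usual definition of switch distance (the number of phase switches between consecutive heterozygous loci needed to turn the estimated haplotype pair into the true one). The function names and the integer allele coding are illustrative and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Haplotype = Sequence[int]
HaplotypePair = Tuple[Haplotype, Haplotype]


def switch_distance(true_pair: HaplotypePair, est_pair: HaplotypePair) -> int:
    """Count phase switches between an estimated and a true haplotype pair."""
    t1, t2 = true_pair
    e1, e2 = est_pair
    n = len(t1)
    if not (len(t2) == len(e1) == len(e2) == n):
        raise ValueError("all haplotypes must have the same length")
    for k in range(n):
        # Both pairs must resolve the same unordered genotype at every locus.
        if sorted((t1[k], t2[k])) != sorted((e1[k], e2[k])):
            raise ValueError(f"genotypes differ at locus {k}")
    # Only heterozygous loci carry phase information.
    het = [k for k in range(n) if t1[k] != t2[k]]
    # True where the first estimated haplotype follows the first true one.
    follows_first = [e1[k] == t1[k] for k in het]
    # A switch occurs whenever this indicator flips between neighbours.
    return sum(a != b for a, b in zip(follows_first, follows_first[1:]))


def total_switch_distance(
    true_pairs: List[HaplotypePair], est_pairs: List[HaplotypePair]
) -> int:
    """Sum of switch distances over individuals, as plotted in Figure 2."""
    return sum(switch_distance(t, e) for t, e in zip(true_pairs, est_pairs))


truth = ([0, 0, 0, 0], [1, 1, 1, 1])
one_switch = ([0, 0, 1, 1], [1, 1, 0, 0])
three_switches = ([0, 1, 0, 1], [1, 0, 1, 0])
relabelled = ([1, 1, 1, 1], [0, 0, 0, 0])
```

Swapping the order of the two haplotypes within a pair, as in `relabelled`, does not count as a switch, because only changes in relative phase along the chromosome are counted.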

**Figure 3**
**Squared errors of relatedness estimates**. Boxplots show squared errors of all 741 pairwise relatedness coefficients R_ij, where i and j are different individuals from generation 0. The boxes indicate the quartiles (1st, 2nd and 3rd) and the 'whiskers' cover the errors whose distance from the box is less than 1.5 times the box size. The outliers are indicated with single points. Methods used: ours (G), ours without linkage model (G(unlinked)), Lynch and Li's (LL), Lynch and Ritland's (LR) and Wang's (W).
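The boxplot convention described for Figure 3 can be sketched as follows. The helper names and the toy values are assumptions, and the quartile interpolation rule (`method="inclusive"`) is one of several common choices, since the caption does not say which one the plotting software used. The count of 741 pairs is consistent with the 39 generation-0 individuals indexed in Figure 4, since 39 × 38 / 2 = 741.

```python
import math
import statistics
from typing import Dict, List, Sequence


def squared_errors(estimates: Sequence[float], truths: Sequence[float]) -> List[float]:
    """Squared error of each estimated coefficient against its true value."""
    return [(e - t) ** 2 for e, t in zip(estimates, truths)]


def box_summary(values: Sequence[float]) -> Dict[str, object]:
    """Quartiles, whisker ends and outliers under the 1.5 x box-size rule."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box = q3 - q1  # box size (interquartile range)
    low_fence = q1 - 1.5 * box
    high_fence = q3 + 1.5 * box
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "quartiles": (q1, q2, q3),
        # Whiskers end at the most extreme values still inside the fences.
        "whiskers": (min(inside), max(inside)),
        # Everything beyond the fences is drawn as a single point.
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


n_pairs = math.comb(39, 2)  # number of distinct pairs among 39 individuals
summary = box_summary([0, 1, 2, 3, 4, 100])  # toy data, not from the paper
```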

**Figure 4**
**IBD-sharing probabilities R_ij(l) for six pairs of individuals from generation 0**. These individuals can be found in Figure 1, where the indices increase from right to left (from 1 to 39). The two leftmost panels show the IBD-sharing profiles of full siblings, the upper middle panel shows a pair of first cousins, and the lower middle panel shows half cousins. The two rightmost panels show the IBD-sharing between the most distant relatives that can be found in the data. The dotted lines are the exact values and the solid lines are our estimates.

**Figure 5**
**IBD-sharing among 44 sampled individuals at each marker locus**. The statistic Min($S$; l) was calculated from the original situation with respect to the 19th generation (original founder level) and the 9th generation, and from a reconstruction over 9 generations.

**Figure 6**
**IBS-sharing among 44 sampled individuals at each marker locus**. The upper curve shows a statistic similar to Min($S$; l) but calculated from IBS status. The lower curve displays the difference Min($S$; l) - Min($C$; l), where $C$ is a control group. No signal of the trait locus between markers 20 and 21 can be found in these IBS statistics.

**Figure 7**
**Haplotype sharing among 44 sampled individuals at each marker locus**. HSS is calculated for the original situation with respect to two different founder levels (19th and 9th generations) and for the reconstruction (9 generations). The signal in the reconstruction is very weak compared to the true situation.

Cited by

Estimation of genealogical coancestry in plant species using a pedigree reconstruction algorithm and application to an oil palm breeding population.
Cros D, Sánchez L, Cochard B, Samper P, Denis M, Bouvet JM, Fernández J. Theor Appl Genet. 2014 Apr;127(4):981-94. doi: 10.1007/s00122-014-2273-3. Epub 2014 Feb 7. PMID: 24504554

Bayesian quantitative trait locus mapping based on reconstruction of recent genetic histories.
Gasbarra D, Pirinen M, Sillanpää MJ, Arjas E. Genetics. 2009 Oct;183(2):709-21. doi: 10.1534/genetics.109.104190. Epub 2009 Jul 20. PMID: 19620396

Bayesian inference of local trees along chromosomes by the sequential Markov coalescent.
Zheng C, Kuhner MK, Thompson EA. J Mol Evol. 2014 May;78(5):279-92. doi: 10.1007/s00239-014-9620-5. Epub 2014 May 11. PMID: 24817610

References

1. Gao G, Hoeschele I, Sorensen P, Du FX. Conditional probability methods for haplotyping in pedigrees. Genetics. 2004;167:2055–2065. doi: 10.1534/genetics.103.021055.
2. Lin S, Cutler DJ, Zwick ME, Chakravarti A. Haplotype inference in random population samples. Am J Hum Genet. 2002;71:1129–1137. doi: 10.1086/344347.
3. Blouin MS. DNA-based methods for pedigree reconstruction and kinship analysis in natural populations. Trends Ecol Evol. 2003;18:503–511. doi: 10.1016/S0169-5347(03)00225-8.
4. Cowell RG, Mostad P. A clustering algorithm using DNA marker information for sub-pedigree reconstruction. J Forensic Sci. 2003;48:1239–1248.
5. Lange EM, Lange K. Powerful allele sharing statistics for nonparametric linkage analysis. Hum Hered. 2004;57:49–58. doi: 10.1159/000077389.
