BMC Bioinformatics. 2007 Oct 25:8:411.
doi: 10.1186/1471-2105-8-411.

Estimating genealogies from linked marker data: a Bayesian approach

Dario Gasbarra et al. BMC Bioinformatics. 2007.

Abstract

Background: Answering several fundamental questions in statistical genetics would ideally require knowledge of the ancestral pedigree and of the gene flow within it. Examples include haplotype estimation, relatedness and relationship estimation, gene mapping by combining pedigree and linkage disequilibrium information, and estimation of population structure.

Results: We present a probabilistic method for genealogy reconstruction. Starting with a group of genotyped individuals from a population isolate, we explore the state space of their possible ancestral histories under our Bayesian model using Markov chain Monte Carlo (MCMC) sampling techniques. The main contribution of our work is the development of sampling algorithms for the resulting vast state space with highly dependent variables. The main drawback is the computational complexity, which limits the time horizon within which explicit reconstructions can be carried out in practice.

Conclusion: The estimates for IBD (identity-by-descent) and haplotype distributions are tested in several settings using simulated data. The results appear promising for further development of the method.
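
As a rough illustration of the MCMC idea mentioned in the Results, the following is a minimal Metropolis sketch on a toy discrete state space. It is not the authors' sampler: the four states, their weights, and the function name `metropolis` are assumptions made purely for illustration, whereas the paper's state space consists of possible ancestral histories.

```python
import random
from collections import Counter


def metropolis(weights, n_steps, seed=0):
    """Visit states 0..K-1 with long-run frequency proportional to `weights`.

    Toy example only (hypothetical target, not the paper's model).
    The proposal picks a state uniformly at random, which is symmetric,
    so the acceptance probability is min(1, w[proposal] / w[current]).
    All weights must be strictly positive.
    """
    rng = random.Random(seed)
    state = 0
    visits = Counter()
    for _ in range(n_steps):
        proposal = rng.randrange(len(weights))
        accept_prob = min(1.0, weights[proposal] / weights[state])
        if rng.random() < accept_prob:
            state = proposal
        visits[state] += 1
    # Empirical visit frequencies approximate the normalized target.
    return [visits[k] / n_steps for k in range(len(weights))]


weights = [1.0, 2.0, 3.0, 4.0]  # unnormalized toy target
freqs = metropolis(weights, n_steps=200_000)
target = [w / sum(weights) for w in weights]
```

Only ratios of weights enter the acceptance step, so the normalizing constant of the target is never needed. That property is what makes this family of samplers usable on very large state spaces.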


Figures

Figure 1
Pedigree of the first example: 439 individuals over 10 generations, the youngest of which consists of the children of 13 nuclear families. Squares denote males, circles denote females. Reprinted from [10].
Figure 2
Haplotyping. The development of the sum of switch distances of the haplotype pairs of the youngest generation over 1,000,000 iterations, both with and without using the recombination model. The line at 318 is the expected value under random haplotype assignment and the line at 77 is the value obtained with PHASE (v.2.1).
Figure 3
Squared errors of relatedness estimates. Boxplots show squared errors of all 741 pairwise relatedness coefficients Rij, where i and j are different individuals from generation 0. The boxes indicate the quartiles (1st, 2nd and 3rd) and the 'whiskers' cover the errors whose distance from the box is less than 1.5 times the box size. The outliers are indicated with single points. Methods used: ours (G), ours without linkage model (G(unlinked)), Lynch and Li's (LL), Lynch and Ritland's (LR) and Wang's (W).
Figure 4
IBD-sharing probabilities Rij(l) for six pairs of individuals from generation 0. These individuals can be found in Figure 1, where the indexes increase from right to left (from 1 to 39). The two leftmost panels illustrate the IBD-sharing profiles of full siblings, the upper middle panel shows a pair of first cousins, and the lower middle panel shows half cousins. The two rightmost panels show the IBD-sharing between the most distant relatives that can be found in the data. The dotted lines are the exact values and the solid lines are our estimates.
Figure 5
IBD-sharing among 44 sampled individuals at each marker locus. The statistic Min(S; l) was calculated from the original situation with respect to the 19th generation (original founder level) and the 9th generation and from a reconstruction over 9 generations.
Figure 6
IBS-sharing among 44 sampled individuals at each marker locus. The upper curve shows a statistic similar to Min(S; l) but calculated from IBS status. The lower curve displays the difference Min(S; l) - Min(C; l), where C is a control group. No signal of the trait locus between markers 20 and 21 can be found from these IBS statistics.
Figure 7
Haplotype sharing among 44 sampled individuals at each marker locus. HSS is calculated for the original situation with respect to two different founder levels (19th and 9th generations) and for the reconstruction (9 generations). The signal in the reconstruction is very weak compared to the true situation.
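
The switch distance plotted in Figure 2 can be illustrated with a short sketch. It assumes the commonly used definition (the number of phase switches between consecutive heterozygous sites needed to turn an inferred haplotype pair into the true pair), which this abstract does not spell out. The function names and the toy haplotype strings are hypothetical.

```python
def switch_distance(true_pair, inferred_pair):
    """Count phase switches between consecutive heterozygous sites.

    Assumes both pairs describe the same unphased genotype; raises
    ValueError otherwise. Haplotypes are equal-length sequences of alleles.
    """
    t1, t2 = true_pair
    g1, g2 = inferred_pair
    if not (len(t1) == len(t2) == len(g1) == len(g2)):
        raise ValueError("haplotypes must have equal length")
    phases = []
    for a, b, x, y in zip(t1, t2, g1, g2):
        if sorted((a, b)) != sorted((x, y)):
            raise ValueError("pairs are not consistent with one genotype")
        if a == b:
            continue  # homozygous site: phase carries no information
        # True if the first inferred haplotype follows the first true one here.
        phases.append(x == a)
    return sum(p != q for p, q in zip(phases, phases[1:]))


def total_switch_distance(true_pairs, inferred_pairs):
    """Sum of switch distances over individuals, as in the Figure 2 y-axis."""
    return sum(switch_distance(t, g) for t, g in zip(true_pairs, inferred_pairs))


true_pair = ("0101", "1010")
one_switch = ("0110", "1001")   # phase flips once, between sites 2 and 3
same_swapped = ("1010", "0101")  # same pair listed in the other order
d_one = switch_distance(true_pair, one_switch)
d_zero = switch_distance(true_pair, same_swapped)
```

Under this definition the order in which the two haplotypes of a pair are listed does not matter, so a perfectly phased individual contributes zero to the sum.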

References

    1. Gao G, Hoeschele I, Sorensen P, Du FX. Conditional probability methods for haplotyping in pedigrees. Genetics. 2004;167:2055–2065. doi: 10.1534/genetics.103.021055.
    2. Lin S, Cutler DJ, Zwick ME, Chakravarti A. Haplotype inference in random population samples. Am J Hum Genet. 2002;71:1129–1137. doi: 10.1086/344347.
    3. Blouin MS. DNA-based methods for pedigree reconstruction and kinship analysis in natural populations. Trends Ecol Evol. 2003;18:503–511. doi: 10.1016/S0169-5347(03)00225-8.
    4. Cowell RG, Mostad P. A clustering algorithm using DNA marker information for sub-pedigree reconstruction. J Forensic Sci. 2003;48:1239–1248.
    5. Lange EM, Lange K. Powerful allele sharing statistics for nonparametric linkage analysis. Hum Hered. 2004;57:49–58. doi: 10.1159/000077389.
