[Preprint]. 2024 Sep 4:2024.05.13.594005.
doi: 10.1101/2024.05.13.594005.

CORRECTING MODEL MISSPECIFICATION IN RELATIONSHIP ESTIMATES

Ethan M Jewett et al. bioRxiv.

Abstract

The datasets of large genotyping biobanks and direct-to-consumer genetic testing companies contain many related individuals. Until now, it has been widely accepted that the most distant relationships that can be detected are around fifteen degrees (approximately 8th cousins) and that practical relationship estimates have a ceiling around ten degrees (approximately 5th cousins). However, we show that these assumptions are incorrect and that they are due to a misapplication of relationship estimators. In particular, relationship estimators are applied almost exclusively to putative relatives who have been identified because they share detectable tracts of DNA identically by descent (IBD). However, no existing relationship estimator conditions on the event that two individuals share at least one detectable segment of IBD anywhere in the genome. As a result, the relationship estimates obtained using existing estimators are dramatically biased for distant relationships, inferring all sufficiently distant relationships to be around ten degrees regardless of the depth of the true relationship. Existing relationship estimators are derived under a model that assumes that each pair of related individuals shares a single common ancestor (or mating pair of ancestors). This model breaks down for relationships beyond 10 generations in the past because individuals share many thousands of cryptic common ancestors due to pedigree collapse. We first derive a corrected likelihood that conditions on the event that at least one segment is observed between a pair of putative relatives and we demonstrate that the corrected likelihood largely eliminates the bias in estimates of pairwise relationships and provides a more accurate characterization of the uncertainty in these estimates. We then reformulate the relationship inference problem to account for the fact that individuals share many common ancestors, not just one.
We demonstrate that the most distant relationship that can be inferred using IBD may be 200 degrees or more, rather than ten, extending the time-to-common ancestor from approximately 300 years in the past to approximately 3,000 years in the past or more. This dramatic increase in the range of relationship estimators makes it possible to infer relationships whose common ancestors lived before historical events such as European settlement of the Americas, the Transatlantic Slave Trade, and the rise and fall of the Roman Empire.
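
The effect of conditioning on at least one detectable segment can be illustrated with a toy calculation. The sketch below is not the corrected likelihood derived in the preprint. It assumes a simplified model of a kind commonly used for IBD sharing: the number of detectable segments is Poisson with mean a(rm + c)/2^(m-1) · exp(-mτ/100), and each detectable segment length is exponential above the threshold τ with rate m/100 per cM. The genome length r = 35 Morgans, the chromosome count c = 22, and all function names are assumptions chosen for illustration.

```python
import math

# Illustrative constants (assumptions, not values taken from the preprint).
GENOME_MORGANS = 35.0   # assumed total genetic length r, in Morgans
NUM_CHROMOSOMES = 22    # assumed number of autosomes c
TAU_CM = 5.0            # minimum detectable segment length tau, in cM


def expected_segments(m, a=1, tau=TAU_CM):
    """Expected number of detectable IBD segments for m meioses and a ancestors
    under the assumed Poisson model."""
    log_lam = (math.log(a * (GENOME_MORGANS * m + NUM_CHROMOSOMES))
               - (m - 1) * math.log(2.0)
               - m * tau / 100.0)
    return math.exp(log_lam)


def unconditional_loglik(lengths, m, a=1, tau=TAU_CM):
    """Log-likelihood of the observed segment lengths (cM), not conditioned
    on the pair having been detected."""
    lam = expected_segments(m, a, tau)
    n = len(lengths)
    # Poisson log-probability of seeing n detectable segments.
    count_term = n * math.log(lam) - lam - math.lgamma(n + 1)
    # Exponential density of each length, given that it exceeds tau.
    rate = m / 100.0
    length_term = sum(math.log(rate) - rate * (x - tau) for x in lengths)
    return count_term + length_term


def conditional_loglik(lengths, m, a=1, tau=TAU_CM):
    """Same log-likelihood, conditioned on observing at least one segment."""
    if not lengths:
        raise ValueError("conditioning requires at least one observed segment")
    lam = expected_segments(m, a, tau)
    # log P(at least one segment) = log(1 - exp(-lam)), computed stably.
    log_p_detect = math.log(-math.expm1(-lam))
    return unconditional_loglik(lengths, m, a, tau) - log_p_detect


def estimate_meioses(lengths, loglik, max_m=200):
    """Grid-search maximum-likelihood estimate of the number of meioses m."""
    return max(range(1, max_m + 1), key=lambda m: loglik(lengths, m))


if __name__ == "__main__":
    observed = [7.0]  # a single hypothetical 7 cM segment
    print("unconditional m:", estimate_meioses(observed, unconditional_loglik))
    print("conditional m:  ", estimate_meioses(observed, conditional_loglik))
```

Under these assumptions, the unconditional estimate for a single short segment stays shallow, because the Poisson term penalizes deep relationships for being unlikely to share any segment at all. Once that event is conditioned on, the segment length dominates the likelihood and the estimate moves to a much deeper relationship. For a = 1 ancestor, the degree follows from the estimate as d = m - a + 1.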


Figures

Figure 1.
The expected number of distinct common ancestors shared between two present-day people. The simulated values are compared with analytical values obtained using Equation (29) for various population sizes for which simulation is fast.
Figure 2.
(A) The expected number of distinct detectable-IBD-transmitting common ancestors at each generation in the past. Curves are shown for a minimum segment length of τ = 5 cM and for three different effective population sizes: N = 1,000, N = 2,000, and N = 5,000 individuals. (B) Same as (A) on a log scale.
Figure 3.
Inferred degree using unconditional and conditional estimators for relationships between 1 and 79 degrees. (A) The unconditional likelihood. (B) The conditional likelihood (Equation 8). (C) The unconditional likelihood together with the prior obtained by normalizing Equation (19) with N=10,000. (D) The conditional likelihood together with the prior obtained by normalizing Equation (19) with N=10,000.
Figure 4.
The distribution of the total length of IBD. (A) The distribution of the total length of IBD for a = 1 ancestors and several values of the number m of meioses separating two putative relatives. Both the unconditional (unc.) and conditional (con.) distributions are shown. (B) The unconditional distribution for values of m in the range m ∈ {6, …, 14}. (C) The conditional distribution for values of m in the range m ∈ {6, …, 14}. (D) Close-up of the conditional distribution along with points L_d (black vertical lines) marking transition points where the likelihood surface for d = m - a + 1 is greater than the likelihood surface for d = m + 1 - a + 1.
Figure 5.
Conceptual models for the development of relationship estimators. Panel A shows the conceptual model that underlies existing relationship estimators. In this model, each pair of individuals, i and j (purple dots), shares a single common ancestor, a, or a single mating pair of common ancestors, a_1 and a_2 (black circle). Other genealogical ancestors (grey dots) exist in the very distant past, but any IBD these ancestors contribute amounts to background noise. Panel B shows a conceptual model that more accurately describes genealogical relatedness at distant timescales. In this model, each pair of individuals shares many common ancestors in each generation in the past. Some of these ancestors contribute detectable IBD to the pair and some do not.
Figure 6.
(A) Concordance between Statistic 3 (the degree induced by the most recent detectable-IBD-transmitting common ancestor) and the true degree for relationships in which two people were truly related through a single pair of common ancestors. (B) Concordance between Statistic 3 and the shortest degree among individuals related through multiple common ancestors.

