Genetics. 2025 Aug 6;230(4):iyaf094.
doi: 10.1093/genetics/iyaf094.

A rapid accurate approach to inferring pedigrees in endogamous populations

Cole M Williams et al. Genetics.

Abstract

Accurate reconstruction of pedigrees from genetic data remains a challenging problem. Many relationship categories (e.g. half-sibships vs avuncular) can be difficult to distinguish without external information. Pedigree inference algorithms are often trained on European-descent families in urban locations. Thus, existing methods tend to perform poorly in endogamous populations for which there may be reticulations within the pedigrees and elevated haplotype sharing. We present a simple, rapid algorithm which initially uses only high-confidence first-degree relationships to seed a machine learning step based on summary statistics of identity-by-descent sharing. One of these statistics, our "haplotype score," is novel and can be used to: (1) distinguish half-sibling pairs from avuncular or grandparent-grandchildren pairs; and (2) assign individuals to ancestor vs descendant generation. We test our approach in a sample of 700 individuals from northern Namibia, sampled from an endogamous population called the Himba. Due to a culture of concurrent relationships in the Himba, there is a high proportion of half-sibships. We accurately identify first through fourth-degree relationships and distinguish between various second-degree relationships: half-sibships, avuncular pairs, and grandparent-grandchildren. We further validate our approach in a second African-descent dataset, the Barbados Asthma Genetics Study, and a European-descent founder population from Quebec. Accurate reconstruction of relatives facilitates estimation of allele frequencies, tracing allele trajectories, improved phasing, heritability and other population genomic questions.

Keywords: genealogy; identity-by-descent; pedigree.

Conflict of interest statement

Conflicts of interest: The author(s) declare no conflicts of interest.

Figures

Fig. 1.
An overview of PONDEROSA. The minimum input data are (1) an IBD segment file that contains pairwise IBD segments, including the start and end genetic coordinates and the haplotype indices corresponding to the two individuals sharing the segment and (2) a PLINK-formatted FAM file containing parent–offspring information. In Step 1, the IBD segments are processed for each pair of relatives, giving five different IBD statistics for each pair. We provide an example of the IBD sharing statistics on a single chromosome for the relative pair (A, B) who are a grandparent–grandchild pair. In Step 2, PONDEROSA builds a parent–offspring graph, a directed graph in which edges direct parents to offspring. PONDEROSA finds paths through the graph, as outlined in Methods. A path is a vector of +1 and −1 steps describing the movement from node to node (+1 describes moving from offspring to parent and −1 describes moving from parent to offspring). The pair (A, B), for instance, are connected by the path [+1, +1], which is the path for a grandparent–grandchild pair. In Step 3, we merge the IBD sharing statistics with the relationships found in Step 2. Pairs with a known relationship (i.e. the first three rows in Step 3) are used to train three LDA classifiers (Step 4).
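The path idea in Step 2 can be illustrated with a small sketch. This is not PONDEROSA's code: the toy pedigree, the individual names, the function names, and the rule that a path may only go up and then down are all assumptions made for illustration. Steps are encoded as +1 (offspring to parent) and -1 (parent to offspring).

```python
from collections import deque

# Hypothetical toy pedigree (names are illustrative): child -> list of parents.
PARENTS = {
    "grandchild": ["parent"],
    "parent": ["grandparent"],
    "uncle": ["grandparent"],
}

def build_children(parents):
    """Invert the child -> parents map into a parent -> children map."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    return children

def find_path(parents, start, goal, max_len=4):
    """Breadth-first search for the step vector leading from start to goal.

    +1 = move from offspring to parent, -1 = move from parent to offspring.
    Assumption: once a path has started moving down, it may not move up again.
    Returns None if no such path of at most max_len steps exists.
    """
    children = build_children(parents)
    queue = deque([(start, [], {start})])
    while queue:
        node, steps, seen = queue.popleft()
        if node == goal:
            return steps
        if len(steps) == max_len:
            continue
        moves = []
        if not steps or steps[-1] == +1:  # still allowed to climb
            moves += [(p, +1) for p in parents.get(node, [])]
        moves += [(c, -1) for c in children.get(node, [])]
        for nxt, step in moves:
            if nxt not in seen:
                queue.append((nxt, steps + [step], seen | {nxt}))
    return None

print(find_path(PARENTS, "grandchild", "grandparent"))  # [1, 1]
print(find_path(PARENTS, "grandchild", "uncle"))        # [1, 1, -1]
```

The first call reproduces the [+1, +1] grandparent–grandchild path described in the caption; the second shows how a longer path climbs to a shared ancestor before descending.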
Fig. 2.
a) The three classifiers used by PONDEROSA. The star on each plot indicates where an example pair (a maternal grandparent–grandchild) falls on the classifier. The degree classifier predicts the degree of relatedness of the pair based on the proportion of the genome shared IBD1 and IBD2 (π^1 and π^2, respectively). The plot is colored by the most likely degree of relatedness, but the classifier also outputs class probabilities. The haplotype score classifier predicts whether a second-degree pair is a half-sibling pair or a grandparent–grandchild/avuncular (GPAV) pair; there is also a third possible classification called “Phase error,” which indicates that a pair’s phase error is too high to reliably distinguish half-siblings from GPAV. The n classifier takes as input the number of IBD segments and total proportion of the genome covered by IBD. Here, we are showing the probability of avuncular and grandparent–grandchild as a function of the number of IBD segments shared (probabilities of other relationships not shown). b) An example of the relationship likelihood tree, which is a directed graph, for the same example pair in a) is shown. Each node is a relationship and child nodes are more specific descriptors of the parent node, e.g. GPAV is the parent node of grandparent–grandchild (GP) and avuncular (AV). Other nodes include PHS (paternal half-siblings), MHS (maternal half-siblings), PGP (paternal grandparent–grandchild), and MGP (maternal grandparent–grandchild). Each node has a conditional probability, that is, the probability of the relationship given its parent relationship, P(child | parent), and a posterior probability P(child). The most likely relationship is a maternal grandparent–grandchild, but if we wanted to be >95% confident, PONDEROSA would output the most likely relationship as non-sex-specific grandparent–grandchild. If we had other information that led us to believe the pair was actually a half-sibling pair, we could look at the conditional probabilities and find that the pair is most likely a paternal half-sibling pair.
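A minimal sketch of how a relationship likelihood tree like the one in b) could be queried follows. It assumes that a node's posterior is the product of the conditional probabilities on the path from the root, and that the reported relationship is the most specific node whose posterior exceeds a confidence threshold. The probability values, the root label "2nd", and the function names are hypothetical and are not taken from PONDEROSA.

```python
# Hypothetical conditional probabilities P(child | parent); all values are made up.
TREE = {
    "2nd": {"GPAV": 0.99, "HS": 0.01},
    "GPAV": {"GP": 0.98, "AV": 0.02},
    "GP": {"MGP": 0.80, "PGP": 0.20},
    "HS": {"PHS": 0.70, "MHS": 0.30},
}

def posteriors(tree, root="2nd"):
    """Posterior of each node = product of conditionals from the root down."""
    post = {root: 1.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child, cond in tree.get(node, {}).items():
            post[child] = post[node] * cond
            stack.append(child)
    return post

def most_specific(tree, root="2nd", min_prob=0.95):
    """Follow the most probable child while its posterior exceeds min_prob."""
    post = posteriors(tree, root)
    node = root
    while node in tree:
        best = max(tree[node], key=lambda c: post[c])
        if post[best] <= min_prob:
            break
        node = best
    return node, post[node]

label, prob = most_specific(TREE, min_prob=0.95)
print(label)  # GP

label, prob = most_specific(TREE, min_prob=0.5)
print(label)  # MGP

# Conditioning on outside information that the pair are half-siblings:
print(max(TREE["HS"], key=TREE["HS"].get))  # PHS
```

With these toy numbers the behavior mirrors the caption: the sex-specific call (MGP) is the most likely leaf, a 95% threshold falls back to the non-sex-specific GP node, and the conditional probabilities under HS can still be read off separately.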
Fig. 3.
Benchmarking PONDEROSA, ERSA, IBIS, and KING. The bar plots indicate what proportion of actual relatives (columns) are inferred as the row-indicated relationship. For example, KING infers 67.8% of fourth-degree relatives as third-degree relatives, whereas PONDEROSA infers 10.3% as such. The diagonal indicates the proportion of pairs classified correctly and is equivalent to the sensitivity.
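The quantity plotted in these bar plots can be sketched as a column-normalized confusion table: for each true relationship, the proportion of pairs assigned to each inferred relationship. The function names and the toy labels below are hypothetical, and the data are made up rather than taken from the benchmark.

```python
from collections import Counter, defaultdict

def inferred_proportions(pairs):
    """pairs: iterable of (true_label, inferred_label).

    Returns {true_label: {inferred_label: proportion of that true class}}.
    """
    counts = defaultdict(Counter)
    for true, inferred in pairs:
        counts[true][inferred] += 1
    return {
        true: {inf: n / sum(row.values()) for inf, n in row.items()}
        for true, row in counts.items()
    }

def sensitivity(props, label):
    """Diagonal entry: proportion of true `label` pairs inferred as `label`."""
    return props.get(label, {}).get(label, 0.0)

# Made-up toy data: four true fourth-degree pairs, two true third-degree pairs.
toy = [("4th", "4th")] * 3 + [("4th", "3rd")] + [("3rd", "3rd")] * 2
props = inferred_proportions(toy)
print(props["4th"]["3rd"])       # 0.25
print(sensitivity(props, "4th")) # 0.75
```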
Fig. 4.
Comparison of ERSA, CREST, and PONDEROSA assignments of Himba second-degree relatives. Sex-specific relationships are aggregated, e.g. half-siblings includes both maternal and paternal half-siblings. PONDEROSA outperforms ERSA and CREST in all relationship categories: half-siblings (HS), avuncular (AV), and grandparent–grandchildren (GP).
Fig. 5.
PONDEROSA, CREST, and ERSA assignments of the BAGS cohort second-degree relatives. PONDEROSA has lower performance in the BAGS cohort compared to its performance in the Himba, but considerably outperforms ERSA and CREST.
Fig. 6.
PONDEROSA’s predicted degree of relatedness when modeling Type III relationships. For each dataset, we simulated Types I, II, and III relationships using genotype data as input. The Type III relationships were double half-cousins (4th+), first-cousin/half-cousin (3rd+), and half-sibling/first-cousin (2nd+). These simulated relatives and their IBD segments were used as input training data for PONDEROSA. Here, we show PONDEROSA’s predicted degree of relatedness for each dataset’s relative pairs, plotted by their IBD1 and IBD2 proportions. Each population has several Type III relationships predicted by PONDEROSA, particularly in the Himba and synthetic Quebec datasets.
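For readers unfamiliar with the axes, here is a rough sketch of how IBD1 and IBD2 proportions might be derived from a pair's IBD segments. It makes simplifying assumptions that are not from the paper: all segments lie on one linear genetic map of known length, a region covered by exactly one segment counts as IBD1, and a region covered by two or more overlapping segments counts as IBD2. The function name and inputs are hypothetical.

```python
def ibd_proportions(segments, genome_length_cm):
    """segments: list of (start_cm, end_cm) IBD segments for one pair.

    Returns (IBD1 proportion, IBD2 proportion, number of segments), using a
    sweep over segment endpoints to track how many segments cover each region.
    """
    events = []
    for start, end in segments:
        events.append((start, +1))  # a segment opens
        events.append((end, -1))    # a segment closes
    events.sort()

    ibd1 = ibd2 = 0.0
    depth = 0
    prev = None
    for pos, delta in events:
        if prev is not None and pos > prev:
            if depth == 1:
                ibd1 += pos - prev
            elif depth >= 2:
                ibd2 += pos - prev
        depth += delta
        prev = pos
    return ibd1 / genome_length_cm, ibd2 / genome_length_cm, len(segments)

# Two overlapping segments on a hypothetical 100 cM genome.
print(ibd_proportions([(0, 50), (30, 80)], 100))  # (0.6, 0.2, 2)
```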
