Benchmarking Relatedness Inference Methods with Genome-Wide Data from Thousands of Relatives

Monica D Ramstetter¹, Thomas D Dyer², Donna M Lehman³, Joanne E Curran², Ravindranath Duggirala², John Blangero², Jason G Mezey⁴ ⁵, Amy L Williams¹

Affiliations

¹ Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853. mdr232@cornell.edu, alw289@cornell.edu
² South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Brownsville, Texas 78520.
³ Department of Medicine, University of Texas Health San Antonio, San Antonio, Texas 78229.
⁴ Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853.
⁵ Department of Genetic Medicine, Weill Cornell Medicine, New York, New York 10065.

PMID: 28739658
PMCID: PMC5586387
DOI: 10.1534/genetics.117.1122

Monica D Ramstetter et al. Genetics. 2017 Sep;207(1):75-82. doi: 10.1534/genetics.117.1122. Epub 2017 Jul 24.

Abstract

Inferring relatedness from genomic data is an essential component of genetic association studies, population genetics, forensics, and genealogy. While numerous methods exist for inferring relatedness, thorough evaluation of these approaches in real data has been lacking. Here, we report an assessment of 12 state-of-the-art pairwise relatedness inference methods using a data set with 2485 individuals contained in several large pedigrees that span up to six generations. We find that all methods have high accuracy (92-99%) when detecting first- and second-degree relationships, but their accuracy dwindles to <43% for seventh-degree relationships. However, most identical by descent (IBD) segment-based methods inferred seventh-degree relatives correct to within one relatedness degree for >76% of relative pairs. Overall, the most accurate methods are Estimation of Recent Shared Ancestry (ERSA) and approaches that compute total IBD sharing using the output from GERMLINE and Refined IBD to infer relatedness. Combining information from the most accurate methods provides little accuracy improvement, indicating that novel approaches, such as new methods that leverage relatedness signals from multiple samples, are needed to achieve a sizeable jump in performance.
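To make the notion of a "relatedness degree" concrete, here is a minimal sketch of one common way to turn a pairwise kinship estimate into a degree. It is not taken from the abstract. It assumes the convention that a degree-d pair has expected kinship 2^-(d+1) and that bin boundaries sit at the geometric midpoints between consecutive degrees; these boundaries match the thresholds widely used for close relatives, and extending them out to seventh degree is an assumption made here. The function name `degree_from_kinship` and the `max_degree` parameter are hypothetical.

```python
import math


def degree_from_kinship(phi, max_degree=7):
    """Map a kinship coefficient to a relatedness degree (illustrative only).

    Returns 0 for duplicates/monozygotic twins, 1..max_degree for relatives,
    and None when the pair is classified as unrelated.

    Assumption: expected kinship for degree d is 2**-(d + 1), with bin
    boundaries at 2**-(d + 1.5) and 2**-(d + 0.5).
    """
    if phi <= 0:
        return None
    # -log2(phi) lies in [d + 0.5, d + 1.5) for degree d, so subtracting 0.5
    # and taking the floor recovers d.
    degree = math.floor(-math.log2(phi) - 0.5)
    degree = max(degree, 0)  # kinship above 0.5 is still treated as degree 0
    return degree if degree <= max_degree else None


if __name__ == "__main__":
    for phi in (0.5, 0.25, 0.125, 0.0625, 2 ** -8, 0.001):
        print(phi, degree_from_kinship(phi))
```

Because expected kinship halves with each additional degree, the bins for distant relatives become very narrow in absolute terms, which is consistent with the drop in accuracy the abstract reports for seventh-degree pairs.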

Keywords: admixture; identical by descent; relatedness estimation.

Figures

**Figure 1**
Performance comparison of the evaluated methods using the SAMAFS data set. Bar plots denote the percentage of sample pairs that are reported to have a given degree of relatedness and that are inferred to be related as the indicated degree. The bar plots are separated on the horizontal axis by the reported relatedness degree and on the vertical axis by inferred relatedness degree. For clarity, the plots list above each bar the inferred percentage that the corresponding bar depicts. Program names listed in red are IBD segment-based methods while those in black use allele frequencies for inference. Red horizontal bars under a bar plot indicate that the corresponding inferences agree with the reported relationships.
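The quantities plotted in such a figure can be sketched as follows: for each reported degree, count how often each inferred degree occurs and convert the counts to percentages. The same pair list also yields exact accuracy and accuracy to within one degree. This is a hedged illustration on toy data, not the authors' code; the function names `percent_by_degree` and `accuracy`, the `tolerance` parameter, and the use of `None` for pairs inferred as unrelated are all assumptions.

```python
from collections import Counter, defaultdict


def percent_by_degree(pairs):
    """For each reported degree, give the percentage of pairs per inferred degree.

    `pairs` is an iterable of (reported_degree, inferred_degree); an inferred
    degree of None stands for "inferred unrelated" (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for reported, inferred in pairs:
        counts[reported][inferred] += 1
    table = {}
    for reported, inferred_counts in counts.items():
        total = sum(inferred_counts.values())
        table[reported] = {
            inferred: 100.0 * n / total for inferred, n in inferred_counts.items()
        }
    return table


def accuracy(pairs, tolerance=0):
    """Percentage of pairs whose inferred degree is within `tolerance` of the reported one."""
    pairs = list(pairs)
    hits = sum(
        1
        for reported, inferred in pairs
        if inferred is not None and abs(reported - inferred) <= tolerance
    )
    return 100.0 * hits / len(pairs)


if __name__ == "__main__":
    # Toy data only: (reported degree, inferred degree).
    toy = [(1, 1), (1, 1), (2, 2), (2, 3), (7, 7), (7, 6), (7, None), (7, 8)]
    print(percent_by_degree(toy))
    seventh = [p for p in toy if p[0] == 7]
    print(accuracy(seventh), accuracy(seventh, tolerance=1))
```

Setting `tolerance=1` corresponds to the abstract's "correct to within one relatedness degree" criterion, while the default `tolerance=0` counts only inferences that agree exactly with the reported relationship.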
