Comparative Study
BMC Genomics. 2018 Jan 30;19(1):106.
doi: 10.1186/s12864-018-4489-0.

Assessing runs of Homozygosity: a comparison of SNP Array and whole genome sequence low coverage data

Francisco C Ceballos et al. BMC Genomics. 2018.

Abstract

Background: Runs of Homozygosity (ROH) are genomic regions where identical haplotypes are inherited from each parent. Since technological advances first allowed their detection in the late 1990s, ROH have shed light on human population history and helped decipher the genetic basis of monogenic and complex traits and diseases. ROH studies have predominantly used SNP array data but are gradually moving to whole genome sequence (WGS) data as these become available. Because WGS data capture more genetic variability, they can add value to ROH studies, but they require additional considerations during analysis.

Results: Using SNP array and low-coverage WGS data from 1885 individuals from 20 world populations, we aimed to compare ROH between the two datasets and to establish software settings that yield comparable results, thus providing guidelines for combining disparate datasets in joint ROH analyses. By allowing heterozygous SNPs per window in the PLINK homozygosity function and using non-parametric tests, we obtained non-significant differences between the two technologies in the number of ROH, mean ROH size and total sum of ROH for almost all populations.
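
The window-based idea behind allowing heterozygous SNPs per window can be illustrated with a simplified scan. This is a minimal sketch, not PLINK's implementation: it assumes one individual's genotypes on one chromosome coded as alternate-allele counts (0/1/2, with 1 meaning heterozygous), ignores missing calls, SNP density and gap limits, and uses hypothetical parameter names (`window_snps`, `max_window_het`, `hit_threshold`, `min_snps`, `min_kb`) that only loosely mirror PLINK's `--homozyg` options such as `--homozyg-window-het`. A window counts as homozygous if it holds at most `max_window_het` heterozygous SNPs; a SNP is kept when at least a `hit_threshold` share of the windows covering it are homozygous, and runs of kept SNPs are reported if they pass the SNP-count and length filters.

```python
from typing import List, Tuple


def call_roh(
    genotypes: List[int],
    positions: List[int],
    window_snps: int = 50,
    max_window_het: int = 1,
    hit_threshold: float = 0.05,
    min_snps: int = 100,
    min_kb: float = 1000.0,
) -> List[Tuple[int, int]]:
    """Return ROH as (start_bp, end_bp) pairs for one individual and chromosome.

    Hypothetical simplified scan: genotypes are alternate-allele counts
    (0, 1, 2) where 1 means heterozygous. Missing calls, SNP density and
    gaps are deliberately ignored here.
    """
    n = len(genotypes)
    if n < window_snps:
        return []
    het = [1 if g == 1 else 0 for g in genotypes]
    hits = [0] * n    # homozygous windows covering each SNP
    covers = [0] * n  # all windows covering each SNP
    het_in_window = sum(het[:window_snps])
    for start in range(n - window_snps + 1):
        if start > 0:
            # Slide one SNP to the right: add the new SNP, drop the old one.
            het_in_window += het[start + window_snps - 1] - het[start - 1]
        homozygous = het_in_window <= max_window_het
        for i in range(start, start + window_snps):
            covers[i] += 1
            if homozygous:
                hits[i] += 1
    # Keep a SNP if a large enough share of the windows covering it were homozygous.
    keep = [hits[i] / covers[i] >= hit_threshold for i in range(n)]
    segments = []
    i = 0
    while i < n:
        if not keep[i]:
            i += 1
            continue
        j = i
        while j + 1 < n and keep[j + 1]:
            j += 1
        # Report the run only if it passes the SNP-count and length filters.
        if j - i + 1 >= min_snps and (positions[j] - positions[i]) / 1000 >= min_kb:
            segments.append((positions[i], positions[j]))
        i = j + 1
    return segments


if __name__ == "__main__":
    positions = [k * 10_000 for k in range(1000)]  # one SNP every 10 kb
    genotypes = [1] * 1000                          # heterozygous background
    for k in range(300, 500):
        genotypes[k] = 0                            # 2 Mb homozygous block
    genotypes[400] = 1                              # one stray heterozygous call
    print(call_roh(genotypes, positions, max_window_het=1))  # [(3010000, 4980000)]
    print(call_roh(genotypes, positions, max_window_het=0))  # []
```

In this toy case a single stray heterozygous call sits inside a 2 Mb homozygous block. Allowing one heterozygous SNP per window reports the run (slightly trimmed at its edges by the threshold rule), while allowing none splits it into two pieces that both fall below the length filters, which illustrates why the number of heterozygous SNPs allowed per window matters when data contain more heterozygous calls.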

Conclusions: Allowing 3 heterozygous SNPs per ROH when analyzing low-coverage WGS data makes it possible to establish meaningful comparisons between SNP array and low-coverage WGS data.
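
The array-versus-WGS comparisons summarized above, and in Fig. 5, rely on the Mann-Whitney-Wilcoxon (MWW) test. The following is a minimal standard-library sketch of the two-sided test, assuming the normal approximation with a tie correction and no continuity correction; for real analyses a library routine such as `scipy.stats.mannwhitneyu` is the safer choice, and its p-values may differ slightly from this approximation. The function names and the example ROH counts are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided MWW test: returns (U for x, p-value).

    Normal approximation with a tie correction and no continuity correction,
    so small samples get only approximate p-values.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of t equal values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


if __name__ == "__main__":
    # Hypothetical per-individual ROH counts for one population.
    array_nroh = [12, 15, 9, 14, 11, 13, 10, 16]
    wgs_nroh = [13, 14, 10, 15, 12, 12, 11, 17]
    u, p = mann_whitney_u(array_nroh, wgs_nroh)
    print(f"U = {u}, p = {p:.3f}")  # p >= 0.05: no significant difference at 5%
```

A rank-based test fits this setting because per-individual ROH counts and sizes need not be normally distributed; only the ordering of values across the two technologies matters.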

Keywords: ROH; Runs of Homozygosity; SNP array data; WGS low coverage data.


Conflict of interest statement

Ethics approval

Not applicable

Consent for publication

Not applicable

Competing interests

The authors declare that they have no competing interests.

Figures

Fig. 1
Effect of allowing heterozygous SNPs per window, evaluated by ep(P,h), a measure of the number of heterozygous SNPs actually observed in population P when h heterozygous SNPs are allowed per window (see Materials and methods for the definition).
Fig. 2
Violin plots of the mean number of ROH longer than 1 Mb. Populations are colored by five biogeographical groups defined by admixture analysis. Blue: Admixed (Hispanic-American: CLM, MXL; African-American: ACB, ASW); green: Native Americans (PEL); tan: East (CHS, CDX, JPT) and South (KHV) Asia; violet: North (FIN, GBR, CEU) and South (IBS, TSI) Europe; red: South (ZUL), East (BAG, LWK) and West (YRI) Africa. Four distributions are shown per population: array data with 1 heterozygous SNP allowed per window, and WGS data with 1, 2 and 3 heterozygous SNPs allowed per window.
Fig. 3
Violin plots of the mean size (in Mb) of ROH longer than 1 Mb. Different biogeographical groups are drawn on different x-axis scales to make the differences between distributions within each population easier to see. See Fig. 2 legend for population codes.
Fig. 4
Violin plots of the mean total sum of ROH longer than 1 Mb (in Gb). See Fig. 2 legend for population codes.
Fig. 5
Heatmaps of Pearson correlations and Mann-Whitney-Wilcoxon (MWW) tests comparing the mean number of ROH, mean ROH size and mean total sum of ROH between array data allowing 1 heterozygous SNP per window and WGS data allowing 1 to 5 heterozygous SNPs per window (y-axis). (a–c) Pearson correlations. (d–f) P-values of the non-parametric MWW test; red indicates a significant difference between array and WGS data, while blue indicates distributions that cannot be considered different. See Fig. 2 legend for population codes.
Fig. 6
Mean sum of ROH in different length categories. Light-colored lines represent WGS data with 3 heterozygous SNPs allowed per window, and dark-colored lines represent array data with 1 heterozygous SNP allowed per ROH. See Fig. 2 legend for population codes.
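
The per-individual quantities plotted in Figs. 2–4 and 6 can be derived from a list of called ROH. The sketch below assumes each ROH is a hypothetical `(start_bp, end_bp)` tuple (for example, taken from the POS1/POS2 columns of a PLINK `.hom` file), keeps ROH longer than 1 Mb, and reports sizes in Mb (divide by 1,000 for the Gb scale of Fig. 4). The length classes in `DEFAULT_BINS_MB` and all function names are illustrative assumptions, not the categories or code used in the study.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[int, int]  # one ROH as (start_bp, end_bp)

# Hypothetical length classes in Mb, each [low, high); not the study's categories.
DEFAULT_BINS_MB = [(1, 2), (2, 4), (4, 8), (8, 16), (16, float("inf"))]


def roh_summary(segments: Sequence[Segment], min_len_bp: int = 1_000_000) -> Dict[str, float]:
    """Number, mean size (Mb) and total sum (Mb) of ROH longer than min_len_bp."""
    lengths = [end - start for start, end in segments if end - start > min_len_bp]
    n_roh = len(lengths)
    total_mb = sum(lengths) / 1e6
    return {
        "n_roh": n_roh,
        "mean_size_mb": total_mb / n_roh if n_roh else 0.0,
        "total_sum_mb": total_mb,
    }


def sum_by_length_class(segments: Sequence[Segment], bins_mb=DEFAULT_BINS_MB) -> List[float]:
    """Total ROH length (Mb) in each length class for one individual."""
    totals = [0.0] * len(bins_mb)
    for start, end in segments:
        length_mb = (end - start) / 1e6
        for k, (low, high) in enumerate(bins_mb):
            if low <= length_mb < high:
                totals[k] += length_mb
                break  # classes do not overlap
    return totals


def mean_sum_by_length_class(
    individuals: Sequence[Sequence[Segment]], bins_mb=DEFAULT_BINS_MB
) -> List[float]:
    """Per-class total ROH length averaged over the individuals of a population."""
    per_individual = [sum_by_length_class(segs, bins_mb) for segs in individuals]
    return [sum(column) / len(per_individual) for column in zip(*per_individual)]


if __name__ == "__main__":
    # Hypothetical ROH calls for two individuals of one population.
    ind1 = [(0, 1_500_000), (10_000_000, 13_000_000), (20_000_000, 20_500_000)]
    ind2: List[Segment] = []
    print(roh_summary(ind1))  # {'n_roh': 2, 'mean_size_mb': 2.25, 'total_sum_mb': 4.5}
    print(mean_sum_by_length_class([ind1, ind2]))  # [0.75, 1.5, 0.0, 0.0, 0.0]
```

Running the same functions on array-based and WGS-based ROH calls for each individual yields the paired distributions that the figures compare.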

