Bioinformatics. 2010 Nov 15;26(22):2867-73.
doi: 10.1093/bioinformatics/btq559. Epub 2010 Oct 5.

Robust relationship inference in genome-wide association studies


Ani Manichaikul et al. Bioinformatics. 2010.

Abstract

Motivation: Genome-wide association studies (GWASs) have been widely used to map loci contributing to variation in complex traits and risk of diseases in humans. Accurate specification of familial relationships is crucial for family-based GWAS, as well as in population-based GWAS with unknown (or unrecognized) family structure. The family structure in a GWAS should be routinely investigated using the SNP data prior to the analysis of population structure or phenotype. Existing algorithms for relationship inference have a major weakness of estimating allele frequencies at each SNP from the entire sample, under a strong assumption of homogeneous population structure. This assumption is often untenable.

Results: Here, we present a rapid algorithm for relationship inference using high-throughput genotype data typical of GWAS that allows the presence of unknown population substructure. The relationship of any pair of individuals can be precisely inferred by robust estimation of their kinship coefficient, independent of sample composition or population structure (sample invariance). We present simulation experiments to demonstrate that the algorithm has sufficient power to provide reliable inference on millions of unrelated pairs and thousands of relative pairs (up to 3rd-degree relationships). Application of our robust algorithm to HapMap and GWAS datasets demonstrates that it performs properly even under extreme population stratification, while algorithms assuming a homogeneous population give systematically biased results. Our extremely efficient implementation performs relationship inference on millions of pairs of individuals in a matter of minutes, dozens of times faster than the most efficient existing algorithm known to us.
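The abstract does not state the estimator itself, so the following is only a minimal sketch of one pairwise kinship estimator with the property described above: it uses the genotypes of the two individuals alone and never estimates allele frequencies from the whole sample. The normalization by the smaller heterozygote count is an assumption about the form of the robust estimator and should be checked against the full paper or the KING software; the function name `robust_kinship` and the 0/1/2 genotype coding with -1 for missing calls are hypothetical choices made for illustration.

```python
import numpy as np


def robust_kinship(g_i, g_j):
    """Sketch of a sample-invariant pairwise kinship estimate.

    g_i, g_j: genotypes of two individuals at the same SNPs, coded as
    minor-allele counts 0/1/2, with -1 marking a missing call
    (coding is an assumption of this sketch).
    """
    a = np.asarray(g_i, dtype=np.int64)
    b = np.asarray(g_j, dtype=np.int64)
    if a.shape != b.shape:
        raise ValueError("genotype vectors must have the same length")

    # Use only SNPs where both individuals have a genotype call.
    called = (a >= 0) & (b >= 0)
    a, b = a[called], b[called]

    # Heterozygote counts of each individual; the smaller one is used
    # for normalization (assumed form, see the lead-in).
    het_i = np.count_nonzero(a == 1)
    het_j = np.count_nonzero(b == 1)
    denom = min(het_i, het_j)
    if denom == 0:
        raise ValueError("no heterozygous genotypes to normalize by")

    # Squared genotype distance: 1 for het vs. homozygote,
    # 4 for opposite homozygotes, 0 for identical genotypes.
    sq_dist = int(np.sum((a - b) ** 2))

    # No sample-wide allele frequency appears anywhere in the estimate.
    return 0.5 - sq_dist / (4.0 * denom)
```

Because each pair is scored independently of every other sample, the same pair gets the same estimate regardless of which other individuals are in the dataset, which is the sample invariance the abstract refers to. Two identical genotype vectors give 0.5 under this sketch.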

Availability: Our robust relationship inference algorithm is implemented in a freely available software package, KING, available for download at http://people.virginia.edu/~wc9c/KING.


Figures

Fig. 1.
Distribution of kinship coefficient estimation. (A) Distribution of realized IBD-sharing with 150k SNPs; (B) distribution of kinship coefficient estimates with 150k SNPs; (C) distribution of kinship coefficient estimates with 5k SNPs; and (D) distribution of kinship coefficient estimates with 500k SNPs.
Fig. 2.
Relationship checking in 269 HapMap samples. (A), (C) and (E) are within-family relationship checking using three algorithms, and (B), (D) and (F) are between-family relationship checking using three algorithms. Negative kinship coefficient estimates are truncated to 0. Dashed lines indicate inference criteria as shown in Table 1. Solid lines follow the equation ϕ = (1 − π0)/4 that holds true for all relationships shown in Table 1, except for full sibs.
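The exception for full sibs in the caption above can be checked with a short worked example. It relies on the standard identity ϕ = π2/2 + π1/4, where π0, π1 and π2 are the probabilities of sharing 0, 1 and 2 alleles identical by descent; when π2 = 0 this reduces to ϕ = (1 − π0)/4. The IBD probabilities below are the textbook expected values for each relationship, not values taken from Table 1 (which is not reproduced here), and the names `kinship_from_ibd`, `line_value` and `EXPECTED_IBD` are hypothetical.

```python
def kinship_from_ibd(pi0, pi1, pi2):
    """Kinship coefficient from IBD-sharing probabilities (standard identity)."""
    if abs(pi0 + pi1 + pi2 - 1.0) > 1e-9:
        raise ValueError("IBD probabilities must sum to 1")
    return pi2 / 2.0 + pi1 / 4.0


def line_value(pi0):
    """Value on the solid line phi = (1 - pi0) / 4 from the caption."""
    return (1.0 - pi0) / 4.0


# Textbook expected (pi0, pi1, pi2) per relationship; assumed, not from Table 1.
EXPECTED_IBD = {
    "parent-offspring": (0.0, 1.0, 0.0),
    "full sibs": (0.25, 0.5, 0.25),
    "2nd-degree": (0.5, 0.5, 0.0),
    "3rd-degree": (0.75, 0.25, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}

for name, (pi0, pi1, pi2) in EXPECTED_IBD.items():
    phi = kinship_from_ibd(pi0, pi1, pi2)
    on_line = abs(phi - line_value(pi0)) < 1e-12
    print(f"{name:17s} phi={phi:.4f} line={line_value(pi0):.4f} on_line={on_line}")
```

Full sibs are the only relationship in this list with π2 > 0, so their expected kinship of 1/4 lies above the line value of 0.1875 at π0 = 0.25.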
Fig. 3.
Population structure in 269 HapMap samples. (A) Robust estimator of kinship coefficient as a tool for population structure discovery. Colored dots represent comparisons of individuals from distinct populations; within-population comparisons are shown in black. (B) Mean and variance of allele frequencies for each individual. (C) and (D) Top four principal components from PCA.
Fig. 4.
Relationship checking in OM GWAS data. (A), (C) and (E) are within-family relationship checking using three algorithms, and (B), (D) and (F) are between-family relationship checking using three algorithms. Negative kinship coefficient estimates are truncated to 0.
