Review
Nat Rev Genet. 2010 Feb;11(2):149-60.
doi: 10.1038/nrg2731.

Methodological challenges of genome-wide association analysis in Africa

Yik-Ying Teo et al. Nat Rev Genet. 2010 Feb.

Abstract

Medical research in Africa has yet to benefit from the advent of genome-wide association (GWA) analysis, partly because the genotyping tools and statistical methods that have been developed for European and Asian populations struggle to cope with the high levels of genome diversity and population structure in Africa. However, the haplotypic diversity of African populations might help to overcome one of the major roadblocks in GWA research: the fine mapping of causal variants. We review these methodological challenges and consider how GWA studies in Africa will be transformed by new approaches in statistical imputation and large-scale genome sequencing.

Figures

Figure 1. African populations are subject to high levels of ascertainment bias in current SNP databases
A study by Wall et al. [76] sequenced 40 intergenic regions in 90 individuals from 6 different ethnic groups. Within these regions, they observed almost all of the SNPs in the HapMap Phase 2 database and also discovered many new SNPs. The figure shows the number of SNPs in the HapMap data (green) compared with the number of SNPs that were discovered by resequencing but were absent from the HapMap data (orange), binned by derived allele frequency. a | Data from all ethnic groups combined. b | SNPs discovered in an African group (Mandinka) compared with African data (Yoruba people in Ibadan, Nigeria (YRI)) from the HapMap Project. c | SNPs discovered in a European group (Basque) compared with European data (Utah residents with Northern and Western European ancestry from the CEPH collection (CEU)) from the HapMap Project. d | SNPs discovered in an East Asian group (Han Chinese) compared with SNPs from a similar group (Han Chinese in Beijing (CHB)) in the HapMap Project. The HapMap data show greater SNP ascertainment bias for African than for European or Asian populations. In particular, African populations carry many low-frequency alleles that are poorly represented in current SNP databases. The figure is modified, with permission, from Ref. [76] © (2008) CSHL Press.
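The frequency binning behind Figure 1 can be sketched in a few lines. This is a minimal illustration, assuming each SNP is summarized by its derived allele frequency and a flag recording whether it was already in the database; the records, the bin width of 0.1 and the helper names `daf_bin` and `tabulate` are hypothetical, not the data or code of Wall et al.

```python
from collections import Counter

# Hypothetical SNP records (NOT data from Wall et al.):
# (derived allele frequency in the resequenced sample, already in the database?)
SNPS = [
    (0.02, False), (0.03, False), (0.05, True), (0.08, False),
    (0.09, False), (0.12, True), (0.18, False), (0.25, True),
    (0.40, True), (0.55, True), (0.70, True), (0.95, True),
]

# Hypothetical bin edges: ten bins of width 0.1 over [0, 1].
BIN_EDGES = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]


def daf_bin(freq, edges=BIN_EDGES):
    """Index of the half-open bin [lo, hi) holding freq; 1.0 goes in the last bin."""
    for i in range(len(edges) - 1):
        if edges[i] <= freq < edges[i + 1]:
            return i
    if freq == edges[-1]:
        return len(edges) - 2
    raise ValueError(f"frequency {freq} outside {edges[0]}..{edges[-1]}")


def tabulate(records):
    """Count database SNPs and newly discovered SNPs in each frequency bin."""
    in_db, novel = Counter(), Counter()
    for freq, present in records:
        target = in_db if present else novel
        target[daf_bin(freq)] += 1
    return in_db, novel


in_db, novel = tabulate(SNPS)
for i in range(len(BIN_EDGES) - 1):
    total = in_db[i] + novel[i]
    if total:
        print(f"DAF {BIN_EDGES[i]:.1f}-{BIN_EDGES[i + 1]:.1f}: "
              f"{in_db[i]} in database, {novel[i]} novel "
              f"({novel[i] / total:.0%} missing from database)")
```

In these made-up records the novel SNPs cluster in the lowest-frequency bins, which is the pattern the caption describes for African samples; with real data the same tabulation would be run separately per population, as in panels b to d.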
Figure 2. Meta-analysis at a site with different associated haplotypes in two populations
The ‘sickle cell’ variant of the haemoglobin-β (HBB) gene, which encodes haemoglobin S (HbS), is known to confer resistance to severe malaria. It is also known to occur on different haplotypes in different African populations. Here, we consider the major HbS haplotypes (green and blue horizontal bars) found in Gambia and in the Yoruba people of Nigeria: the HbS-encoding variant (orange strip) is in linkage disequilibrium with different SNPs (cyan strips) in the two populations. The graphs represent hypothetical case–control studies of severe malaria in the Gambian (a) and Yoruba (b) populations, showing the expected strength of the association signal at the causal variant (orange star) and at other SNPs (red circles). Part c shows the results expected if data from a and b were combined in a standard meta-analysis: the association signal of the causal variant is boosted, but the signals of the other SNPs are reduced, because their linkage disequilibrium with the causal variant differs between the two populations.
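The boosting and dilution described for part c can be illustrated with a fixed-effect, inverse-variance weighted meta-analysis, which is one common form of "standard" meta-analysis; the caption does not say which method it assumes. All numbers below are hypothetical: HbS is protective, so its effect is written as a negative log odds ratio, and the "marker" SNP is assumed to tag HbS in the Gambian sample only.

```python
import math


def fixed_effect_meta(studies):
    """Fixed-effect, inverse-variance weighted meta-analysis.

    studies: list of (log_odds_ratio, standard_error) pairs, one per study.
    Returns (pooled log odds ratio, pooled standard error, z statistic).
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, studies)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se, pooled / pooled_se


# Hypothetical (log OR, SE) pairs for [Gambian study, Yoruba study].
# Negative log OR = protective, as HbS is against severe malaria.
causal = [(-1.0, 0.2), (-1.0, 0.2)]  # causal variant: same effect in both
marker = [(-0.8, 0.2), (0.0, 0.2)]   # assumed to tag HbS only in the Gambian haplotype

for name, studies in [("causal", causal), ("marker", marker)]:
    per_study_z = [round(beta / se, 2) for beta, se in studies]
    _, _, z_meta = fixed_effect_meta(studies)
    print(f"{name}: per-study z = {per_study_z}, meta-analysis z = {z_meta:.2f}")
```

With these numbers the causal variant's |z| rises from 5 in each study to about 7.1 after pooling, while the marker's falls from 4 in the Gambian study to about 2.8, because the Yoruba study contributes no signal at that SNP.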
Figure 3. Imputation and the choice of haplotype reference panel
Imputation is a process of statistical inference that estimates the most likely genotype of an individual at a given position in the genome, based on that individual's known genotypes at nearby positions and on a reference data set of genome variation in the general population. The accuracy of imputation depends on how well the reference data set represents the population under study. The figure shows signals of association with severe malaria from SNPs distributed across a ~2.5-Mb region of chromosome 11 (Ref. [19]). The vertical dashed line marks the position of rs334, the SNP that encodes the haemoglobin S (HbS) variant of the haemoglobin-β (HBB) gene, which confers resistance to malaria. a | SNPs typed using the Affymetrix 500K genotyping platform (black circles). b | SNPs imputed using the HapMap Yoruba people in Ibadan, Nigeria (YRI) data as the reference (grey circles); rs334 is shown as a yellow diamond. c | SNPs imputed from regional sequencing data on 62 Gambian individuals (orange circles), including rs334 (yellow diamond). Had rs334 not already been known to be the causal variant, imputation based on the Gambian sequencing data would have been extremely useful, whereas imputation based on the HapMap YRI data would have been misleading. Parts a and c are modified, with permission, from Nature Genetics (Ref. [19]) © (2009) Macmillan Publishers Ltd. All rights reserved.
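The dependence on the reference panel shows up even in a toy imputer. The sketch below is not how production tools work (methods such as IMPUTE or MACH fit hidden Markov models over the reference haplotypes); it swaps in a much simpler nearest-haplotype rule: copy the untyped allele from the panel haplotypes that differ least from the study haplotype at the typed sites. The panels, the haplotypes and the function names `hamming` and `impute_allele` are all hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of typed sites at which two haplotypes carry different alleles."""
    return sum(x != y for x, y in zip(a, b))


def impute_allele(study_hap, panel):
    """Toy nearest-haplotype imputation at one untyped site (not an HMM).

    study_hap: alleles (0/1) of a study haplotype at the typed sites.
    panel: reference haplotypes as (typed_alleles, untyped_allele) pairs.
    Returns (imputed allele, fraction of closest panel haplotypes carrying it).
    """
    best = min(hamming(study_hap, typed) for typed, _ in panel)
    closest = [untyped for typed, untyped in panel
               if hamming(study_hap, typed) == best]
    allele, count = Counter(closest).most_common(1)[0]
    return allele, count / len(closest)


# Hypothetical panels: three typed flanking SNPs, then the untyped allele
# (1 = the allele of interest, loosely standing in for HbS).
panel_matched = [((1, 1, 0), 1), ((1, 1, 0), 1), ((0, 0, 1), 0), ((0, 1, 1), 0)]
# Same untyped allele, but carried on a different haplotype background.
panel_mismatched = [((0, 1, 1), 1), ((0, 1, 1), 1), ((1, 1, 0), 0), ((0, 0, 1), 0)]

study_hap = (1, 1, 0)  # assumed to truly carry allele 1 at the untyped site
print(impute_allele(study_hap, panel_matched))     # (1, 1.0)
print(impute_allele(study_hap, panel_mismatched))  # (0, 1.0)
```

When the study haplotype's background matches the panel, the untyped allele is recovered; when the same allele sits on a different background in the panel, the closest panel haplotype carries the other allele and the imputation is confidently wrong, loosely mirroring the contrast between parts b and c.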

References

    1. Manolio TA, Brooks LD, Collins FS. A HapMap harvest of insights into the genetics of common disease. J. Clin. Invest. 2008;118:1590–1605.
    2. McCarthy MI, et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Rev. Genet. 2008;9:356–369.
    3. Donnelly P. Progress and challenges in genome-wide association studies in humans. Nature. 2008;456:728–731.
    4. Black RE, Morris SS, Bryce J. Where and why are 10 million children dying every year? Lancet. 2003;361:2226–2234.
    5. Mathers CD, Boerma T, Ma Fat D. Global and regional causes of death. Br. Med. Bull. 2009;92:7–32.
