Genomics. 2011 Dec;98(6):422-30.
doi: 10.1016/j.ygeno.2011.08.007. Epub 2011 Aug 28.

Design and coverage of high throughput genotyping arrays optimized for individuals of East Asian, African American, and Latino race/ethnicity using imputation and a novel hybrid SNP selection algorithm

Thomas J Hoffmann et al. Genomics. 2011 Dec.

Abstract

Four custom Axiom genotyping arrays were designed for a genome-wide association (GWA) study of 100,000 participants from the Kaiser Permanente Research Program on Genes, Environment and Health. The array optimized for individuals of European race/ethnicity was previously described. Here we detail the development of three additional microarrays optimized for individuals of East Asian, African American, and Latino race/ethnicity. For these arrays, we decreased redundancy of high-performing SNPs to increase SNP capacity. The East Asian array was designed using greedy pairwise SNP selection. However, removing SNPs from the target set based on imputation coverage is more efficient than pairwise tagging. Therefore, we developed a novel hybrid SNP selection method for the African American and Latino arrays utilizing rounds of greedy pairwise SNP selection, followed by removal from the target set of SNPs covered by imputation. The arrays provide excellent genome-wide coverage and are valuable additions for large-scale GWA studies.
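
The hybrid procedure described in the abstract (rounds of greedy pairwise selection, each followed by removing imputation-covered SNPs from the target set) can be sketched roughly as below. This is an illustrative toy under stated assumptions, not the authors' implementation: the function names, the r² threshold of 0.8, the per-round budget, the 6-SNP LD matrix, and in particular `toy_imputed_r2` (a crude stand-in for a real imputation run) are all hypothetical and introduced only for this example.

```python
import numpy as np

def greedy_pairwise_round(r2, target, selected, n_pick, r2_min=0.8):
    """Greedily pick up to n_pick SNPs; each pick tags the most untagged target SNPs."""
    # Target SNPs already tagged pairwise by a previously selected SNP need no new tag.
    untagged = {t for t in target if not any(r2[s, t] >= r2_min for s in selected)}
    picked = []
    for _ in range(n_pick):
        if not untagged:
            break
        idx = sorted(untagged)
        # gains[c] = number of still-untagged target SNPs that candidate c would tag
        gains = (r2[:, idx] >= r2_min).sum(axis=1)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break
        picked.append(best)
        untagged -= {t for t in untagged if r2[best, t] >= r2_min}
    return picked

def toy_imputed_r2(r2, selected, t):
    """ASSUMPTION: toy proxy for imputation accuracy (capped sum of pairwise r2)."""
    return min(1.0, sum(r2[s, t] for s in selected))

def hybrid_select(r2, n_rounds=3, per_round=1, r2_min=0.8):
    """Alternate greedy pairwise rounds with removal of imputation-covered targets."""
    target = set(range(r2.shape[0]))
    selected = []
    for _ in range(n_rounds):
        selected += greedy_pairwise_round(r2, target, selected, per_round, r2_min)
        covered = {t for t in target if toy_imputed_r2(r2, selected, t) >= r2_min}
        target -= covered  # covered SNPs no longer compete for array capacity
        if not target:
            break
    return selected, target

# Hypothetical pairwise r2 matrix for six SNPs (symmetric, diagonal = 1).
r2 = np.eye(6)
for i, j, v in [(0, 1, 0.9), (0, 2, 0.5), (2, 3, 0.4), (3, 4, 0.85), (0, 5, 0.1)]:
    r2[i, j] = r2[j, i] = v

selected, remaining = hybrid_select(r2)
pairwise_only = greedy_pairwise_round(r2, set(range(6)), [], n_pick=6)
print(selected, remaining, pairwise_only)
```

In this toy data, SNP 2 is not tagged pairwise by any single selected SNP, but under the stand-in imputation score it is covered jointly by SNPs 0 and 3. The hybrid run therefore stops at three SNPs, while pairwise tagging alone needs four, which mirrors the efficiency argument in the abstract only qualitatively.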

Figures

Fig. 1
Chromosome 21 coverage of the African Ancestry in Southwest USA (ASW) population based on two hypothetical arrays, one designed by pairwise tagging and the other by hybrid SNP selection for the Yoruba in Ibadan (YRI) population. Coverage was based on imputation using the YRI population as reference. The numbers in parentheses in the legend are the numbers of markers in the target set in each particular minor allele frequency range.
Fig. 2
Chromosome 21 coverage of the Luhya in Webuye, Kenya (LWK) population based on two hypothetical arrays, one designed by pairwise tagging and the other by hybrid SNP selection for the Yoruba in Ibadan (YRI) population. Coverage was based on imputation using the YRI population as reference. The numbers in parentheses are the numbers of markers in the target set in each particular minor allele frequency range.
Fig. 3
Chromosome 2 coverage by the new AX_KP_UCSF_EAS array of the 1000 Genomes interim June 2011 release (KG2011) Han Chinese in Beijing (CHB) genotypes. Coverage was based on imputation of the target CHB set using all other individuals except CHB. The numbers in parentheses in the legend are the numbers of markers in the target set in each particular minor allele frequency range. Subsets refer to SNPs identified in the 1000 Genomes High Pass (KGHP) sequencing (indicated by solid lines with “KGLP ∩ KGHP”) versus all SNPs (indicated by dashed lines with “KGLP, All SNPs”).
Fig. 4
Chromosome 2 coverage by the new AX_KP_UCSF_AFR array of the 1000 Genomes interim June 2011 release (KG2011) African Ancestry in Southwest USA (ASW) genotypes. Coverage was based on imputation of the target ASW set using all other individuals except ASW. The numbers in parentheses in the legend are the numbers of markers in the target set in each particular minor allele frequency range. Subsets refer to SNPs identified in the 1000 Genomes High Pass (KGHP) sequencing (indicated by solid lines with “KGLP ∩ KGHP”) versus all SNPs (indicated by dashed lines with “KGLP, All SNPs”).
Fig. 5
Chromosome 2 coverage by the new AX_KP_UCSF_LAT array of the 1000 Genomes interim June 2011 release (KG2011) Mexicans in Los Angeles, CA (MXL) genotypes. Coverage was based on imputation of the target MXL set using all other individuals except MXL. The numbers in parentheses in the legend are the numbers of markers in the target set in each particular minor allele frequency range. Subsets refer to SNPs identified in the 1000 Genomes High Pass (KGHP) sequencing (indicated by solid lines with “KGLP ∩ KGHP”) versus all SNPs (indicated by dashed lines with “KGLP, All SNPs”).
Fig. 6
Chromosome 2 coverage by the new AX_KP_UCSF_LAT array of the 1000 Genomes interim June 2011 release (KG2011) Puerto Rican in Puerto Rico (PUR) genotypes. Coverage was based on imputation of the target PUR set using all other individuals except PUR. The numbers in parentheses in the legend are the numbers of markers in the target set in each particular minor allele frequency range. Subsets refer to SNPs identified in the 1000 Genomes High Pass (KGHP) sequencing (indicated by solid lines with “KGLP ∩ KGHP”) versus all SNPs (indicated by dashed lines with “KGLP, All SNPs”).
Fig. 7
Chromosome 2 coverage by the AX_KP_UCSF_EUR array of the 1000 Genomes interim June 2011 release (KG2011) Utah residents with ancestry from Northern and Western Europe (CEU) genotypes. Coverage was based on imputation of the target CEU set using all other individuals except CEU. The numbers in parentheses in the legend are the numbers of markers in the target set in each particular minor allele frequency range. Subsets refer to SNPs identified in the 1000 Genomes High Pass (KGHP) sequencing (indicated by solid lines with “KGLP ∩ KGHP”) versus all SNPs (indicated by dashed lines with “KGLP, All SNPs”).
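
The figures report coverage within minor allele frequency (MAF) ranges, with the marker count for each range shown in the legend. A minimal sketch of that kind of summary follows. It assumes coverage means the fraction of target SNPs whose imputation r² meets a threshold. The threshold of 0.8, the bin edges, the function name `coverage_by_maf`, and the toy values are all assumptions for illustration, since the captions do not state them.

```python
import numpy as np

def coverage_by_maf(maf, imp_r2, edges, r2_min=0.8):
    """Return (lo, hi, n_markers, fraction_covered) for each MAF bin (lo, hi]."""
    maf = np.asarray(maf, dtype=float)
    imp_r2 = np.asarray(imp_r2, dtype=float)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (maf > lo) & (maf <= hi)
        n = int(in_bin.sum())  # marker count, as shown in the figure legends
        frac = float((imp_r2[in_bin] >= r2_min).mean()) if n else float("nan")
        rows.append((lo, hi, n, frac))
    return rows

# Hypothetical per-SNP values: MAF in the target population and imputation r2.
maf = [0.02, 0.04, 0.08, 0.15, 0.30, 0.45]
imp_r2 = [0.50, 0.90, 0.85, 0.70, 0.95, 0.99]
rows = coverage_by_maf(maf, imp_r2, edges=[0.01, 0.05, 0.2, 0.5])
for lo, hi, n, frac in rows:
    print(f"MAF ({lo}, {hi}]: n={n}, coverage={frac:.2f}")
```

Reporting the marker count next to each fraction matters because a coverage value computed from very few markers in a MAF range is much less stable than one computed from many.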
