PLoS One. 2013;8(3):e59835.
doi: 10.1371/journal.pone.0059835. Epub 2013 Mar 28.

Clustering and alignment of polymorphic sequences for HLA-DRB1 genotyping

Steven Ringquist et al. PLoS One. 2013.

Abstract

Located on chromosome 6p21, the classical human leukocyte antigen (HLA) genes are highly polymorphic. HLA alleles are associated with a variety of phenotypes, including narcolepsy, autoimmunity, and immunologic response to infectious disease. Moreover, high resolution genotyping of these loci is critical to achieving long-term survival of allogeneic transplants. Methods that provide high resolution analysis of HLA genotypes will improve understanding of how select alleles contribute to human health and disease risk. Genomic DNA was obtained from a cohort of 383 subjects recruited as part of an ulcerative colitis study and analyzed for HLA-DRB1. HLA genotypes were determined using sequence specific oligonucleotide probes and by next-generation sequencing on the Roche/454 GSFLX instrument. The Clustering and Alignment of Polymorphic Sequences (CAPSeq) software application was developed to analyze the next-generation sequencing data. The application generates HLA sequence specific 6-digit genotype information, using MUMmer to align sequences and the R package diffusionMap to classify sequences into their respective allelic groups. Incorporating Bootstrap Aggregating (Bagging) to aid in sorting sequences into allele classes improved genotyping accuracy. With 60 Bagging iterations, CAPSeq genotypes were highly concordant with the 4-digit genotypes characterized by sequence specific oligonucleotide probes, matching at 759 of 766 alleles (99.1%).
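The reported concordance can be checked with simple arithmetic. The sketch below assumes that each of the 383 subjects contributes two HLA-DRB1 alleles; the abstract states only the totals, so the per-subject count is an inference that happens to reproduce the 766 figure.

```python
# Figures quoted in the abstract.
subjects = 383
matching_alleles = 759
total_alleles = 766

# Assumption (not stated in the abstract): two HLA-DRB1 alleles are typed
# per subject, one per chromosome copy.
alleles_per_subject = 2
assert subjects * alleles_per_subject == total_alleles

concordance_pct = 100 * matching_alleles / total_alleles
discordant_alleles = total_alleles - matching_alleles
print(f"{concordance_pct:.1f}% concordant, {discordant_alleles} discordant alleles")
```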

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The Clustering and Alignment of Polymorphic Sequences (CAPSeq) software application illustrated as a schematic.
Input Data: Next-generation sequence data formatted as modified FASTQ files consisting of sequences and corresponding Q-scores, along with an additional input data file containing known HLA allele sequences. CAPSeq Application: The analysis software can be broken down into three principal steps. Step 1 aligns sequences and uses the corresponding Q-scores to generate a weighted pairwise similarity score. Step 2 analyzes these scores via diffusion mapping followed by K-means clustering to identify homogeneous sequence groups. Step 3 applies Bootstrap Aggregating (Bagging) to multiple analyses of the data to ensure genotyping precision. Output Data: The HLA genotyping data are provided as a tab-delimited text file containing the most likely allelic match between the CAPSeq-generated consensus sequences and the list of known HLA alleles.
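The caption does not give the formula behind the weighted pairwise similarity score of step 1. As a minimal sketch of the idea only, the code below assumes Phred-scaled Q-scores (error probability 10^(-Q/10)), two reads that are already aligned to the same length, and a hypothetical weighting in which each position counts in proportion to the probability that both base calls are correct. The function names are illustrative and are not taken from CAPSeq.

```python
def phred_to_error(q):
    """Convert a Phred-scaled quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def weighted_similarity(seq_a, qual_a, seq_b, qual_b):
    """Quality-weighted fraction of matching positions for two aligned reads.

    Hypothetical weighting (an assumption, not the published method):
    each position is weighted by the probability that both calls are correct.
    """
    if not (len(seq_a) == len(qual_a) == len(seq_b) == len(qual_b)):
        raise ValueError("sequences and quality lists must have equal length")

    matched_weight = 0.0
    total_weight = 0.0
    for base_a, q_a, base_b, q_b in zip(seq_a, qual_a, seq_b, qual_b):
        weight = (1 - phred_to_error(q_a)) * (1 - phred_to_error(q_b))
        total_weight += weight
        if base_a == base_b:
            matched_weight += weight

    # If every position has zero weight there is no usable evidence.
    return matched_weight / total_weight if total_weight else 0.0


# A mismatch at a low-quality position is penalized less than the same
# mismatch at a high-quality position.
low_q = weighted_similarity("ACGT", [40, 40, 40, 2], "ACGA", [40, 40, 40, 2])
high_q = weighted_similarity("ACGT", [40] * 4, "ACGA", [40] * 4)
print(low_q > high_q)
```

A matrix of such scores over all read pairs is the kind of input that a diffusion map and K-means clustering (step 2) would then operate on.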
Figure 2. CAPSeq Bagging iterations result in improved genotyping sensitivity.
Bagging iterations (x-axis) were varied from 5 to 60. The median frequency of the minor sequence detectable by CAPSeq (y-axis) was determined by interrogating the raw sequencing data obtained using the Roche/454 GSFLX instrument.
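To illustrate what a Bagging iteration does, the sketch below resamples the reads with replacement, calls a genotype on each resample, and reports the call that wins the most votes. The base caller here (the two most frequent sequences) is a deliberately simple stand-in for CAPSeq's diffusion-map clustering, and the read labels and counts are invented for the example.

```python
import random
from collections import Counter


def call_genotype(reads):
    """Stand-in base caller: report the two most frequent sequences."""
    top_two = [seq for seq, _ in Counter(reads).most_common(2)]
    return tuple(sorted(top_two))


def bagged_genotype(reads, iterations=60, seed=0):
    """Bootstrap aggregating: call each resample, then take the majority vote."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    votes = Counter()
    for _ in range(iterations):
        # Bootstrap resample: same size as the input, drawn with replacement.
        resample = rng.choices(reads, k=len(reads))
        votes[call_genotype(resample)] += 1
    genotype, count = votes.most_common(1)[0]
    return genotype, count / iterations


# Invented read set: two true alleles plus a low-frequency artifact.
reads = ["allele_A"] * 60 + ["allele_B"] * 35 + ["artifact"] * 5
genotype, support = bagged_genotype(reads)
print(genotype, support)
```

The fraction of iterations that agree with the winning call gives a rough measure of how stable the genotype is under resampling.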
Figure 3. Frequency of HLA-DRB1 genotypes obtained using SSO (x-axis) and CAPSeq (y-axis) compared at 4-digit resolution.
The dashed line represents the theoretical identity between the two methods. Pearson’s correlation coefficient (r) exceeded 0.999.
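For reference, Pearson's correlation coefficient between two sets of allele frequencies can be computed as in the sketch below. The frequencies used are made-up values for illustration only and are not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-allele frequencies from two genotyping methods.
sso_freq = [0.12, 0.08, 0.15, 0.03, 0.10]
capseq_freq = [0.12, 0.08, 0.14, 0.03, 0.11]
r = pearson_r(sso_freq, capseq_freq)
print(f"r = {r:.3f}")
```

Points lying exactly on the identity line would give r = 1; r close to 1 indicates that the two methods recover nearly proportional allele frequencies.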
