BMC Genomics. 2014 May 1;15(1):325.
doi: 10.1186/1471-2164-15-325.

Inference of high resolution HLA types using genome-wide RNA or DNA sequencing reads

Yu Bai et al. BMC Genomics. 2014.

Abstract

Background: Accurate HLA typing at the amino acid level (four-digit resolution) is critical in hematopoietic and organ transplantation, in pathogenesis studies of autoimmune and infectious diseases, and in the development of immuno-oncology therapies. With the rapid adoption of genome-wide sequencing in biomedical research, HLA typing based on transcriptome and whole exome/genome sequencing data is becoming increasingly attractive because of its high throughput and convenience. However, unlike targeted amplicon sequencing, genome-wide sequencing often uses a shorter read length and lower coverage, which makes it very challenging to resolve the highly homologous HLA alleles. Although several algorithms exist and have been applied to four-digit typing, some deliver low to moderate accuracy and others output ambiguous predictions. Moreover, few methods suit diverse read lengths and depths as well as both RNA and DNA sequencing inputs. New algorithms are therefore needed to improve the accuracy and flexibility of high-resolution HLA typing using genome-wide sequencing data.

Results: We have developed a new algorithm named PHLAT to discover the most probable pair of HLA alleles at four-digit resolution or higher by integrating candidate allele selection with likelihood scoring. Over a comprehensive set of benchmarking data (a total of 768 HLA alleles) from both RNA and DNA sequencing and with a broad range of read lengths and coverage, PHLAT consistently achieves high accuracy at four-digit (92%-95%) and two-digit (96%-99%) resolutions, outperforming most of the existing methods. It also supports targeted amplicon sequencing data from Illumina MiSeq.

Conclusions: PHLAT significantly improves the accuracy and flexibility of high-resolution HLA typing based on genome-wide sequencing data. It may benefit both basic and applied research in immunology and related fields as well as numerous clinical applications.

Figures

Figure 1
PHLAT algorithm workflow. The algorithm consists of read mapping via Bowtie 2 to a reference sequence comprising the human genome and a plurality of genomic sequences of HLA alleles (I), selection of candidate alleles based on the number of mapped reads (II-IV), and log-likelihood scoring (V) over every pair of selected candidate alleles (e.g. a pair of a and b alleles). The pair of alleles with the best likelihood score is reported as the inferred HLA type at a given locus.
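The pairwise scoring step (V) can be illustrated with a minimal sketch. It is not PHLAT's actual likelihood model, which the caption does not specify; it assumes a generic diploid model in which each observed base comes from either allele with equal probability and a uniform sequencing error rate. The allele names, sequences, `ERROR_RATE` and all function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

ERROR_RATE = 0.01  # assumed per-base error rate, for illustration only


def base_prob(observed, allele_base, err=ERROR_RATE):
    """P(observed base | true allele base) under a uniform error model."""
    return 1.0 - err if observed == allele_base else err / 3.0


def pair_log_likelihood(allele_a, allele_b, pileup):
    """Log-likelihood of the pileup given a diploid pair of allele sequences.

    pileup maps a site index to a dict of base counts, e.g. {0: {"A": 20}}.
    """
    total = 0.0
    for site, counts in pileup.items():
        for base, n in counts.items():
            # Each read base is drawn from allele a or allele b with equal weight.
            p = 0.5 * base_prob(base, allele_a[site]) + 0.5 * base_prob(base, allele_b[site])
            total += n * math.log(p)
    return total


def best_pair(candidates, pileup):
    """Score every unordered pair of candidates, homozygous pairs included."""
    pairs = combinations_with_replacement(sorted(candidates), 2)
    return max(
        pairs,
        key=lambda pr: pair_log_likelihood(candidates[pr[0]], candidates[pr[1]], pileup),
    )


# Hypothetical two-site candidate alleles and read pileup.
candidates = {"X*01:01": "AC", "X*01:02": "AT", "X*02:01": "GC"}
pileup = {0: {"A": 20}, 1: {"C": 11, "T": 9}}
print(best_pair(candidates, pileup))
```

In this toy input the second site carries both C and T in similar numbers, so the heterozygous pair of the two alleles that differ only at that site scores best.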
Figure 2
Analysis of frequently mistyped alleles. (A) The histograms illustrate the type (x-axis) and the number (y-axis) of the misidentified alleles at the HLA-DQA1 (left panel) and HLA-DQB1 (right panel) loci, summarized over the HapMap RNAseq, the 1000 Genome WXS and the HapMap WXS datasets. (B) Visualization of the mapped reads in one representative sample (subject NA12156, Additional file 1: Table S1) where the HLA-DQA1*03:01 allele is mistyped as the HLA-DQA1*03:03 allele. The mapped reads are shown around the single SNP position (chr6: 32609965, highlighted between two vertical dashed lines) that distinguishes the two alleles. The hg19 reference sequence of the HLA-DQA1 gene is shown at the bottom of the panel. The nucleotide bases A, C, G, T are colored in green, red, blue grey and blue, respectively. Bases in the reads that differ from the reference sequence at the aligned positions are shown in the same color code. The pileup counts of the A, C, G, T bases at the highlighted SNP are 141, 117, 0 and 0, respectively. (C) The alignment of a 135-nucleotide segment from the HLA-DQA1*03:03 allele, noted as the query, with the HLA-DQA2 reference sequence in human genome hg19. The query sequence is simplified as a horizontal bar with only the mismatches indicated. Each existing dbSNP record at a mismatch is labeled with a red vertical marker and its identification number (e.g. rs62619945), followed by parentheses giving the major and the alternative bases. The aligned position of the SNP that distinguishes the DQA1*03:01 and DQA1*03:03 alleles is boxed.
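As a quick arithmetic check on the pileup counts quoted in the caption for panel B, the short sketch below converts them to base fractions. Only the four counts come from the caption; the variable names are illustrative.

```python
# Pileup counts at the highlighted SNP, as quoted in the Figure 2 caption.
pileup = {"A": 141, "C": 117, "G": 0, "T": 0}

depth = sum(pileup.values())  # total reads covering the position
fractions = {base: count / depth for base, count in pileup.items()}

for base, frac in fractions.items():
    print(f"{base}: {pileup[base]} reads, fraction {frac:.3f}")
```

The depth is 258 reads, with A at roughly 0.547 and C at roughly 0.453, so both bases are supported by a large share of the reads at this position.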
Figure 3
Impact of read length, coverage and sequencing protocols on HLA typing accuracy. The plot summarizes the HLA typing accuracy of PHLAT using samples from the HapMap RNAseq (top panel), the 1000 Genome WXS (middle panel) and the HapMap WXS (bottom panel) datasets. Prediction accuracies are calculated by treating the sequencing data as either paired-end (closed symbols and solid lines) or single-end (open symbols and dashed lines). The symbols represent the mean accuracy at four-digit resolution of the samples binned by their fold coverage at the HLA loci, with the error bars indicating the variance. The post-mapping fold coverage is calculated with respect to the CDS regions of the major class I and II HLA loci, excluding reads that are suboptimally aligned or not aligned to the candidate alleles. Smooth lines obtained by spline interpolation illustrate the trend of the symbols.
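The binning described in the caption can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the bin edges, the sample values and the choice of population variance are all assumptions, since the caption states only that samples are binned by fold coverage and summarized by mean and variance.

```python
from collections import defaultdict
from statistics import mean, pvariance


def bin_accuracy(samples, bin_edges):
    """Group (coverage, accuracy) samples into half-open coverage bins
    [low, high) and return the mean and population variance per bin."""
    bins = defaultdict(list)
    for coverage, accuracy in samples:
        for low, high in zip(bin_edges, bin_edges[1:]):
            if low <= coverage < high:
                bins[(low, high)].append(accuracy)
                break  # each sample belongs to at most one bin
    return {b: (mean(v), pvariance(v)) for b, v in sorted(bins.items())}


# Hypothetical per-sample (fold coverage, four-digit accuracy) values.
samples = [(5, 0.5), (8, 0.75), (25, 1.0), (40, 0.75), (120, 1.0), (150, 1.0)]
summary = bin_accuracy(samples, bin_edges=[0, 10, 50, 200])

for (low, high), (avg, var) in summary.items():
    print(f"coverage [{low}, {high}): mean={avg:.3f}, variance={var:.4f}")
```

Samples whose coverage falls outside every bin are silently skipped in this sketch; a real analysis would likely want to report them.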
