Genome Med. 2017 Sep 27;9(1):86.
doi: 10.1186/s13073-017-0473-6.

HLAProfiler utilizes k-mer profiles to improve HLA calling accuracy for rare and common alleles in RNA-seq data

Martin L Buchkovich et al. Genome Med. 2017.

Abstract

Background: The human leukocyte antigen (HLA) system is a genomic region involved in regulating the human immune system by encoding cell membrane major histocompatibility complex (MHC) proteins that are responsible for self-recognition. Understanding the variation in this region provides important insights into autoimmune disorders, disease susceptibility, oncological immunotherapy, regenerative medicine, transplant rejection, and toxicogenomics. Traditional approaches to HLA typing are low throughput, target only a few genes, are labor intensive and costly, or require specialized protocols. RNA sequencing promises a relatively inexpensive, high-throughput solution for HLA calling across all genes, with the bonus of complete transcriptome information and widespread availability of historical data. Existing tools have been limited in their ability to accurately and comprehensively call HLA genes from RNA-seq data.

Results: We created HLAProfiler (https://github.com/ExpressionAnalysis/HLAProfiler), a k-mer profile-based method for HLA calling in RNA-seq data that can identify rare and common HLA alleles with >99% accuracy at two-field precision in both biological and simulated data. For 68% of novel alleles not present in the reference database, HLAProfiler can correctly identify the two-field call or the exact coding sequence, a significant advance over existing algorithms.

Conclusions: HLAProfiler allows for accurate HLA calls in RNA-seq data, reliably expanding the utility of these data in HLA-related research and enabling advances across a broad range of disciplines. Additionally, by using the observed data to identify potential novel alleles and update partial alleles, HLAProfiler will facilitate further improvements to the existing database of reference HLA alleles. HLAProfiler is available at https://expressionanalysis.github.io/HLAProfiler/.

Keywords: HLA; HSCT; Immunology; RNA-sequencing; Transplantation.
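
The abstract describes HLAProfiler as a k-mer profile-based method but does not spell out what a profile looks like. As a rough illustration only, the sketch below counts overlapping k-mers in a sequence and pools the counts across a set of reads. The function names (`kmer_profile`, `observed_profile`), the value of `k`, and the toy sequences are assumptions made for this example; this is not HLAProfiler's own code, and the tool's filtering steps are not reproduced here.

```python
from collections import Counter


def kmer_profile(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in one sequence (hypothetical helper)."""
    sequence = sequence.upper()
    # A sequence shorter than k yields an empty range, hence an empty profile.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def observed_profile(reads, k: int) -> Counter:
    """Pool k-mer counts over a collection of reads (hypothetical helper)."""
    total = Counter()
    for read in reads:
        total.update(kmer_profile(read, k))
    return total


# Toy sequences chosen for illustration; they are not real HLA alleles.
reference = kmer_profile("ACGTACGTGA", k=4)
observed = observed_profile(["ACGTA", "CGTAC"], k=4)
print(reference.most_common(3))
print(observed.most_common(3))
```

In this simplified picture, a reference allele and a sample are both reduced to k-mer count tables, so they can be compared without aligning reads to a genome.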

Conflict of interest statement

Authors’ information

CCB is currently a Senior Scientist at OmicSoft Corporation, Cary, NC, 27513, USA.

KR is currently Senior Translational Scientist at Renaissance Computing Institute (RENCI), University of North Carolina, Chapel Hill, NC, USA.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

MLB, CCB, KR, SW, and JGP contributed to this manuscript as employees of Q2 Solutions|EA Genomics, which offers genomic services to a variety of clients, including pharmaceutical companies. The submitted work was performed independently of these client relationships. CCB is currently an employee of OmicSoft Corporation, and this work was completed independently of that role. ETW reports personal fees and non-financial support from Illumina, outside the submitted work. BGV and SC declare no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Overview of the HLAProfiler workflow. The HLAProfiler workflows for creating the reference k-mer profile database (green) and for calling HLA alleles in RNA-seq data (blue). Each step label in the workflow corresponds to the text (see “Implementation”). The workflows share the k-mer filtering and profile creation step (blue/green box).
Fig. 2
HLA calling accuracy. (a) The accuracy of HLA calling was evaluated for six algorithms. Datasets were simulated using GeT-RM alleles from 109 samples (left panels) and rare alleles for 100 samples (right panels) at two-field precision (upper panels) and exact precision (lower panels) when available. (b) Concordance of HLA calling in 358 lymphoblastoid cell lines compared with gold standard HLA allele calls generated by Sanger sequencing (left panel). Sequences were downsampled to five million reads, HLA alleles were called, and concordance was recalculated (middle panel). Discrepancies between HLAProfiler, OptiType, and Sanger sequencing were resolved using TruSight HLA for 38 samples, the gold standard calls were updated with the resolved genotype, and concordance was recalculated for all methods with the addition of the original Sanger sequencing calls (right panel).
Fig. 3
HLAProfiler correctly identifies the disease-associated B*27 allele incorrectly called by the gold standard. (a) Sequence coverage of RNA-seq data from NA11840 when aligned to B*27:03 (gold standard call), B*27:05:02 (identified by RNA-seq algorithms), and B*27:05:03 (full sequence predicted by HLAProfiler with allele refinement, and allele confirmed by TruSight HLA). Exon boundaries relative to the allele and differences between the alleles responsible for dips in coverage are also noted. (b) HLAProfiler-generated comparison statistics of the three alleles, indicating the proportion of observed reads accounted for by the profile, the proportion of the profile accounted for by observed reads, and the correlation between the observed reads and the profile.
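
The Fig. 3 caption names three comparison statistics between observed reads and an allele profile. The sketch below shows one plausible way to compute quantities of that kind from two k-mer count tables. The exact definitions are assumptions made for illustration (count-weighted proportions and a Pearson correlation over the union of k-mers); the caption does not give formulas, and HLAProfiler may compute these differently. The function names and toy counts are hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; NaN if undefined."""
    n = len(xs)
    if n == 0:
        return float("nan")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def compare(observed: Counter, profile: Counter) -> dict:
    """Three assumed comparison statistics between two k-mer count tables."""
    obs_total = sum(observed.values())
    prof_total = sum(profile.values())
    # Observed k-mer counts whose k-mer also appears in the profile.
    obs_explained = sum(c for kmer, c in observed.items() if kmer in profile)
    # Profile k-mer counts whose k-mer was seen at least once in the data.
    prof_covered = sum(c for kmer, c in profile.items() if kmer in observed)
    # Counter returns 0 for missing keys, so the union can be indexed safely.
    kmers = sorted(set(observed) | set(profile))
    return {
        "observed_explained": obs_explained / obs_total if obs_total else 0.0,
        "profile_covered": prof_covered / prof_total if prof_total else 0.0,
        "correlation": pearson([observed[k] for k in kmers],
                               [profile[k] for k in kmers]),
    }


# Toy counts for illustration; not derived from any real sample or allele.
observed = Counter({"ACGT": 4, "CGTA": 2, "TTTT": 2})
profile = Counter({"ACGT": 2, "CGTA": 1, "GTAC": 1})
stats = compare(observed, profile)
print(stats)
```

Under these assumed definitions, a candidate allele that explains most of the observed k-mers, is itself mostly covered by them, and tracks their abundances would score high on all three measures, which matches the kind of side-by-side comparison of candidate alleles that the caption describes.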
