Immunogenetics. 2024 Apr;76(2):109-121.
doi: 10.1007/s00251-024-01333-z. Epub 2024 Feb 24.

Resolving unknown nucleotides in the IPD-IMGT/HLA database by extended and full-length sequencing of HLA class I and II alleles

Christina E M Voorter et al. Immunogenetics. 2024 Apr.

Abstract

In the past, identification of HLA alleles was limited to sequencing the region of the gene coding for the peptide binding groove. This left gaps in the sequence information in the HLA database, which is a challenge for HLA allele assignment software. We investigated full-length sequences of 19 HLA class I and 7 HLA class II alleles, and we extended another 47 HLA class I alleles with 5' and 3' UTR sequences, none of which were yet available in the IPD-IMGT/HLA database. We resolved 8638 unknown nucleotides in the coding sequence of HLA class I and 2139 of HLA class II. Furthermore, full-length sequencing of the 26 alleles added more than 90 kb of sequence information to the non-coding sequences, whereas extension of the 47 alleles added 5.5 kb of previously unknown nucleotides to the 5' UTR and > 31.7 kb to the 3' UTR. With this information, some interesting features were observed, such as possible recombination events and lineage evolutionary origins. The continuing increase in the availability of full-length sequences in the HLA database will enable the identification of the evolutionary origin of alleles and will help the community to improve the alignment and assignment accuracy of HLA alleles.

Keywords: Extended sequences; Full-length sequencing; Group-specific Sanger sequencing; Human leucocyte antigen; NGS.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Phylogenetic tree of HLA-DQB1*03. A multiple sequence alignment was performed on the full-length sequences from position −150 up to 6502 without exon 2 from all DQB1*03 alleles for which this sequence was available in the IPD-IMGT/HLA database (vs. 3.51), excluding null and Q alleles. Since Osoegawa et al. (2022) already identified DQB1*03:06 and 03:25 as DQ4 serotypes, we have excluded these alleles from this phylogenetic tree. Details of the alleles are listed in Supplemental Table 1A. The outlier, DQB1*03:72, was serotyped DQ9 by Osoegawa et al. (2022), but the full-length sequence was completely identical to DQB1*04:02:01:04, except exon 2. The scale bar indicates the length of the tree edges, corresponding to the differences between two allele sequences as calculated by the R SeqinR package (Charif and Lobry 2007)
Fig. 2
Phylogenetic tree of HLA-DQB1*06. A multiple sequence alignment was performed on the full-length sequences from position −30 up to 6410 from all DQB1*06 alleles for which this sequence was available in the IPD-IMGT/HLA database (vs. 3.51). Details of the alleles are listed in Supplemental Table 1B. The scale bar indicates the length of the tree edges, corresponding to the differences between two allele sequences as calculated by the R SeqinR package (Charif and Lobry 2007)
Fig. 3
Phylogenetic tree of HLA-B*18. A multiple sequence alignment was performed on the 5′ UTR, intron, and 3′ UTR sequences from position −20 up to 3500 from all B*18 alleles for which these sequences were available in the IPD-IMGT/HLA database (vs. 3.51). Details of the alleles are listed in Supplemental Table 1C. Of the two alleles outside the two clusters, the B*18:03:01:02 has 2 nucleotides identical to B*18:01:01:01 (1127T, 3358C) and 3 identical to B*18:01:01:02 (2180G, 3014C, 3472T), and the B*18:01:01:18 allele has 3 nucleotides identical to B*18:01:01:01 (1127T, 2180A, 3472C) and 2 identical to B*18:01:01:02 (3014C, 3358T). The scale bar indicates the length of the tree edges, corresponding to the differences between two allele sequences as calculated by the R SeqinR package (Charif and Lobry 2007)
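
The figure legends describe tree edges as reflecting the differences between pairs of aligned allele sequences. As a rough illustration of that idea only, the Python sketch below computes a simple proportion of differing sites for every pair in a small alignment. It is not the authors' pipeline: the legends state that the R SeqinR package was used, and its exact distance measure may differ from this one. The function names, the toy sequences, and the convention that `-` marks a gap and `N` an unknown base are all assumptions made for the example.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of comparable aligned sites at which two sequences differ.

    Columns where either sequence has a gap ('-') or an unknown base ('N')
    are skipped (an assumed convention for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment: dict) -> dict:
    """Symmetric pairwise distance lookup keyed by (name, name) tuples."""
    names = list(alignment)
    dist = {(n, n): 0.0 for n in names}
    for m, n in combinations(names, 2):
        d = p_distance(alignment[m], alignment[n])
        dist[(m, n)] = d
        dist[(n, m)] = d
    return dist


# Hypothetical toy alignment; these are not real HLA sequences.
toy = {
    "allele_A": "ACGTACGTAC",
    "allele_B": "ACGTACGTAT",
    "allele_C": "ACGAACG-AT",
}
matrix = distance_matrix(toy)
```

A matrix of this kind is the usual input to a distance-based tree-building method such as neighbour joining; alleles that are separated by few differences end up on short edges within the same cluster.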

References

    1. Adamek M, Klages C, Bauer M, Kudlek E, Drechsler A, Leuser B, Scherer S, Opelz G, Tran TH. Seven novel HLA alleles reflect different mechanisms involved in the evolution of HLA diversity: description of the new alleles and review of the literature. Hum Immunol. 2015;76:30–35. doi: 10.1016/j.humimm.2014.12.007.
    2. Albrecht V, Zweiniger C, Surendranath V, Lang K, Schöfl G, Dahl A, Winkler S, Lange V, Böhme I, Schmidt AH. Dual redundant sequencing strategy: Full-length gene characterisation of 1056 novel and confirmatory HLA alleles. HLA. 2017;90:79–87. doi: 10.1111/tan.13057.
    3. Anholts JD, Aneq M, Dirks HL, Tas A, Verduyn W, Oudshoorn M. Thirty-six novel HLA alleles: 7 HLA-A, 11 HLA-B, 15 HLA-C and 3 HLA-DRB1. Tissue Antigens. 2009;74:424–428. doi: 10.1111/j.1399-0039.2009.01334.x.
    4. Anholts JDH, Kemps-Mols B, Verduijn W, Oudshoorn M, Schreuder GMTh. Three newly identified HLA-B alleles: B*5124, B*5306, B*5307 and confirmation of B*0809 and B*5606. Tissue Antigens. 2001;58:38–41. doi: 10.1034/j.1399-0039.2001.580107.x.
    5. Balas A, Gonzalez-Roiz C, Vargas ML, Garcia-Sanchez F, Vicario JL. Sequencing of the new HLA-B*44:150 allele suggests recombination between B*44:02:01:01 and B*07:02:01 alleles. Tissue Antigens. 2012;80:548–549. doi: 10.1111/tan.12018.