Nucleic Acids Res. 2025 May 22;53(10):gkaf441.
doi: 10.1093/nar/gkaf441.

Haplotypic resolution of the challenging genomic regions of MHC and KIR using a combination of targeted sequencing and a novel assembly pipeline

Timothy L Mosbruger et al. Nucleic Acids Res.

Abstract

Recently, long-read sequencing technologies and bioinformatics have enabled the construction of haplotype-resolved genome assemblies. Here, we present the complete and accurate de novo characterization of two challenging genomic regions, the major histocompatibility complex (MHC) and the killer-cell immunoglobulin-like receptors (KIRs), in phased haplotypic form, using Oxford Nanopore Technologies (ONT) Adaptive Sampling sequencing and a newly developed bioinformatics pipeline. These regions, which are critical for our immune response, have been notoriously difficult to characterize because of their sequence variability and structural complexity. The key features of our approach are (i) focused sequencing of specific regions, (ii) exclusive use of ONT, and (iii) a unique phasing methodology that integrates sequencing reads, methylation signals, and a reference panel. Ten samples with known MHC and KIR haplotypes were sequenced and assembled, demonstrating the potential of our approach. We achieved efficient target enrichment, resulting in 100% coverage and accuracy ranging from 99.95% to 99.99% across the MHC and KIR. Its simplicity, reproducibility, and affordability distinguish this method as a unique and effective approach for the targeted haplotypic characterization of the MHC and KIR, and possibly other specific genomic regions, without trios. These efforts will in turn facilitate future studies that further advance the functional deconvolution of our genome.

Conflict of interest statement

D.S.M. is a consultant to Omixon/Werfen. D.S.M., J.L.D., D.F., T.L.M. and A.D. receive royalties from Omixon/Werfen. The other authors declare no competing interests.

Figures

Graphical Abstract
Figure 1.
MHC and KIR Assembly Pipeline Overview. Adaptive Sampling (AS) is in yellow, read manipulation steps are in red, assembly steps in blue, and haplotype scaffolding steps in green. Briefly, MHC- and KIR-specific reads are enriched using the ONT AS method. Region-specific reads are then identified by aligning to hg38 and a panel of MHC and KIR haplotypes, retaining long reads (≥10 kb) that align to the region of interest. (A) MHC-specific reads are corrected in a haplotype-specific manner with HERRO and then assembled using Canu. Meanwhile, a phased haplotype scaffold is created by combining phased hg38-based SNV and methylation calls with the contig phasing information. If phase breaks remain after ONT phasing, they are resolved by haplotype estimation with SHAPEIT4, using the 1000 Genomes Project (1kGP) 30× SNV calls as the reference panel. The haplotype scaffold and initial assembly are used to partition corrected reads into two haplotypes, which can then be assembled separately with Canu. (B) KIR-specific reads are corrected in a haplotype-aware manner with HERRO, trimmed, and assembled with hifiasm.
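The read-selection step in this caption (retain reads of at least 10 kb that align to the region of interest) can be sketched as a simple interval filter. This is a minimal illustration under stated assumptions, not the authors' code: the `Alignment` record and the function names are hypothetical, coordinates are taken to be 0-based and half-open, and a real pipeline would read alignments from a BAM file rather than from an in-memory list.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Alignment:
    """Hypothetical minimal alignment record (not a real library type)."""
    read_name: str
    chrom: str
    start: int        # 0-based, inclusive reference start
    end: int          # 0-based, exclusive reference end
    read_length: int  # full length of the read in bases


def overlaps(aln: Alignment, chrom: str, start: int, end: int) -> bool:
    """True if the alignment shares at least one base with the region."""
    return aln.chrom == chrom and aln.start < end and aln.end > start


def select_region_reads(
    alignments: Iterable[Alignment],
    chrom: str,
    start: int,
    end: int,
    min_length: int = 10_000,  # the >=10 kb threshold named in the caption
) -> List[str]:
    """Return names of long reads that align to the target region.

    Each read name is reported once, even if the read has several
    alignments (for example, supplementary alignments).
    """
    kept: List[str] = []
    seen = set()
    for aln in alignments:
        if aln.read_length < min_length:
            continue
        if not overlaps(aln, chrom, start, end):
            continue
        if aln.read_name in seen:
            continue
        seen.add(aln.read_name)
        kept.append(aln.read_name)
    return kept
```

The region passed to `select_region_reads` would be the target interval used for enrichment; the caption does not give those coordinates, so any values used with this sketch are placeholders.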
Figure 2.
ONT AS MHC collection metrics. (A) Raw and corrected MHC-specific ONT reads were compared to the reference haplotypes to determine the error rate, which was converted to a Q-score. Raw error rates decreased (Q-scores increased) when the sampling frequency was raised to 5 kHz and when the P2 Solo was used. (B) Length distribution of MHC-specific reads 10 kb or longer. N50 values are shown in white boxes at the center of each violin plot. UL sample preparations resulted in N50 values twice those of SFE, at the expense of lower depth of coverage. Dark-colored samples were sequenced using both SFE and UL. (C) Corrected reads were aligned to assembled haplotypes, allowing one location per read. Depth of coverage across each haplotype is summarized in a box plot.
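The two summary statistics in this caption can be illustrated with a short sketch. It assumes the standard Phred scale, Q = -10 · log10(error rate), and the usual definition of N50 (the length L such that reads of length L or longer contain at least half of all bases); the caption does not spell out either formula, so both are assumptions here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def error_rate_to_q(error_rate: float) -> float:
    """Convert a per-base error rate to a Phred-scaled Q-score."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    return -10.0 * math.log10(error_rate)


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of read lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")
```

On this scale, an error rate of 1% corresponds to Q20, and an accuracy of 99.99% (error rate 0.01%) corresponds to about Q40.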
Figure 3.
Haplotype Scaffold Generation. Haplotype 1 is in blue while haplotype 2 is in red. (A) Uncorrected MHC-specific reads are aligned to hg38 and used to detect and phase SNVs into blocks. SNV phase blocks are then connected using methylation information, where possible. (B) In parallel, corrected MHC-specific reads are assembled. Primary contig(s) (labeled as ‘1’) span the length of the MHC, can switch haplotypes across regions of high homozygosity and represent combinations of homozygous, heterozygous, and haplotype-resolved regions. Alternate haplotigs (labeled 2, 3 and 4) are haplotype-resolved and represent the alternative haplotype to the corresponding location on the primary contig(s), in highly heterozygous regions of the MHC. (C) MHC-specific reads are aligned to the assembly and used to call and phase SNVs and methylation signals across heterozygous primary contig regions. (D) The initial assembly, hg38-based SNVs and contig-based SNVs are linked. Phase can be set at hg38-based phase breaks if the break is spanned by either a contig-based phase block or two haplotype-resolved contig regions. (E) Phase is estimated at any remaining breaks using phased 1kGP variant calls.
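The linking rule in panel (D) can be sketched as an interval problem: two neighbouring hg38-based phase blocks are joined when some other phased interval (a contig-based phase block or a haplotype-resolved contig region) overlaps both of them and so spans the break. The sketch below is a deliberate simplification and not the authors' implementation: it only decides which blocks can be joined, it does not work out the relative haplotype orientation of the joined blocks, and the function name and the half-open `(start, end)` tuples are assumptions.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, half-open


def link_phase_blocks(
    blocks: Sequence[Interval],
    bridges: Sequence[Interval],
) -> List[List[int]]:
    """Group consecutive phase blocks whose breaks are spanned by a bridge.

    `blocks` must be sorted by position and non-overlapping. The result is
    a list of groups, each a list of indices into `blocks`. A boundary
    between two groups is a phase break that no bridge could resolve.
    """
    if not blocks:
        return []
    groups: List[List[int]] = [[0]]
    for i in range(1, len(blocks)):
        left_end = blocks[i - 1][1]
        right_start = blocks[i][0]
        # A bridge spans the break if it overlaps both neighbouring blocks.
        spanned = any(
            b_start < left_end and b_end > right_start
            for b_start, b_end in bridges
        )
        if spanned:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups
```

Any break still separating two groups after this step would, following panel (E), be left to reference-panel-based phase estimation.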
Figure 4.
Phasing distance estimation. For each distance between 30 and 250 kb, in 5 kb steps, 50 000 pairs of points separated by that distance were randomly selected across the MHC. Paired positions connected by at least one real ONT read were considered phased. The percentage of the 50 000 paired points that were phased at each distance was plotted to show the phasing decay. The analysis was done for both the SFE (A) and UL (B) read preparation methods.
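The sampling procedure in this caption can be sketched as follows. For a given distance, random pairs of positions are drawn across the region, and a pair counts as phased when at least one read covers both positions. This is an illustrative sketch under stated assumptions, not the authors' code: reads are represented as half-open `(start, end)` alignment intervals, and the function names are hypothetical.

```python
import bisect
import random
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, half-open


def build_reach(reads: Sequence[Interval]) -> Tuple[List[int], List[int]]:
    """Sort reads by start and record the furthest end seen so far.

    reach[i] is the largest end among the first i + 1 reads in start order,
    so one lookup answers "how far does any read starting at or before
    position p extend?".
    """
    ordered = sorted(reads)
    starts = [start for start, _ in ordered]
    reach: List[int] = []
    furthest = 0
    for _, end in ordered:
        furthest = max(furthest, end)
        reach.append(furthest)
    return starts, reach


def fraction_phased(
    reads: Sequence[Interval],
    region_start: int,
    region_end: int,
    distance: int,
    n_pairs: int,
    rng: random.Random,
) -> float:
    """Estimate the fraction of position pairs joined by at least one read."""
    if region_end - distance <= region_start:
        raise ValueError("distance does not fit inside the region")
    starts, reach = build_reach(reads)
    phased = 0
    for _ in range(n_pairs):
        p = rng.randrange(region_start, region_end - distance)
        # Index of the last read that starts at or before p, if any.
        i = bisect.bisect_right(starts, p) - 1
        # Some read covers both p and p + distance if its end exceeds p + distance.
        if i >= 0 and reach[i] > p + distance:
            phased += 1
    return phased / n_pairs
```

Calling `fraction_phased` with `n_pairs=50_000` for each distance in `range(30_000, 250_001, 5_000)` mirrors the sampling scheme described in the caption, and plotting the resulting fractions against distance gives a decay curve of the kind shown in the figure.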

