A diploid assembly-based benchmark for variants in the major histocompatibility complex
- PMID: 32963235
- PMCID: PMC7508831
- DOI: 10.1038/s41467-020-18564-9
Abstract
Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a medically important, highly variable, 5 million base-pair (bp) region where diploid assembly is particularly useful: the Major Histocompatibility Complex (MHC). Here, we develop a human genome benchmark derived from a diploid assembly for the openly-consented Genome in a Bottle sample HG002. We assemble a single contig for each haplotype, align them to the reference, call phased small and structural variants, and define a small variant benchmark for the MHC, covering 94% of the MHC and 22,368 variants smaller than 50 bp, 49% more variants than a mapping-based benchmark. This benchmark reliably identifies errors in mapping-based callsets and enables performance assessment in regions with much denser and more complex variation than the regions covered by previous benchmarks.
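The abstract separates small variants (smaller than 50 bp) from structural variants. A minimal sketch of that size split is given below, assuming variants are represented as sequence-resolved REF/ALT allele pairs as in a VCF record. The function names and the size definition (allele length difference for indels, REF length for same-length substitutions) are illustrative assumptions, not the authors' pipeline.

```python
# Illustrative sketch only: names and the size definition are assumptions,
# not taken from the paper's methods.

SV_MIN_SIZE_BP = 50  # threshold stated in the abstract: small variants are < 50 bp


def variant_size(ref: str, alt: str) -> int:
    """Return an approximate variant size in bp for sequence-resolved alleles.

    Assumption: for insertions and deletions the size is the difference in
    allele lengths; for same-length substitutions it is the allele length.
    Symbolic alleles such as <DEL> are not handled.
    """
    if len(ref) == len(alt):
        return len(ref)
    return abs(len(ref) - len(alt))


def classify_variant(ref: str, alt: str) -> str:
    """Label a variant as 'small' (< 50 bp) or 'structural' (>= 50 bp)."""
    if variant_size(ref, alt) < SV_MIN_SIZE_BP:
        return "small"
    return "structural"
```

For example, under these assumptions `classify_variant("A", "G")` labels a SNV as small, while a 50 bp insertion such as `classify_variant("A", "A" + "T" * 50)` is labeled structural.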
Conflict of interest statement
C.-S.C. and A.F. are employees of DNAnexus Inc., a company providing a cloud computing platform for processing genomic information. C.-S.C. is a co-founder and partner of Omni BioComputing, LLC, which currently develops genome assembler related technologies. Q.Z. is an employee of Laboratory Corporation of America Holdings, a company providing clinical diagnostics services. A.T.D. is a partner in Peptide Groove, LLP. A.C. is an employee of Google, a company providing a cloud computing platform. W.J.R. is an employee and shareholder of Pacific Biosciences. A.M.B. is an ex-employee and shareholder of 10x Genomics. The other authors declare no competing interests.
Similar articles
- Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet. 2023 Jul;24(7):464-483. doi: 10.1038/s41576-023-00590-0. Epub 2023 Apr 14. PMID: 37059810. Review.
- De novo assembly and phasing of a Korean human genome. Nature. 2016 Oct 13;538(7624):243-247. doi: 10.1038/nature20098. Epub 2016 Oct 5. PMID: 27706134.
- SpLitteR: diploid genome assembly using TELL-Seq linked-reads and assembly graphs. PeerJ. 2024 Sep 27;12:e18050. doi: 10.7717/peerj.18050. eCollection 2024. PMID: 39351368. Free PMC article.
- JTK: targeted diploid genome assembler. Bioinformatics. 2023 Jul 1;39(7):btad398. doi: 10.1093/bioinformatics/btad398. PMID: 37354526. Free PMC article.
- Haplotyping-Assisted Diploid Assembly and Variant Detection with Linked Reads. Methods Mol Biol. 2023;2590:161-182. doi: 10.1007/978-1-0716-2819-5_11. PMID: 36335499. Review.
Cited by
- Editorial: Population genomic architecture: Conserved polymorphic sequences (CPSs), not linkage disequilibrium. Front Genet. 2023 Jan 26;14:1140350. doi: 10.3389/fgene.2023.1140350. eCollection 2023. PMID: 36777737. Free PMC article. No abstract available.
- Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet. 2023 Jul;24(7):464-483. doi: 10.1038/s41576-023-00590-0. Epub 2023 Apr 14. PMID: 37059810. Review.
- Haplotypic resolution of the challenging genomic regions of MHC and KIR using a combination of targeted sequencing and a novel assembly pipeline. Nucleic Acids Res. 2025 May 22;53(10):gkaf441. doi: 10.1093/nar/gkaf441. PMID: 40464686. Free PMC article.
- Targeted and complete genomic sequencing of the major histocompatibility complex in haplotypic form of individual heterozygous samples. Genome Res. 2024 Oct 29;34(10):1500-1513. doi: 10.1101/gr.278588.123. PMID: 39327030. Free PMC article.
- Multiscale analysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes. Nat Methods. 2023 Aug;20(8):1213-1221. doi: 10.1038/s41592-023-01914-y. Epub 2023 Jun 26. PMID: 37365340. Free PMC article.