Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly
- PMID: 28396521
- PMCID: PMC5411779
- DOI: 10.1101/gr.213611.116
Abstract
The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.
© 2017 Schneider et al.; Published by Cold Spring Harbor Laboratory Press.
Similar articles
- Comprehensive variation discovery and recovery of missing sequence in the pig genome using multiple de novo assemblies. Genome Res. 2017 May;27(5):865-874. doi: 10.1101/gr.207456.116. Epub 2016 Sep 19. Genome Res. 2017. PMID: 27646534 Free PMC article.
- HySA: a Hybrid Structural variant Assembly approach using next-generation and single-molecule sequencing technologies. Genome Res. 2017 May;27(5):793-800. doi: 10.1101/gr.214767.116. Epub 2017 Jan 19. Genome Res. 2017. PMID: 28104618 Free PMC article.
- HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res. 2017 May;27(5):801-812. doi: 10.1101/gr.213462.116. Epub 2016 Dec 9. Genome Res. 2017. PMID: 27940952 Free PMC article.
- Whole genome sequencing. Methods Mol Biol. 2010;628:215-26. doi: 10.1007/978-1-60327-367-1_12. Methods Mol Biol. 2010. PMID: 20238084 Review.
- Genetic variation and the de novo assembly of human genomes. Nat Rev Genet. 2015 Nov;16(11):627-40. doi: 10.1038/nrg3933. Epub 2015 Oct 7. Nat Rev Genet. 2015. PMID: 26442640 Free PMC article. Review.
Cited by
- An assembly line for an improved human reference genome. Nature. 2022 Oct 19. doi: 10.1038/d41586-022-03151-3. Online ahead of print. Nature. 2022. PMID: 36261717 No abstract available.
- The Need for a Human Pangenome Reference Sequence. Annu Rev Genomics Hum Genet. 2021 Aug 31;22:81-102. doi: 10.1146/annurev-genom-120120-081921. Epub 2021 Apr 30. Annu Rev Genomics Hum Genet. 2021. PMID: 33929893 Free PMC article. Review.
- Identification of transcriptional subtypes in lung adenocarcinoma and squamous cell carcinoma through integrative analysis of microarray and RNA sequencing data. Sci Rep. 2021 Apr 22;11(1):8709. doi: 10.1038/s41598-021-88209-4. Sci Rep. 2021. PMID: 33888829 Free PMC article.
- Whole-Genome Sequencing Analysis Reveals New Susceptibility Loci and Structural Variants Associated with Progressive Supranuclear Palsy. medRxiv [Preprint]. 2024 Jan 30:2023.12.28.23300612. doi: 10.1101/2023.12.28.23300612. medRxiv. 2024. Update in: Mol Neurodegener. 2024 Aug 16;19(1):61. doi: 10.1186/s13024-024-00747-3. PMID: 38234807 Free PMC article. Updated. Preprint.
- Microglia replacement by ER-Hoxb8 conditionally immortalized macrophages provides insight into Aicardi-Goutières Syndrome neuropathology. bioRxiv [Preprint]. 2025 May 15:2024.09.18.613629. doi: 10.1101/2024.09.18.613629. bioRxiv. 2025. PMID: 39345609 Free PMC article. Preprint.