Complex genome assembly based on long-read sequencing
- PMID: 35940845
- DOI: 10.1093/bib/bbac305
Abstract
High-quality, chromosome-scale genome sequences provide an important basis for downstream genomic analysis. In particular, the construction of haplotype-resolved and complete genomes plays a key role in genome annotation, mutation detection, evolutionary analysis, gene function research, comparative genomics and other areas. However, whole-genome short-read sequencing struggles to produce a complete assembly when the genome is complex, with high duplication and multiple heterozygosity. The emergence of long-read sequencing technology has greatly improved the completeness of complex genome assemblies. We review a variety of computational methods for complex genome assembly and describe in detail the theories, innovations and shortcomings of collapsed, semi-collapsed and uncollapsed assemblers based on long reads. Among the three methods, uncollapsed assembly is the most correct and complete way to represent genomes. In addition, genome assembly is closely related to haplotype reconstruction: uncollapsed assembly achieves haplotype reconstruction, and haplotype reconstruction in turn facilitates uncollapsed assembly. We hope that gapless, telomere-to-telomere and accurate assembly of complex genomes can in the future be achieved routinely using only a simple process or a single tool.
Keywords: genome assembly; haplotype; long-read sequencing.
© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.