Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome
- PMID: 31406327
- PMCID: PMC6776680
- DOI: 10.1038/s41587-019-0217-9
Abstract
The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
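The abstract reports three kinds of summary metric: precision and recall for variant calls, contig N50 for assembly contiguity, and percent accuracy or concordance. The sketch below shows how such figures are conventionally computed. It is an illustration under stated assumptions, not the authors' pipeline: the function names and the toy inputs are hypothetical, and the Phred-scale conversion is a standard convention that the abstract itself does not state.

```python
import math


def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


def phred(accuracy):
    """Convert a per-base accuracy (0 <= accuracy < 1) to a Phred-scaled
    quality value, Q = -10 * log10(error rate)."""
    return -10 * math.log10(1 - accuracy)


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


if __name__ == "__main__":
    # Toy contig lengths (hypothetical, not from the paper).
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
    # Read accuracy and assembly concordance quoted in the abstract.
    print(round(phred(0.998), 1), round(phred(0.99997), 1))
    # Toy variant-call counts (hypothetical, not from the paper).
    print(precision_recall(90, 10, 30))
```

On the Phred scale, the 99.8% read accuracy quoted above corresponds to roughly Q27 and the 99.997% assembly concordance to roughly Q45, which makes the gap between read-level and consensus-level accuracy easier to compare.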
Conflict of interest statement
A.M.W., A.T., D.R.R., G.T.C., M.W.H., P.P., R.J.H., W.J.R., and Y.Q. are employees and shareholders of Pacific Biosciences. A.C., A.K., M.A.D., and P.C. are employees and shareholders of Google. A.F. and C-S.C. are employees and shareholders of DNAnexus. A.C. is a shareholder and was an employee of DNAnexus for a portion of this work.