Comparative Study
Gigascience. 2020 Dec 21;9(12):giaa146.
doi: 10.1093/gigascience/giaa146.

Comparison of long-read methods for sequencing and assembly of a plant genome


Valentine Murigneux et al. Gigascience. 2020.

Abstract

Background: Sequencing technologies have advanced to the point where it is possible to generate high-accuracy, haplotype-resolved, chromosome-scale assemblies. Several long-read sequencing technologies are available, and a growing number of algorithms have been developed to assemble the reads generated by those technologies. When starting a new genome project, it is therefore challenging to select the most cost-effective sequencing technology, as well as the most appropriate software for assembly and polishing. It is thus important to benchmark different approaches applied to the same sample.

Results: Here, we report a comparison of 3 long-read sequencing technologies applied to the de novo assembly of a plant genome, Macadamia jansenii. We have generated sequencing data using Pacific Biosciences (Sequel I), Oxford Nanopore Technologies (PromethION), and BGI (single-tube Long Fragment Read) technologies for the same sample. Several assemblers were benchmarked in the assembly of Pacific Biosciences and Nanopore reads. Results obtained from combining long-read technologies or short-read and long-read technologies are also presented. The assemblies were compared for contiguity, base accuracy, and completeness, as well as sequencing costs and DNA material requirements.

Conclusions: The 3 long-read technologies produced highly contiguous and complete genome assemblies of M. jansenii. At the time of sequencing, the cost associated with each method was significantly different, but continuous improvements in technologies have resulted in greater accuracy, increased throughput, and reduced costs. We propose updating this comparison regularly with reports on significant iterations of the sequencing technologies.

Keywords: BGI; ONT; Oxford Nanopore Technologies; PacBio; Pacific Biosciences; PromethION; Sequel; assembly; long reads; single-tube long fragment read; stLFR.

Conflict of interest statement

Employees of BGI (W.T., I.H., Q.Y., B.Y., O.W., M.X, P.W.), MGI (H.W.), and Complete Genomics (E.A., Q.M., R.D., B.A.P.) have stock holdings in BGI. The authors declare that they have no other competing interests.

Figures

Figure 1: ONT, PacBio, and BGI genome assembly statistics. The total assembly length is plotted against the contig N50 for each assembler and sequencing dataset.
Figure 2: Number of mismatches and indels identified in the long-read assemblies as compared to the Illumina short-read assembly generated by SPAdes. The BGI + ONT and BGI + PacBio assemblies were polished with the BGI stLFR reads using 1 iteration of NextPolish. The ONT + Illumina assemblies (except MaSuRCA) were polished with the ONT long reads using Racon and Medaka followed by the Illumina short reads using 1 iteration of NextPolish. The PacBio + Illumina assemblies (except MaSuRCA) were polished with the Illumina short reads using 1 iteration of NextPolish. *Assembly polished using Illumina reads.
Figure 3: BUSCO analysis of assemblies using the eudicotyledons dataset (2,121 genes). The x-axis depicts the percentage of complete and single-copy, complete and duplicated, fragmented, and missing BUSCOs and the y-axis indicates the assembly assessed. The BGI + ONT and BGI + PacBio assemblies were polished with the BGI stLFR reads using 1 iteration of NextPolish. The ONT + Illumina assemblies (except MaSuRCA) were polished with the ONT long reads using Racon and Medaka followed by the Illumina short reads using 1 iteration of NextPolish. The PacBio + Illumina assemblies (except MaSuRCA) were polished with the Illumina short reads using 1 iteration of NextPolish.
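
Figure 1 compares assemblies by total length and contig N50. As an illustration of how that contiguity metric is commonly defined, the following minimal sketch computes N50 from a list of contig lengths. It is not taken from the paper: the function name `contig_n50` and the example lengths are hypothetical, and the authors' own tooling for computing assembly statistics is not stated in this record.

```python
def contig_n50(lengths):
    """Return the contig N50 for a list of contig lengths.

    N50 is taken here as the length L of the shortest contig in the
    smallest set of longest contigs that together cover at least half
    of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings coverage to >= 50% of the total.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in bp), for illustration only.
example_lengths = [100, 80, 50, 30, 20, 10]
print("total length:", sum(example_lengths))
print("contig N50:", contig_n50(example_lengths))
```

A higher N50 at a similar total assembly length indicates that the same amount of sequence is held in fewer, longer contigs, which is the sense in which the figure contrasts assemblers and datasets.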

Comment in

  • Improvements in the sequencing and assembly of plant genomes.
    Sharma P, Al-Dossary O, Alsubaie B, Al-Mssallem I, Nath O, Mitter N, Rodrigues Alves Margarido G, Topp B, Murigneux V, Kharabian Masouleh A, Furtado A, Henry RJ. GigaByte. 2021 Jun 10;2021:gigabyte24. doi: 10.46471/gigabyte.24. eCollection 2021. PMID: 36824328. Free PMC article.

Publication types