Review
Quant Plant Biol. 2022 Mar 11;3:e5.
doi: 10.1017/qpb.2021.18. eCollection 2022.

Plant genome sequence assembly in the era of long reads: Progress, challenges and future directions

Boas Pucker et al. Quant Plant Biol. 2022.

Abstract

Third-generation long-read sequencing is transforming plant genomics. Oxford Nanopore Technologies and Pacific Biosciences offer competing long-read sequencing technologies that enable plant scientists to investigate even large and complex plant genomes. Sequencing projects can now be conducted by single research groups, and the sequences of smaller plant genomes can be completed within days. This has also led to large-scale investigations of genomes from multiple species that address fundamental questions about the origin and evolution of land plants. The increasing accessibility of sequencing devices and user-friendly software allows more researchers to get involved in genomics. Current challenges include accurately resolving diploid or polyploid genome sequences and better accounting for intra-specific diversity by switching from single reference genome sequences to pangenome graphs.

Keywords: Oxford Nanopore Technologies (ONT); Pacific Biosciences (PacBio); haplophasing; long-read sequencing; plant genome assembly; plant genomics.

Conflict of interest statement

B.P. was an invited speaker without financial compensation at a virtual conference (London Calling 2021) organised by Oxford Nanopore Technologies. J.V., I.I. and B.X. declare no conflicts of interest.

Figures

Fig. 1
Schematic illustration of nanopore sequencing (a) and Single-Molecule Real-Time (SMRT) sequencing (b). Nanopore sequencing is based on the translocation of a DNA or RNA strand through a nanopore embedded in an artificial membrane. The nucleotides located inside the nanopore at a given moment physically block part of the pore, so their identity shapes the flow of ions through it in a characteristic way. This change in ion flux is recorded as an electric signal, which is then converted into sequence information. The illustration shows six bases contributing to the signal, but the number of bases depends on the pore type. SMRT sequencing detects fluorescent light emitted from labelled nucleotides upon their incorporation into a DNA strand. The DNA polymerase is located at the bottom of a well and synthesises a new DNA strand. Incorporation into the new strand holds the nucleotide in the well long enough for its signal to be detected.
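As a toy illustration of the signal model described in panel (a), and not of any actual basecalling algorithm, the sketch below lists the stretch of bases (k-mer) that would occupy the pore at each translocation step, assuming exactly six bases contribute to the signal as in the illustration. It also shows why interpreting the signal is demanding: with four bases, a six-base window can take 4^6 = 4,096 distinct states. The function names are hypothetical.

```python
K = 6  # assumption: six bases shape the signal, as in the illustration; real pore types differ


def pore_states(read: str, k: int = K) -> list[str]:
    """Return the k-mer occupying the pore at each translocation step (toy model)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def n_possible_states(k: int = K) -> int:
    """Number of distinct k-mers the pore could contain, given four DNA bases."""
    return 4 ** k


print(pore_states("ACGTTGCAAC"))  # ['ACGTTG', 'CGTTGC', 'GTTGCA', 'TTGCAA', 'TGCAAC']
print(n_possible_states())        # 4096
```

Consecutive states overlap by k - 1 bases, so each base influences several successive signal levels; this is one reason the signal is interpreted in context rather than base by base.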
Fig. 2
Plant genome project workflow from DNA extraction through Oxford Nanopore Technologies (ONT) sequencing to data submission. The indicated durations depend on the size and complexity of the investigated plant genome, with larger genomes generally taking longer to analyse. To reduce the sugar content, plants are incubated in the dark for a few days prior to DNA extraction (a). Non-destructive sampling is important because it allows additional genomic sequencing and, if required, RNA-Seq at later stages of a project (b). Mechanical disruption of cell walls is required for DNA extraction (c). Photometric analysis of the DNA solution (including quantification) is often the first step of quality control (d and f). Removal of short DNA fragments is highly recommended to improve sequencing output and quality (e). ONT library preparation and sequencing can be repeated several times to increase the output (g). Graphics cards (GPUs) are an efficient resource for converting the electric signal into sequence information in real time (h). Multiple tools are available to generate a chromosome-arm-level assembly from long reads (i). Additional rounds of polishing can be necessary because long reads are noisy (j). The value of a genome sequence can be enriched by identifying relevant genetic elements such as genes and transposable elements (k). All data should be shared with the community by submission to a public repository, which ensures long-term storage (l). d, day(s); hr, hour(s). The given time estimates for assembly, polishing and annotation are the minimal run times required for the analyses; manual curation and iterative improvements can take substantially longer. The estimated costs of consumables are based on a haploid 1-Gbp genome and a targeted coverage of 30×, which would require six libraries to be sequenced on three MinION/GridION flow cells, assuming an average output of 10 Gbp per flow cell and two libraries per flow cell.
Investment costs for non-standard lab equipment are independent of the specific sequencing project and are only required for high-output experiments in the lab. Rapid sequencing without these instruments is possible in the field, but its lower output makes this option unattractive for large plant genomes.
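The consumables estimate in the caption follows from simple arithmetic: the required yield is genome size times target coverage, divided by the average output per flow cell. The minimal sketch below reproduces it, treating the caption's 10 GB per flow cell as 10 Gbp of sequence and keeping its two libraries per flow cell; the function names are hypothetical, and real flow-cell yields vary between runs.

```python
import math


def flow_cells_needed(genome_size_gbp: float, coverage: float, output_per_cell_gbp: float) -> int:
    """Flow cells required to reach the target coverage, rounded up to whole flow cells."""
    required_gbp = genome_size_gbp * coverage  # total sequence needed, in Gbp
    return math.ceil(required_gbp / output_per_cell_gbp)


def libraries_needed(flow_cells: int, libraries_per_cell: int = 2) -> int:
    """Libraries to prepare when each flow cell is loaded with a fixed number of libraries."""
    return flow_cells * libraries_per_cell


cells = flow_cells_needed(genome_size_gbp=1.0, coverage=30, output_per_cell_gbp=10.0)
print(cells, libraries_needed(cells))  # 3 6
```

Rounding up matters for larger genomes: a 2.5-Gbp genome at 30× needs 75 Gbp, i.e. 7.5 flow cells in theory but 8 in practice.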
Fig. 3
Development of sequence analysis for exploring genome structure and variability. Read mapping and variant calling were the initial approach to characterise differences between samples based on short-read (‘NGS’) data (a). Long reads improve variant detection and are especially suited to detecting structural variants (b). Independent de novo genome assemblies allow the identification of all variants and directly assign variants to haplophases (c).
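To make panel (c) concrete, the sketch below compares two assembled sequences and reports single-nucleotide variants (SNVs), insertions and deletions. It assumes the pairwise alignment has already been computed by some aligner (gaps written as '-'); the Variant record and call_variants function are hypothetical and far simpler than real variant callers.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    pos: int   # 0-based alignment column where the variant starts
    kind: str  # "SNV", "INS" or "DEL" relative to the reference row
    ref: str   # reference bases (empty for insertions)
    alt: str   # alternative bases (empty for deletions)


def call_variants(ref_aln: str, alt_aln: str) -> list[Variant]:
    """Report SNVs and indels between two gapped, pre-aligned sequences ('-' = gap)."""
    if len(ref_aln) != len(alt_aln):
        raise ValueError("alignment rows must have equal length")
    variants: list[Variant] = []
    i, n = 0, len(ref_aln)
    while i < n:
        r, a = ref_aln[i], alt_aln[i]
        if r == a:  # match, or a column gapped in both rows
            i += 1
        elif r != "-" and a != "-":  # single-nucleotide variant
            variants.append(Variant(i, "SNV", r, a))
            i += 1
        else:  # merge a run of consecutive gaps in the same row into one indel
            start = i
            kind = "INS" if r == "-" else "DEL"
            gap_row, other_row = (ref_aln, alt_aln) if kind == "INS" else (alt_aln, ref_aln)
            while i < n and gap_row[i] == "-" and other_row[i] != "-":
                i += 1
            variants.append(Variant(start, kind,
                                    ref_aln[start:i].replace("-", ""),
                                    alt_aln[start:i].replace("-", "")))
    return variants


for v in call_variants("ACGT--ACGTAC", "ACCTGGACG--C"):
    print(v)
```

Here the two rows could be the maternal and paternal haplophases from panel (c), so every reported variant is already assigned to a haplophase. Production formats such as VCF additionally anchor indels on a preceding base, which this sketch omits.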
Fig. 4
Assembly of haplophases. Diploid plant genomes have a maternal (a) and a paternal haplotype (c), which differ at specific positions (b). Each long read derives from one of the two haplotypes (d). The assembly graph separates haplophases in regions with sufficient differences between the parental haplotypes, but collapses them in identical (homozygous) regions (e). The assembly graph can be resolved into final sequences in four different ways (f): both haplophases are resolved by correctly connecting the two divergent blocks (1); identical regions are assigned to one haplophase, leaving the second haplophase less continuous (2 and 3); or the identical region causes an erroneous connection of the flanking distinct sequences (4). This illustration shows the analysis of a diploid genome, but the concept generalises to polyploids.
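The idea in panels (b), (d) and (e), that reads can only be told apart where the haplotypes differ, can be sketched as a simple vote: a read supports the haplotype whose alleles it carries at heterozygous positions. This is a toy model under strong assumptions (both haplotypes already known and in one shared coordinate system, reads placed without indels at a known start); the function names are hypothetical and this is not the procedure of any particular assembler.

```python
def heterozygous_sites(hap_a: str, hap_b: str) -> list[int]:
    """Positions where two equal-length haplotype sequences differ."""
    if len(hap_a) != len(hap_b):
        raise ValueError("toy model expects haplotypes in a shared coordinate system")
    return [i for i, (x, y) in enumerate(zip(hap_a, hap_b)) if x != y]


def assign_read(read: str, start: int, hap_a: str, hap_b: str) -> str:
    """Vote for haplotype 'A' or 'B' using the alleles a read carries at heterozygous sites.

    Assumes the read is placed gap-free at 0-based position `start`.
    Reads that overlap no heterozygous site, or that tie, stay 'unassigned'.
    """
    votes_a = votes_b = 0
    for site in heterozygous_sites(hap_a, hap_b):
        offset = site - start
        if 0 <= offset < len(read):
            if read[offset] == hap_a[site]:
                votes_a += 1
            elif read[offset] == hap_b[site]:
                votes_b += 1
    if votes_a > votes_b:
        return "A"
    if votes_b > votes_a:
        return "B"
    return "unassigned"


maternal = "AAAACAAAAGAAAA"
paternal = "AAAATAAAACAAAA"
print(assign_read("ACAAAAG", 3, maternal, paternal))
print(assign_read("AAAA", 10, maternal, paternal))
```

The second read lies entirely in an identical (homozygous) stretch and therefore cannot be assigned, which mirrors why the assembly graph collapses such regions in panel (e).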
