Plant Cell. 2021 Jul 19;33(6):1888-1906.
doi: 10.1093/plcell/koab077.

Long-read sequence assembly: a technical evaluation in barley

Martin Mascher et al. Plant Cell.

Abstract

Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence, which is based on short reads. Assemblies derived from accurate long reads excel in most metrics, and the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.

Figures

Figure 1
Structural complexity at the R gene locus Mla. A, A dotplot of the TRITEX (short-read) scaffold versus the CCS_Canu (long-read) contig encompassing the Mla locus. The region is intact and correct in CCS_Canu, but collapsed in the TRITEX assembly (repeated parallel diagonal lines) and with a small inversion (inverted diagonal line). B, Physical interval of the Mla locus from the reference barley accession Morex that contains three gene families, RGH1 (orange), RGH2 (blue), and RGH3 (green), encoding nucleotide-binding, leucine-rich repeat proteins. Gray arrows define a 39.7 kb tandem duplication. The duplicated segments are 99.9% identical, differing by only 13 SNPs and 11 InDels.
Figure 2
Alignments between Hi-C-based pseudomolecules and genetic maps. Panel A shows POPSEQ markers in the Morex × Barke and Oregon Wolfe Barley (OWB) maps (Mascher et al., 2013). Framework markers of the Morex × Barke and OWB maps are shown in red and blue, respectively. Markers integrated to the consensus POPSEQ markers are shown as gray dots. Panel B shows GBS markers mapped in Morex × Barke recombinant inbred lines (Mascher et al., 2017). Gray lines indicate scaffold boundaries.
Figure 3
Alignments of MorexV3 and MorexV2 pseudomolecules.
Figure 4
Alignments of Morex V3 and V2 pseudomolecules in the terminal 10 Mb of each chromosome arm. Gray lines indicate scaffold boundaries.
Figure 5
Full-length LTR-retrotransposon (fl-LTR) characteristics of the three Morex chromosome-level assembly versions. A, Fl-LTR insertion age distribution for all high-quality gap-free fl-LTR copies and superfamily subsets (RLC: Copia superfamily; RLG: Gypsy superfamily; RLX: unassigned). B, Overall repetitivity of fl-LTR copies in terms of 20-mer frequencies.
Figure 6
Sizes and insertion age distributions of full-length BARE1 retrotransposons extracted from different Morex assembly versions (V1–V3). Panels A–C show size distributions of the extracted full-length retrotransposons. Those extracted from V2 tend to be much longer due to extended stretches of unfilled gaps represented by N characters. Panels D–F show insertion age distributions of the extracted full-length retrotransposons. Retrotransposons from V1 and V2 are on average older. In V2, very young retrotransposons are almost absent. They could not be identified with our pipeline since LTRs of young elements tend to have sequence gaps.
Figure 7
Distribution of sequence gaps and sequence differences in BARE1 elements between Morex V1 and V3. The graph is a compilation of results from sequence alignments of 3,305 V1 and V3 full-length BARE1 retrotransposons. As individual retrotransposon copies can differ in length, the length was normalized to 1,000 bins. The plot shows numbers of SNPs and numbers of N's in 10-bin windows. The LTRs correspond to approximately the first and last 20% of the retrotransposon. These regions are highly enriched in SNPs and sequence gaps because of the inability of short-read assemblies to resolve highly similar regions longer than a few hundred base pairs.
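The length normalization described in the Figure 7 caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `to_bin` and `window_counts` and the example coordinates are hypothetical. It assumes each event (a SNP or an N) is given as a 0-based position together with the length of the element it falls in, maps that position onto 1,000 bins, and then counts events in 10-bin windows.

```python
N_BINS = 1000  # normalized element length, as stated in the caption
WINDOW = 10    # bins per plotted window, as stated in the caption


def to_bin(pos, length, n_bins=N_BINS):
    """Map a 0-based position in an element of `length` bp to a bin index."""
    if not 0 <= pos < length:
        raise ValueError("position lies outside the element")
    return pos * n_bins // length


def window_counts(events, n_bins=N_BINS, window=WINDOW):
    """Count events per window.

    `events` is an iterable of (position, element_length) pairs, so copies
    of different lengths contribute to the same normalized coordinate axis.
    """
    counts = [0] * (n_bins // window)
    for pos, length in events:
        counts[to_bin(pos, length, n_bins) // window] += 1
    return counts


# Hypothetical events from two elements of different lengths (8,000 and 9,000 bp).
example_events = [(0, 8000), (100, 9000), (7999, 8000)]
profile = window_counts(example_events)
```

With these settings the profile has 100 windows. Under the caption's approximation that the LTRs span the first and last 20% of each element, they would correspond to roughly the first and last 20 windows.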

References

    1. Alonge M, Wang X, Benoit M, Soyk S, Pereira L, Zhang L, Suresh H, Ramakrishnan S, Maumus F, Ciren D, et al. (2020) Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell 182:145–161.e123
    2. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
    3. Arend D, Junker A, Scholz U, Schüler D, Wylie J, Lange M (2016) PGP repository: a plant phenomics and genomics data publication infrastructure. Database 2016:baw033
    4. Arend D, Lange M, Chen J, Colmsee C, Flemming S, Hecht D, Scholz U (2014) e!DAL - a framework to store, share and publish research data. BMC Bioinformatics 15:214
    5. Ariyadasa R, Mascher M, Nussbaumer T, Schulte D, Frenkel Z, Poursarebani N, Zhou R, Steuernagel B, Gundlach H, Taudien S, et al. (2014) A sequence-ready physical map of barley anchored genetically by two million single-nucleotide polymorphisms. Plant Physiol 164:412–423