Front Plant Sci. 2023 Nov 16:14:1184112.
doi: 10.3389/fpls.2023.1184112. eCollection 2023.

Representing true plant genomes: haplotype-resolved hybrid pepper genome with trio-binning

Emily E Delorean et al. Front Plant Sci.

Abstract

As sequencing costs decrease and the availability of high-fidelity long-read sequencing increases, generating experiment-specific de novo genome assemblies becomes feasible. In many crop species, obtaining the genome of a hybrid or heterozygous individual is necessary for systems that do not tolerate inbreeding or for investigating important biological questions, such as hybrid vigor. However, most genome assembly methods that have been used in plants produce a single merged sequence that does not accurately represent either haplotype within a diploid individual. The resulting genome assembly is often fragmented and exhibits a mosaic of the two haplotypes, referred to as haplotype-switching. Important haplotype-level information, such as causal mutations and structural variation, is therefore lost, which makes downstream analyses difficult to interpret. To overcome this challenge, we applied trio-binning, a method developed for animal genome assembly, to an intraspecific hybrid of chili pepper (Capsicum annuum L. cv. HDA149 x Capsicum annuum L. cv. HDA330). We tested all currently available software for performing trio-binning, combined with multiple scaffolding technologies including Bionano, to determine the optimal method for producing a haplotype-resolved assembly. Ultimately, we produced highly contiguous, biologically true haplotype-resolved genome assemblies for each parent, with scaffold N50s of 266.0 Mb and 281.3 Mb, and with 99.6% and 99.8% positioned into chromosomes, respectively. The assemblies captured 3.10 Gb and 3.12 Gb of the estimated 3.5 Gb chili pepper genome size. These assemblies represent the complete genome structure of the intraspecific hybrid, as well as the two parental genomes, and show measurable improvements over the currently available reference genomes. Our manuscript provides a valuable guide on how to apply trio-binning to other plant genomes.
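The core idea of trio-binning, as described in the abstract, is to sort the hybrid's long reads by parent before assembly. The Python sketch below is a deliberately simplified illustration of that idea, not the implementation used by TrioCanu, Hifiasm, or the authors. It assumes that parental reads are available as plain strings, that a read is assigned to whichever parent contributes more parent-specific k-mers, and that ties are left unassigned. The function names and the tiny value of k are hypothetical and chosen only for readability.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def canonical_kmers(seq, k):
    """Collect canonical k-mers (the lexicographic minimum of a k-mer and
    its reverse complement), skipping k-mers with non-ACGT characters."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def parent_specific_kmers(parent_a_reads, parent_b_reads, k):
    """Return (k-mers seen only in parent A, k-mers seen only in parent B)."""
    a = set().union(*(canonical_kmers(r, k) for r in parent_a_reads))
    b = set().union(*(canonical_kmers(r, k) for r in parent_b_reads))
    return a - b, b - a


def bin_read(read, a_only, b_only, k):
    """Assign a hybrid read to 'A', 'B', or 'unassigned' by counting
    parent-specific k-mers (simplified rule; assumption, not TrioCanu's)."""
    kmers = canonical_kmers(read, k)
    hits_a = len(kmers & a_only)
    hits_b = len(kmers & b_only)
    if hits_a > hits_b:
        return "A"
    if hits_b > hits_a:
        return "B"
    return "unassigned"


# Toy example with made-up sequences and an unrealistically small k.
a_only, b_only = parent_specific_kmers(["ACGTTGCAAC"], ["ACGTTGCTTC"], k=4)
print(bin_read("ACGTTGCAAC", a_only, b_only, k=4))  # read matching parent A
print(bin_read("ACGTT", a_only, b_only, k=4))       # only shared k-mers
```

In practice, k-mer counting on a 3.5 Gb genome is done with dedicated tools and much larger k; the sketch only shows why reads carrying k-mers private to one parent can be separated before each haplotype is assembled.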

Keywords: HiFi; genome assembly; haplotype; pepper; trio-binning.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Trio-binning workflow. Overview of the trio-binning workflow used for producing haplotype-resolved biologically accurate plant genomes. (A) Schematic of lab-based and theoretical protocol utilized in the trio-binning workflow. (B) Detailed overview of the in silico process for assembly and scaffolding, including incorporation of Bionano optical mapping.
Figure 2
Haplotype switching. Haplotype switching was illustrated by aligning TrioCanu-binned HiFi reads of parent A (HDA149) and parent B (HDA330) to each contig-level genome assembly. The x-axis shows 1 Mb windows across contigs. The contigs were arranged from longest to shortest. Vertical gray lines show the boundaries of contigs. The y-axis shows the difference in percent coverage of the binned reads over a 1 Mb window of the given assembly. Higher coverage of HDA149 is shown in pink and higher coverage of HDA330 is shown in blue. (A) Hifiasm HDA149 assembly with trio-binning. (B) Hifiasm HDA330 assembly with trio-binning. (C) TrioCanu HDA149 assembly with trio-binning. (D) TrioCanu HDA330 assembly with trio-binning. (E) Hifiasm haplotype 1 assembly in default run mode, without parental k-mers for trio-binning. (F) Hifiasm haplotype 2 assembly in default run mode, without parental k-mers for trio-binning.
Figure 3
Utility of reciprocal scaffolding of assemblies from alternate software. Dotplots show alignments between largest contigs of TrioCanu and Hifiasm assemblies. Opportunities to improve contiguity through iterative scaffolding are highlighted in boxes that are numbered and shown in pink for HDA149 (A) or blue for HDA330 (B).
Figure 4
Characterization of developed assemblies. Circos plots of final Hifiasm assemblies HDA149v1.0 (A) and HDA330v1.0 (B) show long terminal repeat content across 1 Mb windows in track 1, gap locations in track 2, and telomere repeat peaks across 1 kb windows in track 3.
Figure 5
Comparison of final Hifiasm assemblies. Dotplots of assembly-by-assembly alignments of (A) HDA149v1.0 to HDA330v1.0, (B) HDA149v1.0 to Dempsey v1.0, and (C) HDA330v1.0 to Dempsey v1.0. Gridlines show boundaries of chromosomes (x-axis) and color indicates percent identity of the alignment.
Figure 6
Comparison of final TrioCanu and Hifiasm assemblies. Dotplots of assembly-by-assembly alignments of (A) HDA149alt-v1.0 (TrioCanu) to HDA149v1.0 (Hifiasm) and (B) HDA330alt-v1.0 (TrioCanu) to HDA330v1.0 (Hifiasm). Gridlines show boundaries of chromosomes (x-axis) and color indicates percent identity of the alignment. Circos plots of final TrioCanu assemblies HDA149alt-v1.0 (C) and HDA330alt-v1.0 (D) show regions of shared sequence with the corresponding final Hifiasm assembly in track 1, the number of gaps across 1 Mb windows in track 2, the HiFi read alignment coverage across 1 Mb windows in track 3, long terminal repeat content across 1 Mb windows in track 4, and telomere repeat peaks across 1 kb windows in track 5.
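
The haplotype-switching plot in Figure 2 is built on a per-window difference in percent coverage between the two sets of binned reads. The Python sketch below shows one way such a signal could be computed. It is an illustration under stated assumptions, not the authors' pipeline: alignments are assumed to be given as half-open (start, end) intervals on a single contig, "percent coverage" is assumed to mean the percentage of bases in a window covered by at least one alignment, and both function names are hypothetical.

```python
def window_percent_coverage(intervals, contig_length, window=1_000_000):
    """Percent of bases covered by at least one interval, per window.
    Intervals are half-open (start, end) positions on one contig."""
    n_windows = (contig_length + window - 1) // window
    covered = [0] * n_windows

    # Merge overlapping or adjacent intervals so no base is counted twice.
    merged = []
    for start, end in sorted(intervals):
        start, end = max(0, start), min(end, contig_length)
        if start >= end:
            continue
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Split each merged interval across the windows it spans.
    for start, end in merged:
        w = start // window
        while start < end:
            w_end = min((w + 1) * window, end)
            covered[w] += w_end - start
            start = w_end
            w += 1

    # The last window may be shorter than the nominal window size.
    return [
        100.0 * c / min(window, contig_length - i * window)
        for i, c in enumerate(covered)
    ]


def coverage_difference(a_intervals, b_intervals, contig_length, window=1_000_000):
    """Parent A minus parent B percent coverage for each window."""
    a = window_percent_coverage(a_intervals, contig_length, window)
    b = window_percent_coverage(b_intervals, contig_length, window)
    return [x - y for x, y in zip(a, b)]


# Toy contig of 250 bp with 100 bp windows (made-up alignments).
diff = coverage_difference(
    a_intervals=[(0, 100), (150, 200)],
    b_intervals=[(50, 100), (100, 200), (200, 250)],
    contig_length=250,
    window=100,
)
print(diff)  # [50.0, -50.0, -100.0]
```

Under these assumptions, a sign change in the difference along a contig of a nominally single-parent assembly is the kind of pattern the figure uses to reveal a haplotype switch.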
