Mol Biol Evol. 2025 Jan 6;42(1):msae247.
doi: 10.1093/molbev/msae247.

TIPPo: A User-Friendly Tool for De Novo Assembly of Organellar Genomes with High-Fidelity Data

Wenfei Xian et al. Mol Biol Evol. 2025.

Abstract

Plant cells have two major organelles with their own genomes: chloroplasts and mitochondria. While chloroplast genomes tend to be structurally conserved, the mitochondrial genomes of plants, which are much larger than those of animals, are characterized by complex structural variation. We introduce TIPPo, a user-friendly, reference-free tool for assembling organellar genomes from PacBio high-fidelity long-read data. It does not rely on genomes from related species or on nuclear genome information. TIPPo employs a deep learning model for initial read classification and leverages k-mer counting for further refinement, significantly reducing the impact of nuclear insertions of organellar DNA on the assembly process. With TIPPo, we assembled all 54 chloroplast genomes in a test set to completion; no other tool was able to completely assemble this set. TIPPo performs comparably to PMAT in assembling mitochondrial genomes for most species and achieves higher completeness for several. We also used the assembled organelle genomes to identify nuclear insertions of plastid DNA (NUPTs) and of mitochondrial DNA (NUMTs). The cumulative length of NUPTs/NUMTs positively correlates with the size of the nuclear genome, suggesting that insertions occur stochastically. NUPTs/NUMTs show predominantly C:G to T:A changes, with the mutated cytosines typically found in CG and CHG contexts, suggesting that degradation of NUPT and NUMT sequences is driven by the known elevated mutation rate of methylated cytosines. Small interfering RNA (siRNA) loci are enriched in NUPTs and NUMTs, consistent with the RNA-directed DNA methylation (RdDM) pathway mediating DNA methylation in these sequences.
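
The CG and CHG contexts mentioned in the abstract follow the standard plant methylation convention, in which H stands for any base other than G. The following is a minimal sketch of how C-to-T changes could be tallied by context. It is not TIPPo's code: the function names `cytosine_context` and `count_c_to_t_by_context` are hypothetical, the two sequences are assumed to be an ungapped, equal-length alignment, the organellar sequence is assumed to represent the ancestral state, and only forward-strand cytosines are examined (G-to-A changes on the opposite strand are ignored).

```python
from collections import Counter


def cytosine_context(seq, i):
    """Return 'CG', 'CHG', or 'CHH' for the cytosine at seq[i], else None."""
    seq = seq.upper()
    if i >= len(seq) or seq[i] != "C":
        return None
    downstream = seq[i + 1:i + 3]
    # Ambiguous bases (e.g. N) make the context undefined.
    if any(base not in "ACGT" for base in downstream):
        return None
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2:
        return None  # too close to the sequence end to tell CHG from CHH
    return "CHG" if downstream[1] == "G" else "CHH"


def count_c_to_t_by_context(organelle, nuclear):
    """Count C->T differences per context; context is read from `organelle`."""
    if len(organelle) != len(nuclear):
        raise ValueError("sequences must be aligned and of equal length")
    counts = Counter()
    for i, (ref, alt) in enumerate(zip(organelle.upper(), nuclear.upper())):
        if ref == "C" and alt == "T":
            context = cytosine_context(organelle, i)
            if context is not None:
                counts[context] += 1
    return counts


# Toy example (made-up sequences, not data from the paper).
organelle_seq = "ACGTCAGTCAAT"
nuclear_seq = "ATGTTAGTCAAT"
print(dict(count_c_to_t_by_context(organelle_seq, nuclear_seq)))
# {'CG': 1, 'CHG': 1}
```

In the toy example, the cytosine at position 1 is followed by G (CG context) and the one at position 4 by AG (CHG context); the cytosine at position 8 is unchanged and therefore not counted.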

Keywords: PacBio HiFi reads; chloroplast genome; genome assembly; mitochondrial genome; nuclear insertions of organellar genomes.

Conflict of interest statement

Conflict of Interest: D.W. holds equity in Computomics, which advises plant breeders. D.W. also consults for KWS SE, a plant breeder and seed producer with activities throughout the world. All other authors declare no conflicts.

Figures

Fig. 1. Workflow of TIPPo.
Fig. 2. Benchmarking of four chloroplast genome assembly tools and genome statistics. See Materials and Methods for phylogenetic tree. The assemblies for Adenosma buchneroides and Helichrysum umbraculigerum are presented here for the first time. Zygnema circumcarinatum, Taxus chinensis, Glycyrrhiza uralensis, and Trifolium repens have lost their inverted repeats (IRs), and the three topologically defined regions are therefore not measured.
Fig. 3. Benchmarking of mitochondrial genome assembly. a) See Materials and Methods for phylogenetic tree. The assemblies for Persea americana, Kobresia myosuroides, Triticum monococcum, Panicum miliaceum, Helichrysum umbraculigerum, Ipomoea cairica, Solanum rostratum, Adenosma buchneroides, Sesamum indicum, Perilla frutescens, Thymus quinquecostatus, Citrus australis, Ochroma pyramidale, Linum usitatissimum, Euphorbia peplus, Carya illinoinensis, and Coriaria nepalensis are presented here for the first time. The numbers inside the circles indicate the number of nonredundant protein-coding genes in the assembly. Light shading indicates superior results with TIPPo or PMAT. b) Whole-genome alignment of the published, TIPPo, and PMAT assemblies (both raw and master) of the S. conica mitochondrial genome, visualized with AliTV (v1.0.6). c) TIPPo assembly graph of S. conica visualized with Bandage (v0.9.0). d) PMAT assembly graph of S. conica visualized with Bandage (v0.9.0).
Fig. 4. Computational cost. a) Ratio of elapsed times between each pair of the four tools. b) Ratio of peak memory usage between each pair of the four tools. Gray dots indicate different species. The means are shown as horizontal lines, with the upper and lower box edges indicating the interquartile range (IQR), and the whiskers extending to the most extreme values within 1.5 times the IQR from the first and third quartiles.
Fig. 5. Comparison of NUPT and NUMT sequences and the corresponding organellar genomes. a) Comparison of cumulative lengths of NUPTs and of nuclear genome size. b) Comparison of cumulative lengths of NUMTs and of nuclear genome size. c) Comparison of cumulative lengths of NUPTs and of NUMTs. d) Cumulative length distribution of NUPTs as a function of sequence identity with the corresponding chloroplast genome. e) Cumulative length distribution of NUMTs as a function of sequence identity with the corresponding mitochondrial genome. f) Correlation between NUPT/chloroplast genome identity and NUMT/mitochondrial genome identity. Bars indicate standard errors.
Fig. 6. The landscape of substitutions in NUPTs and NUMTs. a) Distribution of nucleotide substitutions in NUPTs, inferred from sequence comparison with the corresponding chloroplast genome. b) Distribution of nucleotide substitutions in NUMTs, inferred from sequence comparison with the corresponding mitochondrial genome. c) Enrichment of cytosine substitutions in NUPTs and NUMTs at CG sites.
Fig. 7. Enrichment of siRNAs in NUPTs and NUMTs. a) Overlap of siRNA loci with NUPTs. b) Overlap of siRNA loci with NUMTs. Species in (a) and (b) are annotated at the bottom. The numbers on top of each bar represent the enrichment, and the error bars represent the 95% CI from random sampling of the genome.

