Genome Biol. 2021 Apr 29;22(1):120.
doi: 10.1186/s13059-021-02336-9.

Complete vertebrate mitogenomes reveal widespread repeats and gene duplications

Giulio Formenti et al. Genome Biol. 2021.

Abstract

Background: Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly.

Results: As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100-300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization.
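The idea of similarity-based identification of mitochondrial reads can be illustrated with a toy filter. This is a minimal sketch, not the mitoVGP implementation: the abstract does not say which similarity method the pipeline uses, so the sketch swaps in a shared k-mer fraction against a related reference mitogenome. The function names, the `k` and `min_shared` parameters, and all sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def select_mito_reads(reads, reference, k=5, min_shared=0.5):
    """Keep reads whose k-mers are mostly found in the reference.

    Assumption: k-mer sharing stands in for whatever similarity search
    a real pipeline would run; both strands of the reference are indexed.
    The circularity of the mitogenome is ignored here.
    """
    ref_kmers = kmers(reference, k) | kmers(revcomp(reference), k)
    selected = []
    for name, seq in reads.items():
        read_kmers = kmers(seq, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to compare
        shared = len(read_kmers & ref_kmers) / len(read_kmers)
        if shared >= min_shared:
            selected.append(name)
    return selected


# Hypothetical toy data, far shorter than real reads or mitogenomes.
reference = "ACGTTGCAAGGCTTACGATCGGATCCATGCA"
reads = {
    "mito_fwd": reference[5:25],          # forward-strand fragment
    "mito_rev": revcomp(reference[3:20]),  # reverse-strand fragment
    "nuclear": "TTTTTTTTTTTTTTTTTTTT",     # unrelated sequence
}
print(select_mito_reads(reads, reference))  # ['mito_fwd', 'mito_rev']
```

With error-prone long reads, an exact k-mer match is a crude proxy, so the threshold would need tuning on real data; the selected reads would then go to a de novo assembler.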

Conclusions: Our results indicate that even in the "simple" case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.

Keywords: Assembly; Duplications; Long reads; Mitochondrial DNA; Repeats; Sequencing; Vertebrate.


Conflict of interest statement

V. C., S. M., and D. F. are employees of Oxford Nanopore Technologies Limited. J. K. is Chief Scientific Officer of Pacific Biosciences.

Figures

Fig. 1
Paired comparisons of mitoVGP assemblies with NOVOPlasty and GenBank/RefSeq assemblies. NOVOPlasty assemblies are split into three categories: (1) circular (green), (2) single-contig (yellow), (3) multiple-contigs (orange). a–d Comparisons between NOVOPlasty and mitoVGP assemblies for sequence identity (a, including and excluding IUPAC bases), assembly length (b), annotated repeat length (c, in circular NOVOPlasty assemblies and matched mitoVGP assemblies), and number of gene duplications (d, in circular NOVOPlasty assemblies and matched mitoVGP assemblies). Note that in panel a, single-contig assemblies show levels of identity very similar to circular ones, so they largely overlap and are difficult to distinguish in the figure. e–h GenBank/RefSeq comparisons of mitogenome assembly length (e), annotated repeat length (f), number of missing genes (g), and number of gene duplications (h). Statistical significance (one-sided paired samples Wilcoxon test) is reported above each plot. In the first plot, outliers having identity < 99.7% are labeled. The top 10 outliers in the 20th and 80th percentiles are labeled in the other plots.
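The one-sided paired samples Wilcoxon test named in the caption can be sketched as follows. This is a minimal illustration, assuming a small sample so that the exact null distribution can be enumerated; it is not the authors' analysis code, and the assembly lengths are hypothetical placeholders rather than values from the paper.

```python
from itertools import product


def signed_rank_greater(x, y):
    """Exact one-sided Wilcoxon signed-rank test for paired samples.

    Tests the alternative that x tends to exceed y. Returns (W+, p).
    Zero differences are dropped; tied |differences| get average ranks.
    Enumerates 2**n sign patterns, so it is only practical for small n.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under the null, each difference is positive or negative with equal odds.
    hits = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return w_plus, hits / 2 ** n


# Hypothetical paired assembly lengths in bp (same species, two assemblers).
long_read_lengths = [16900, 17400, 18100, 16800, 17900]
short_read_lengths = [16700, 17000, 17500, 16700, 17100]
w_plus, p_value = signed_rank_greater(long_read_lengths, short_read_lengths)
print(w_plus, p_value)  # 15.0 0.03125
```

For the sample sizes in a real comparison, a library routine with a normal approximation would be the more practical choice.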
Fig. 2
Duplications and repeats in mitoVGP assemblies. a Comparison of mitoVGP, NOVOPlasty, and RefSeq mitogenome assemblies for the sand lizard (Lacerta agilis). Duplicated genes missing from the reference and NOVOPlasty assemblies: MT-CYB, MT-TT, MT-TP (brown bar). Gray, read coverage: PacBio CLR 34× and Nanopore 46× mean coverage. b Golden eagle (Aquila chrysaetos), where the current RefSeq sequence (top) lacks a large fraction of a tandem repeat in the CR and 10 bp from the start of the MT-TF gene (brown bar). Mean CLR coverage 170×. c Kakapo (Strigops habroptilus), where the RefSeq sequence (top) lacks the entire CR. Mean coverage 99×. d Maguari stork (Ciconia maguari). Duplicated genes: MT-CYB, MT-TT, MT-TP, MT-ND6, and MT-TE (brown bar). Mean CLR coverage 209×. e Warty frogfish (Antennarius maculatus). Duplicated genes: MT-TV and MT-RNR2 (brown bar). Mean CLR coverage 21×. The rRNA genes are colored in red, tRNA genes in green, other genes in yellow, and the CR/intergenic region in blue. Homologous regions are highlighted in orange, tandem repeats in shades of blue, gaps as dashed lines, and duplicated genes with brown bars. Long-read coverage depth is represented by the gray track. All labels are in kbp. Coordinates are relative to the PacBio mitoVGP assembly.
Fig. 3
Duplications and repeats across the phylogeny and length deviation in repetitive elements. a The presence of mitochondrial repeats and duplications is mapped onto the tree for each species. The repeats are most often in the CR. Since the tree topology of phylogenies based on mtDNA is often inaccurate, this tree topology is based on relationships determined from current genome-scale phylogenies in the literature [–51]. b The length deviation relative to the size of the reference is reported for each read spanning the repetitive region or duplication. No deviation from the assembled VGP reference is marked by the dashed line. Colors correspond to different repeats and duplications. Individual density distributions are shown in the background. Circles highlight five species with reads that lack gene duplications present in the mitoVGP assembly, suggesting potential heteroplasmy.
Fig. 4
Evidence of heteroplasmy associated with a tandem repeat in the kakapo mitochondrial genome. a Fraction of CLR reads that support the copy number in the mitoVGP reference (blue bar is 11 copies). b IGV [54] plot showing the PacBio CLR alignment of reads that fully span the ~ 925-bp-long tandem repeat (between green dashed lines, repeat unit = 84 bp), highlighting the presence of reads that support the copy number of 11 in the mitoVGP reference, but also reads supporting fewer copies of the repeat (red arrows, black in panel a).
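The arithmetic behind the caption (a ~925 bp repeat with an 84 bp unit corresponds to about 11 copies) and the per-read support fractions in panel a can be sketched as below. This is an illustrative sketch only: the rounding rule, the function names, and the read span lengths are assumptions, and only the unit length and reference copy number come from the caption.

```python
from collections import Counter

REPEAT_UNIT_BP = 84    # repeat unit length, from the caption
REFERENCE_COPIES = 11  # copy number in the mitoVGP reference, from the caption


def copy_number(span_bp, unit_bp=REPEAT_UNIT_BP):
    """Estimate the copy number implied by a read's span across the repeat.

    Assumption: rounding to the nearest whole unit absorbs the small
    insertion/deletion noise expected in long reads.
    """
    return round(span_bp / unit_bp)


def copy_number_support(spans_bp, unit_bp=REPEAT_UNIT_BP):
    """Fraction of spanning reads supporting each estimated copy number."""
    counts = Counter(copy_number(span, unit_bp) for span in spans_bp)
    total = sum(counts.values())
    return {copies: n / total for copies, n in sorted(counts.items())}


print(copy_number(925))  # 11, matching the reference copy number

# Hypothetical repeat spans (bp) measured on reads that fully cross the repeat.
spans = [925, 930, 918, 840, 756]
support = copy_number_support(spans)
print(support[REFERENCE_COPIES])  # 0.6
```

Reads with fewer copies than the reference, like the last two hypothetical spans, are the kind of signal the caption interprets as possible heteroplasmy; only reads that span the whole repeat are informative.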
Fig. 5
Distribution of vertebrate mtDNA sequence lengths in the VGP and RefSeq datasets. Length distribution histogram in the VGP dataset (yellow bars), with its density distribution (yellow area) and the RefSeq dataset density distribution (blue area). The RefSeq dataset was randomly resampled to ensure the same within-order representation of sequences as the VGP dataset (1000 replicates). The respective means are highlighted by the dashed lines. Individual data points are shown at the bottom.
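The resampling described in the Fig. 5 caption, drawing RefSeq sequences so that each taxonomic order is represented as often as in the VGP dataset, can be sketched as a stratified resample. This is a minimal sketch under stated assumptions: sampling with replacement, the function names, the seed, the order labels, and all lengths are hypothetical, and the caption does not give these details.

```python
import random
from collections import Counter
from statistics import mean


def resample_matching_orders(target_orders, pool, rng):
    """Draw one replicate from pool with the per-order counts of target_orders.

    target_orders: one order label per sequence in the target dataset.
    pool: dict mapping order label -> list of sequence lengths (bp).
    Assumption: lengths are drawn with replacement within each order.
    """
    replicate = []
    for order, n in Counter(target_orders).items():
        replicate.extend(rng.choices(pool[order], k=n))
    return replicate


def resampled_means(target_orders, pool, replicates=1000, seed=0):
    """Mean length of each of `replicates` stratified resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        mean(resample_matching_orders(target_orders, pool, rng))
        for _ in range(replicates)
    ]


# Hypothetical data: two sequences from order_A and one from order_B in the
# target dataset; order_C exists in the pool but is never sampled.
target_orders = ["order_A", "order_A", "order_B"]
pool = {
    "order_A": [16000, 17000],
    "order_B": [18000],
    "order_C": [25000],
}
means = resampled_means(target_orders, pool)
print(len(means), min(means) >= 16000, max(means) <= 18000)  # 1000 True True
```

Matching the order composition this way keeps a length difference between the two datasets from being driven simply by which orders each one happens to sample.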

Similar articles

  • Towards complete and error-free genome assemblies of all vertebrate species.
    Rhie A, et al. Nature. 2021 Apr;592(7856):737-746. doi: 10.1038/s41586-021-03451-0. Epub 2021 Apr 28. PMID: 33911273. Free PMC article.
  • Rapid Low-Cost Assembly of the Drosophila melanogaster Reference Genome Using Low-Coverage, Long-Read Sequencing.
    Solares EA, Chakraborty M, Miller DE, Kalsow S, Hall K, Perera AG, Emerson JJ, Hawley RS. G3 (Bethesda). 2018 Oct 3;8(10):3143-3154. doi: 10.1534/g3.118.200162. PMID: 30018084. Free PMC article.
  • Widespread false gene gains caused by duplication errors in genome assemblies.
    Ko BJ, Lee C, Kim J, Rhie A, Yoo DA, Howe K, Wood J, Cho S, Brown S, Formenti G, Jarvis ED, Kim H. Genome Biol. 2022 Sep 27;23(1):205. doi: 10.1186/s13059-022-02764-1. PMID: 36167596. Free PMC article.
  • Oxford Nanopore MinION Sequencing and Genome Assembly.
    Lu H, Giordano F, Ning Z. Genomics Proteomics Bioinformatics. 2016 Oct;14(5):265-279. doi: 10.1016/j.gpb.2016.05.004. Epub 2016 Sep 17. PMID: 27646134. Free PMC article. Review.
  • [Mitogenome assembly strategies and software applications in the genome era].
    Kuang WM, Yu L. Yi Chuan. 2019 Nov 20;41(11):979-993. doi: 10.16288/j.yczz.19-227. PMID: 31735702. Review. Chinese.


References

    1. Karnkowska A, Vacek V, Zubáčová Z, Treitli SC, Petrželková R, Eme L, Novák L, Žárský V, Barlow LD, Herman EK, Soukal P, Hroudová M, Doležal P, Stairs CW, Roger AJ, Eliáš M, Dacks JB, Vlček Č, Hampl V. A eukaryote without a mitochondrial organelle. Curr Biol. 2016;26(10):1274–1284. doi: 10.1016/j.cub.2016.03.053.
    2. Kolesnikov AA, Gerasimov ES. Diversity of mitochondrial genome organization. Biochemistry. 2012;77:1424–1435.
    3. D’Erchia AM, Atlante A, Gadaleta G, Pavesi G, Chiara M, De Virgilio C, et al. Tissue-specific mtDNA abundance from exome data and its correlation with mitochondrial transcription, mass and respiratory activity. Mitochondrion. 2015;20:13–21. doi: 10.1016/j.mito.2014.10.005.
    4. Cole LW. The evolution of per-cell organelle number. Front Cell Dev Biol. 2016;4:85. doi: 10.3389/fcell.2016.00085.
    5. Mindell DP, Sorenson MD, Dimcheff DE. Multiple independent origins of mitochondrial gene order in birds. Proc Natl Acad Sci U S A. 1998;95(18):10693–10697. doi: 10.1073/pnas.95.18.10693.
