Front Genet. 2022 Nov 22;13:984513.
doi: 10.3389/fgene.2022.984513. eCollection 2022.

Comparison of detection methods and genome quality when quantifying nuclear mitochondrial insertions in vertebrate genomes


Deborah A Triant et al. Front Genet. 2022.

Abstract

The integration of mitochondrial genome fragments into the nuclear genome is well documented, and the transfer of these mitochondrial nuclear pseudogenes (numts) is thought to be an ongoing evolutionary process. With the increasing number of eukaryotic genomes available, genome-wide distributions of numts are often surveyed. However, inconsistencies in genome quality can reduce the accuracy of numt estimates, and methods used for identification can be complicated by the diverse sizes and ages of numts. Numts have been previously characterized in rodent genomes, and it was postulated that they might be more prevalent in a group of voles with rapidly evolving karyotypes. Here, we examine 37 rodent genomes and an additional 26 vertebrate genomes, while also considering numt detection methods. We identify numts using DNA:DNA and protein:translated-DNA similarity searches and compare numt distributions among rodent and vertebrate taxa to assess whether some groups are more susceptible to transfer. A combination of protein sequence comparisons (protein:translated-DNA) and BLASTN genomic DNA searches detects 50% more numts than genomic DNA:DNA searches alone. In addition, higher-quality RefSeq genomes produce lower estimates of numts than GenBank genomes, suggesting that lower-quality genome assemblies can overestimate numt abundance. Phylogenetic analysis shows that mitochondrial transfers are not associated with karyotypic diversity among rodents. Surprisingly, we did not find a strong correlation between numt counts and genome size. Estimates using DNA:DNA analyses can underestimate the amount of mitochondrial DNA that is transferred to the nucleus.

Keywords: BLASTN; Microtus; NUMT; TFASTX; karyotypic diversity; rodents; scoring matrix.
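The abstract's comparison of combined searches against DNA:DNA searches alone can be illustrated with a minimal sketch. It assumes hits are already reduced to nuclear coordinates and that any coordinate overlap on the same scaffold means two searches found the same numt; the `Hit` type, the `classify` helper, the overlap rule, and the toy coordinates are all hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One similarity-search hit in nuclear coordinates (1-based, inclusive)."""
    scaffold: str
    start: int
    end: int


def overlaps(a: Hit, b: Hit) -> bool:
    """True if two hits share at least one position on the same scaffold."""
    return a.scaffold == b.scaffold and a.start <= b.end and b.start <= a.end


def classify(protein_hits: List[Hit], dna_hits: List[Hit]) -> Tuple[List[Hit], List[Hit], List[Hit]]:
    """Split hits into protein-only, DNA-only, and found-by-both sets.

    Assumption: a numt found by both searches is counted once, using the DNA hit.
    """
    protein_only = [p for p in protein_hits if not any(overlaps(p, d) for d in dna_hits)]
    dna_only = [d for d in dna_hits if not any(overlaps(d, p) for p in protein_hits)]
    both = [d for d in dna_hits if any(overlaps(d, p) for p in protein_hits)]
    return protein_only, dna_only, both


# Toy coordinates, invented purely for illustration.
protein_hits = [Hit("chr1", 100, 400), Hit("chr2", 50, 200), Hit("chr3", 10, 90)]
dna_hits = [Hit("chr1", 350, 600), Hit("chr4", 1, 120)]

protein_only, dna_only, both = classify(protein_hits, dna_hits)
combined = len(protein_only) + len(dna_only) + len(both)
# Percent gain of the combined set over the DNA:DNA search alone.
gain = 100.0 * (combined - len(dna_hits)) / len(dna_hits)
print(len(protein_only), len(dna_only), len(both), gain)
```

The gain printed for these toy hits says nothing about real genomes; it only shows how a figure such as "50% more numts" could be derived once hits from both search types are reconciled.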


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Rodent neighbor-joining tree constructed from full mtDNA genome sequences, with M. muntjak colored pink and H. sapiens colored purple. GenBank genomes are colored orange and RefSeq genomes green. The three vole species from the genus Microtus (M. agrestis, M. arvalis and M. ochrogaster) are found on the far left. Total numt counts (A) and lengths (B) are shown for each species for the 7 similarity searches, as follows: 1) plus sign (+): coding sequence queries, BLASTN DNA:DNA with “-task megablast” default option (BLN); 2) (x): coding sequence queries, BLASTN DNA:DNA with “-task blastn” option (BLNT); 3) open circle: coding sequence queries, TFASTX protein:translated-DNA searches with the MD10 protein scoring matrix (MD10); 4) filled circle: coding sequence queries, TFASTX protein:translated-DNA searches with the MD40 protein scoring matrix (MD40); 5) open square: whole mtDNA genome queries, BLASTN DNA:DNA with “-task megablast” default (BLNG); 6) diamond: whole mtDNA genome queries, BLASTN DNA:DNA with “-task blastn” option (BLNGT); 7) square with plus inside: the combined set (MD40BG), comprising numts found only with coding sequence queries using TFASTX protein:translated-DNA with the MD40 scoring matrix, numts found only with whole mtDNA genome queries using BLASTN DNA:DNA (including numts from non-coding portions of the genome), and overlapping numts found with both.
FIGURE 2
Total numt counts for the genomes in Figure 1 for the 7 similarity searches as a function of alignment length. Each box plot contains 39 symbols, one for each of the 39 genomes displayed in Figure 1, color-coded by genome type. (A) alignments at least 30 nt/10 aa; (B) at least 60 nt/20 aa; (C) at least 150 nt/50 aa; (D) at least 300 nt/100 aa. Median values are given for each type of similarity search, with GenBank mtDNA genomes colored orange and RefSeq mtDNA genomes colored green. The symbols for the 7 similarity searches are the same as in Figure 1 and are labeled on the x-axis.
FIGURE 3
Numt counts (MD40BG) vs. percent identity, separated into numts found by TFASTX/MD40 only (orange), by BLASTN “-task blastn” with whole mitochondrial genome queries only (BLNGT, purple), or by both methods (green). (A) Mus musculus; (B) rodent RefSeq genomes used in Figure 1; (C) Homo sapiens; (D) vertebrate RefSeq genomes. The percent identities are compared in each panel for 1) TFASTX (orange): numts found only with coding sequence queries using TFASTX protein:translated-DNA with the MD40 scoring matrix; 2) BLASTN (purple): numts found only with whole mtDNA genome queries using BLASTN DNA:DNA with the “-task blastn” option, which includes numts from non-coding portions of the genome; 3) numts found with both methods (TFASTX and BLASTN, green).
FIGURE 4
Rodent median numt counts for mtDNA protein genes found using TFASTX with the MD40 scoring matrix. Genes are sorted by length from shortest to longest. GenBank genomes are colored orange and RefSeq colored green with M. muntjak colored in pink and H. sapiens colored in purple. (A) Total counts per gene ordered by gene length in amino acids (aa); (B) Numt counts scaled by average gene length. Lines connect numt counts from mtDNA genes in the same species.
FIGURE 5
Mitochondrial nuclear transfer across vertebrate genomes. (A) Total numt counts and (B) total numt length (kb) for the 7 similarity searches, ordered by genome size (GB) from smallest to largest. Genome sizes are listed in the top panel and species in the bottom panel; the ordering is the same in both panels. GenBank mtDNA genomes are colored orange and RefSeq mtDNA genomes green. Scientific names shown in panel (B) are colored by the vertebrate classes shown in panel (A). Numt counts (A) and lengths (B) are shown for each species for the 7 similarity searches, as follows: 1) plus sign (+): coding sequence queries, BLASTN DNA:DNA with “-task megablast” default option (BLN); 2) (x): coding sequence queries, BLASTN DNA:DNA with “-task blastn” option (BLNT); 3) open circle: coding sequence queries, TFASTX protein:translated-DNA searches with the MD10 protein scoring matrix (MD10); 4) filled circle: coding sequence queries, TFASTX protein:translated-DNA searches with the MD40 protein scoring matrix (MD40); 5) open square: whole mtDNA genome queries, BLASTN DNA:DNA with “-task megablast” default (BLNG); 6) diamond: whole mtDNA genome queries, BLASTN DNA:DNA with “-task blastn” option (BLNGT); 7) square with plus inside: the combined set (MD40BG), comprising numts found only with coding sequence queries using TFASTX protein:translated-DNA with the MD40 scoring matrix, numts found only with whole mtDNA genome queries using BLASTN DNA:DNA (including numts from non-coding portions of the genome), and overlapping numts found with both.
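The paired length cutoffs in Figure 2 (30 nt/10 aa, 60 nt/20 aa, 150 nt/50 aa, 300 nt/100 aa) follow a 3:1 ratio, consistent with three nucleotides per amino acid. The sketch below shows one way counts at each cutoff could be tallied when DNA alignments are measured in nucleotides and protein alignments in amino acids. The `length_nt` and `count_by_threshold` helpers and the toy alignment lengths are hypothetical; how the authors measured alignment length is not stated in the captions.

```python
from typing import Dict, Iterable, Tuple

# Nucleotide cutoffs from the four Figure 2 panels (A-D).
THRESHOLDS_NT = (30, 60, 150, 300)


def length_nt(length: int, unit: str) -> int:
    """Express an alignment length in nucleotides (assumes 3 nt per aa)."""
    if unit == "nt":
        return length
    if unit == "aa":
        return 3 * length
    raise ValueError(f"unknown unit: {unit!r}")


def count_by_threshold(alignments: Iterable[Tuple[int, str]]) -> Dict[int, int]:
    """Count alignments that meet or exceed each nucleotide cutoff."""
    lengths = [length_nt(length, unit) for length, unit in alignments]
    return {t: sum(1 for n in lengths if n >= t) for t in THRESHOLDS_NT}


# Toy (length, unit) pairs, invented purely for illustration.
alignments = [(45, "nt"), (12, "aa"), (25, "aa"), (310, "nt"), (100, "aa"), (8, "aa")]
counts = count_by_threshold(alignments)
print(counts)
```

Because each cutoff is inclusive, an alignment of exactly 100 aa (300 nt) is counted in every panel, matching the "at least" wording of the caption.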


