Genes (Basel). 2023 Nov 17;14(11):2092.
doi: 10.3390/genes14112092.

Comprehensive Identification of Mitochondrial Pseudogenes (NUMTs) in the Human Telomere-to-Telomere Reference Genome



Yichen Tao et al. Genes (Basel).

Abstract

Mitochondrial research has long been hindered by the presence of mitochondrial pseudogenes within the nuclear genome (NUMTs). Although NUMTs have been compiled for incompletely assembled human reference genomes such as hg38, the full set of NUMTs in the only complete reference genome (T2T-CHM13) remains unknown. Here, we comprehensively identified the fixed NUMTs within the reference genome using the human pan-mitogenome (HPMT) from GenBank. Including the HPMT establishes an authentic mitochondrial DNA (mtDNA) mutational spectrum for NUMT identification, so that true mtDNA variants can be distinguished from the polymorphic variation found in NUMTs. Using the HPMT, we identified approximately 10% additional NUMTs in three human reference genomes under stricter thresholds. We also observed approximately 6% more NUMTs in T2T-CHM13 than in hg38, including NUMTs on the short arms of chromosomes 13, 14, and 15, which had not previously been assembled. Furthermore, alignments of 20-mers derived from mtDNA suggested that the nuclear genome contains additional short mtDNA-like segments, which should be avoided when designing short amplicons or detecting cell-free mtDNA. Finally, using the assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) on cell lines before and after mtDNA elimination, we concluded that NUMTs have minimal impact on bulk ATAC-seq, even though 16% of the sequencing data originated from mtDNA.
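The 20-mer analysis mentioned in the abstract can be illustrated with a minimal sketch: index every k-mer of a nuclear sequence, look up each mtDNA k-mer, and chain hits that lie on the same diagonal into contiguous mtDNA-like segments. This is not the authors' pipeline (which used alignment); it assumes exact matching on the forward strand only and toy in-memory strings, and the function names `kmer_index`, `mt_kmer_hits`, and `merge_hits` are hypothetical.

```python
from collections import defaultdict


def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer in seq to the 0-based positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def mt_kmer_hits(mt: str, nuclear: str, k: int = 20) -> list[tuple[int, int]]:
    """Return (mt_pos, nuclear_pos) pairs where an mtDNA k-mer occurs exactly in nuclear."""
    index = kmer_index(nuclear, k)
    return [(i, j)
            for i in range(len(mt) - k + 1)
            for j in index.get(mt[i:i + k], [])]


def merge_hits(hits: list[tuple[int, int]], k: int = 20) -> list[tuple[int, int]]:
    """Chain hits that share a diagonal (nuclear_pos - mt_pos) and have consecutive
    mtDNA start positions into half-open nuclear segments (start, end)."""
    by_diag: dict[int, list[int]] = defaultdict(list)
    for i, j in hits:
        by_diag[j - i].append(i)
    segments = []
    for diag, starts in by_diag.items():
        starts.sort()
        run_start = prev = starts[0]
        for i in starts[1:]:
            if i != prev + 1:  # gap on this diagonal: close the current run
                segments.append((run_start + diag, prev + diag + k))
                run_start = i
            prev = i
        segments.append((run_start + diag, prev + diag + k))
    return sorted(segments)


# Toy example with k = 4: the nuclear sequence embeds "ACCCCG" from the mtDNA.
mt = "AAAACCCCGGGG"
nuclear = "TTTTACCCCGTTTT"
print(merge_hits(mt_kmer_hits(mt, nuclear, k=4), k=4))  # [(4, 10)]
```

A real search would also scan the reverse complement and would use a genome-scale index or aligner rather than a Python dictionary.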

Keywords: ATAC-seq; NUMTs; human reference genome; mitochondrial DNA.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
NUMTs in three human reference genomes (hg19, hg38, T2T-CHM13). (A) Mitochondrial haplogroups and their proportions in the pan-mitogenome database. Colors represent major haplogroups, and the smaller squares within each color represent the sub-haplogroups of that major haplogroup. (B) Number and length of NUMTs in the three reference genomes. (C,D) Distribution of NUMT lengths (log2); colors represent the reference genomes, and the points above the boxes in (D) are box-plot outliers. (E) Histogram of NUMT lengths (25 bp to 14,855 bp).
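The log2 length distributions and box-plot outliers in panels (C,D) can be sketched as follows. The caption does not state the outlier rule, so this assumes the common Tukey convention (points more than 1.5 × IQR beyond the quartiles); the NUMT intervals are hypothetical toy data.

```python
import math
import statistics


def log2_lengths(numts: list[tuple[int, int]]) -> list[float]:
    """log2 of NUMT lengths, each NUMT given as a half-open (start, end) interval."""
    return [math.log2(end - start) for start, end in numts]


def tukey_outliers(values: list[float], whis: float = 1.5) -> list[float]:
    """Values more than whis * IQR below Q1 or above Q3 (usual box-plot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical NUMTs: six short ones and one long one.
numts = [(0, 64), (100, 164), (200, 328), (400, 528),
         (600, 856), (900, 1156), (2000, 18384)]
print(tukey_outliers(log2_lengths(numts)))  # [14.0]
```

Working on the log2 scale keeps the long tail of multi-kilobase NUMTs from dominating the plot.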
Figure 2
Comparison with known NUMT compilations (hg19). (A) Number and length (log2) of NUMTs identified in this study compared with known NUMTs. (B) Distribution of NUMT lengths (log2) in different NUMT compilations. (C) Overlap between the NUMT compilation identified in this study and those of three previous studies [7,8,35]; in each comparison, the union of the two compilations is taken as 100%.
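One way to express the overlap in panel (C), with the union of two compilations taken as 100%, is at base-pair resolution: merge each compilation's intervals per chromosome, sweep the two sorted lists to count shared bases, and report the "only A", "shared", and "only B" fractions of the union. The caption does not say whether overlap was counted in base pairs or in NUMT records, so base-pair resolution is an assumption here, as are the helper names and toy intervals.

```python
Interval = tuple[int, int]  # half-open (start, end) on one chromosome


def merge(intervals: list[Interval]) -> list[Interval]:
    """Sort and merge overlapping or touching intervals."""
    merged: list[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals: list[Interval]) -> int:
    return sum(end - start for start, end in intervals)


def shared_bp(a: list[Interval], b: list[Interval]) -> int:
    """Bases covered by both merged, sorted interval lists (two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        if a[i][1] < b[j][1]:  # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return total


def overlap_fractions(a: dict[str, list[Interval]], b: dict[str, list[Interval]]):
    """Return (only_a, shared, only_b) as fractions of the union (they sum to 1)."""
    only_a = shared = only_b = 0
    for chrom in set(a) | set(b):
        ma, mb = merge(a.get(chrom, [])), merge(b.get(chrom, []))
        s = shared_bp(ma, mb)
        shared += s
        only_a += covered_bp(ma) - s
        only_b += covered_bp(mb) - s
    union = only_a + shared + only_b
    return only_a / union, shared / union, only_b / union


this_study = {"chr1": [(0, 100)]}
previous = {"chr1": [(50, 150)], "chr2": [(0, 50)]}
print(overlap_fractions(this_study, previous))  # (0.25, 0.25, 0.5)
```

The sketch assumes at least one compilation is non-empty; otherwise the union is zero.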
Figure 3
NUMTs on different chromosomes. (A,B) Distribution of NUMTs across chromosomes in (A) hg38 and (B) T2T-CHM13. Band colors indicate NUMT lengths. Red boxes highlight a gap region on the Y chromosome and the short arms of chromosomes 13, 14, and 15. The total length (C) and number (D) of NUMTs on each chromosome scale linearly with chromosome length; colors indicate the reference genomes.
Figure 4
Breakpoints of NUMTs. Circos plots show the distribution of NUMTs in two reference genomes, hg38 (A) and T2T-CHM13 (B). The left half-circle represents mitochondrial genome coordinates, and the right half-circle represents the coordinates of nuclear chromosomes 1–22, X, and Y. Connecting lines indicate NUMT breakpoints, and line colors indicate the chromosome of origin. (C) Coverage of NUMTs mapped to mtDNA. The x-axis gives mtDNA positions (1 to 16,569) and the y-axis gives coverage; low-coverage regions suggest breakpoints. Gray and green shading highlight breakpoints in the mtDNA control region and the tRNA regions, respectively.
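Panel (C) can be reproduced in outline with a difference array: add +1 at each segment's start and -1 just after its end, then take a running sum over mtDNA positions 1 to 16,569. Because mtDNA is circular, this sketch assumes a segment with start > end wraps past the origin and splits it in two. The "below half the median" rule for flagging candidate breakpoints is a hypothetical threshold, not taken from the study.

```python
import statistics

MT_LEN = 16_569  # length of the revised Cambridge Reference Sequence (rCRS)


def mt_coverage(segments, mt_len=MT_LEN):
    """Per-position depth of NUMT segments mapped to mtDNA.

    segments: 1-based inclusive (start, end) mtDNA coordinates; start > end is
    treated as a segment that wraps past position mt_len back to position 1.
    Returns cov where cov[pos] is the depth at 1-based position pos (cov[0] unused).
    """
    diff = [0] * (mt_len + 2)  # difference array: +1 at start, -1 after end

    def add(start, end):
        diff[start] += 1
        diff[end + 1] -= 1

    for start, end in segments:
        if start <= end:
            add(start, end)
        else:  # circular genome: split the segment at the origin
            add(start, mt_len)
            add(1, end)
    cov = [0] * (mt_len + 1)
    depth = 0
    for pos in range(1, mt_len + 1):
        depth += diff[pos]
        cov[pos] = depth
    return cov


def low_coverage_positions(cov, fraction=0.5):
    """Positions whose depth is below `fraction` of the median depth (hypothetical rule)."""
    median = statistics.median(cov[1:])
    return [pos for pos in range(1, len(cov)) if cov[pos] < fraction * median]


cov = mt_coverage([(1, 4), (6, 10), (9, 2)], mt_len=10)
print(cov[1:])                      # [2, 2, 1, 1, 0, 1, 1, 1, 2, 2]
print(low_coverage_positions(cov))  # [5]
```

The difference array makes the cost linear in the number of segments plus the genome length, which is trivial for a 16.6 kb genome.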
Figure 5
Number of 20-mer segments mapped to the nuclear genome. The x-axis gives the length of the mapped positions (mapped by 20-mers). The left y-axis gives the number of 20-mer segments, and the right y-axis gives the number of NUMTs. The dotted curve shows that the number of 20-mers (left y-axis) decreases as the mapped length increases, whereas the histogram shows that the number of NUMTs increases with mapped length. Together with Figure 1E, which shows that shorter NUMTs (<70 bp) are less abundant than longer NUMTs (>70 bp), this suggests that some short NUMTs may have gone unnoticed. The turning point of the dotted curve falls in the 25 to 30 bp range, indicating that the cutoff for the shortest NUMTs should be set within this range (black dashed line, 28 bp).
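The caption does not say how the turning point of the dotted curve was located. One common heuristic, shown here only as an illustration and not as the authors' method, rescales both axes to [0, 1] and picks the point farthest from the straight chord joining the curve's first and last points. The curve below is hypothetical, constructed to bend at 28 bp.

```python
import math


def knee_point(xs, ys):
    """Return the x of the point farthest from the chord joining the first and
    last points of a decreasing curve (a "maximum distance to chord" heuristic).

    Assumes xs is strictly increasing and ys[0] > ys[-1].
    """
    x0, x1, y0, y1 = xs[0], xs[-1], ys[0], ys[-1]
    best_x, best_d = xs[0], -1.0
    for x, y in zip(xs, ys):
        xn = (x - x0) / (x1 - x0)  # rescale x to [0, 1]
        yn = (y - y1) / (y0 - y1)  # rescale y so the first point is 1, the last 0
        d = abs(xn + yn - 1) / math.sqrt(2)  # distance to the chord xn + yn = 1
        if d > best_d:
            best_x, best_d = x, d
    return best_x


# Hypothetical curve: counts fall steeply up to 28 bp and then level off.
lengths = list(range(20, 41))
counts = [max(0, 28 - x) * 100 + 10 for x in lengths]
print(knee_point(lengths, counts))  # 28
```

On real data the curve is noisy, so it would usually be smoothed before applying a knee heuristic like this one.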
Figure 6
Comparison of ATAC-seq before and after mtDNA reduction. (A) Coverage on mtDNA; red represents mtDNA-WT and blue represents mtDNA-reduction, with three biological replicates per treatment. (B) Proportion of reads aligned to mtDNA in the ATAC-seq sequencing files. (C) The peak profile (top) has a consistent shape in mtDNA-WT and mtDNA-reduction samples, and the heatmap (bottom) shows no obvious accumulation of reads. This suggests that the NUMTs-Blacklist regions in the nuclear genome have minimal impact on ATAC-seq.
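The proportion of reads aligned to mtDNA, as in panel (B), can be estimated from the per-contig counts reported by `samtools idxstats` (tab-separated: reference name, sequence length, mapped reads, unmapped reads). This sketch assumes the mtDNA contig is named `chrM` (UCSC style) or `MT` (Ensembl style) and uses mapped reads as the denominator; the input counts are hypothetical.

```python
def mt_read_fraction(idxstats_text: str, mt_names=("chrM", "MT")) -> float:
    """Fraction of mapped reads assigned to the mitochondrial contig.

    Expects `samtools idxstats` output: reference name, sequence length,
    mapped reads, unmapped reads (tab-separated). The mtDNA contig names
    are an assumption and depend on the reference used.
    """
    mapped_total = mapped_mt = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":  # unplaced unmapped reads have no reference
            continue
        mapped_total += int(mapped)
        if name in mt_names:
            mapped_mt += int(mapped)
    return mapped_mt / mapped_total if mapped_total else 0.0


example = "chr1\t248956422\t880\t0\nchrM\t16569\t120\t0\n*\t0\t0\t25\n"
print(mt_read_fraction(example))  # 0.12
```

Comparing this fraction between mtDNA-WT and mtDNA-reduction libraries gives a quick check that the depletion worked before examining nuclear peaks.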


References

    1. Woischnik M., Moraes C.T. Pattern of organization of human mitochondrial pseudogenes in the nuclear genome. Genome Res. 2002;12:885–893. doi: 10.1101/gr.227202.
    2. Lopez J.V., Yuhki N., Masuda R., Modi W., O’Brien S.J. Numt, a recent transfer and tandem amplification of mitochondrial DNA to the nuclear genome of the domestic cat. J. Mol. Evol. 1994;39:174–190. doi: 10.1007/BF00163806.
    3. Bravi C.M., Parson W., Bandelt H.-J. Numts Revisited. In: Bandelt H.-J., Macaulay V., Richards M., editors. Human Mitochondrial DNA and the Evolution of Homo Sapiens. Springer; Berlin/Heidelberg, Germany: 2006. pp. 31–46.
    4. Woerner A.E., Cihlar J.C., Smart U., Budowle B. Numt identification and removal with RtN! Bioinformatics. 2020;36:5115–5116. doi: 10.1093/bioinformatics/btaa642.
    5. Wei W., Schon K.R., Elgar G., Orioli A., Tanguy M., Giess A., Tischkowitz M., Caulfield M.J., Chinnery P.F. Nuclear-embedded mitochondrial DNA sequences in 66,083 human genomes. Nature. 2022;611:105–114. doi: 10.1038/s41586-022-05288-7.
