Bioinform Biol Insights. 2019 Sep 6;13:1177932219873884.
doi: 10.1177/1177932219873884. eCollection 2019.

MitoIMP: A Computational Framework for Imputation of Missing Data in Low-Coverage Human Mitochondrial Genome

Koji Ishiya et al. Bioinform Biol Insights.

Abstract

The incompleteness of partial human mitochondrial genome sequences makes it difficult to compare them across multiple resources. To address this issue, we propose a computational framework for deducing missing nucleotides in the human mitochondrial genome. We applied it to worldwide mitochondrial haplogroup lineages and assessed its performance. Our approach deduces the missing nucleotides with a precision of 0.99 or higher in most human mitochondrial DNA lineages. Furthermore, although low-coverage mitochondrial genome sequences often blur the relationships among individuals in multidimensional scaling (MDS) analysis, our approach restores their positional arrangement according to the corresponding mitochondrial DNA lineages. Our framework therefore offers a practical way to compensate for the lack of genome coverage in partial and fragmented human mitochondrial genome sequences. We also developed an open-source program, MitoIMP, that implements this imputation procedure. MitoIMP is freely available from https://github.com/omics-tools/mitoimp.
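The abstract does not spell out the imputation rule itself. As a rough illustration only, the Python sketch below assumes a simple nearest-haplotype approach: rank reference haplotypes by mismatches over the query's observed sites, then fill each missing site by majority vote among the k closest references. The names `impute`, `mismatches`, and `MISSING` and the default `k=3` are hypothetical; this is not MitoIMP's API and not necessarily its algorithm. Sequences are assumed to be pre-aligned to equal length, with 'N' or '?' marking missing bases.

```python
from collections import Counter

# Hypothetical missing-data symbols; real inputs may use other conventions.
MISSING = {"N", "?"}


def mismatches(query, ref):
    """Count differences at sites observed in both the query and the reference."""
    return sum(
        1
        for q, r in zip(query, ref)
        if q not in MISSING and r not in MISSING and q != r
    )


def impute(query, panel, k=3):
    """Fill missing sites in `query` by majority vote of its k closest references.

    Hypothetical sketch, not MitoIMP's actual algorithm. All sequences must
    already be aligned to the same length.
    """
    if any(len(ref) != len(query) for ref in panel):
        raise ValueError("all sequences must be aligned to the same length")
    # Rank references by mismatches over the query's observed sites
    # (sorted() is stable, so ties keep panel order).
    nearest = sorted(panel, key=lambda ref: mismatches(query, ref))[:k]
    filled = []
    for i, base in enumerate(query):
        if base not in MISSING:
            filled.append(base)
            continue
        # Vote only among neighbours that are observed at this site;
        # leave the site missing if none of them is.
        votes = Counter(ref[i] for ref in nearest if ref[i] not in MISSING)
        filled.append(votes.most_common(1)[0][0] if votes else base)
    return "".join(filled)
```

Measuring distance only over the query's observed sites keeps a mostly missing query from being penalized for sites it does not have; with a very sparse query, however, many references may tie, which is one reason a real tool would likely need a more careful ranking.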

Keywords: Missing data; ancient DNA; high-throughput sequencing; low-coverage; mitochondrial DNA.

Conflict of interest statement

Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Flowchart of the imputation procedure. This flowchart shows the imputation process in the MitoIMP program, which implements our approach. Rounded rectangles indicate the beginning and end of the procedure, rectangles with a wavy base indicate input and output files, and rectangular boxes represent processing or data manipulation steps. The MAFFT program is used to perform the multiple sequence alignment of the input sequence.
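The flowchart places a MAFFT alignment step between reading the input file and imputing. Below is a minimal Python sketch of that surrounding file handling under assumed conventions: FASTA input and output, and MAFFT's `--add` option with `--keeplength` to align a query to an existing reference alignment without changing its length. The function names are hypothetical, and which MAFFT options MitoIMP actually passes is not stated here, so treat the command as an assumption. MAFFT must be installed and on the PATH for `align_query` to run.

```python
import subprocess


def read_fasta(text):
    """Parse FASTA text into an insertion-ordered {header: sequence} dict."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            # Upper-case because MAFFT writes nucleotides in lowercase by default.
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def write_fasta(records, width=60):
    """Serialize {header: sequence} as FASTA, wrapping sequences at `width`."""
    lines = []
    for header, seq in records.items():
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def mafft_add_command(query_path, reference_alignment_path):
    # Assumed options: --add aligns new sequence(s) to an existing alignment;
    # --keeplength keeps the reference alignment length (columns) fixed.
    return ["mafft", "--add", query_path, "--keeplength", reference_alignment_path]


def align_query(query_path, reference_alignment_path):
    """Run MAFFT and return the aligned records (MAFFT writes to stdout)."""
    result = subprocess.run(
        mafft_add_command(query_path, reference_alignment_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return read_fasta(result.stdout)
```

Keeping the reference alignment length fixed means each column keeps the same meaning for every query, which suits an imputation step that works column by column.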
Figure 2.
The assessment of imputation procedures for mitochondrial haplogroup lineages. The vertical axis shows the precision in the simulated imputation trials. The horizontal axis indicates the percentage of missing nucleotides (10%-90%) in the partial mitochondrial genome sequences. Error bars indicate the standard error of the mean (SEM). Results with the “ALL” panel, which includes all macro-haplogroup lineages, are shown by the blue line; results with the “Haplogroup” panel, which consists of lineages from the same macro-haplogroup, are shown by the orange line.
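Figure 2 reports precision against the share of masked nucleotides, with SEM error bars. The sketch below shows one way to run that kind of simulation. It assumes that precision is computed over masked sites the imputer actually filled (sites left as 'N' are not counted as calls) and that the imputer is any function mapping a masked sequence to a filled one. The names `mask`, `precision`, `mean_sem`, and `simulate` are hypothetical; the study's exact masking scheme and number of trials per level are not given here.

```python
import random
import statistics


def mask(seq, fraction, rng):
    """Replace a random `fraction` of sites with 'N'; return (masked, positions)."""
    n_mask = round(len(seq) * fraction)
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    chars = list(seq)
    for i in positions:
        chars[i] = "N"
    return "".join(chars), positions


def precision(truth, imputed, positions):
    """Correct calls / all calls, over masked positions the imputer filled."""
    called = [i for i in positions if imputed[i] != "N"]
    if not called:
        return float("nan")
    return sum(imputed[i] == truth[i] for i in called) / len(called)


def mean_sem(values):
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, 0.0
    return mean, statistics.stdev(values) / len(values) ** 0.5


def simulate(truth, impute, fractions, trials, seed=0):
    """Return {fraction: (mean precision, SEM)} over repeated random maskings."""
    rng = random.Random(seed)
    results = {}
    for fraction in fractions:
        scores = []
        for _ in range(trials):
            masked, positions = mask(truth, fraction, rng)
            scores.append(precision(truth, impute(masked), positions))
        results[fraction] = mean_sem(scores)
    return results
```

Calling `simulate` once per imputer, for example one built on an "ALL"-style panel and one built on a same-haplogroup panel, would yield two curves analogous to the blue and orange lines; the imputers themselves are left to the caller.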
Figure 3.
The assessment of imputation procedures across the human mitochondrial genome. The scatter plot inside the circle shows the precision of the imputed nucleotides in 500 imputation trials using worldwide haplogroup lineages. Protein- and RNA-coding regions are shown in gray, and the noncoding region (D-loop) is shown in green. Abbreviations of regions longer than 100 bp (base pairs) are written in white. Numbers on the outer frame indicate genomic positions in the mitochondrial genome at 1000-bp intervals. The lines inside the circle are graduated at intervals of 0.1, from 0.5 to 1.0.
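Figure 3 plots precision site by site around the circular genome. The sketch below illustrates the bookkeeping behind such a plot. It assumes each trial records the true sequence, the imputed sequence, and the masked columns, and that 0-based alignment column i corresponds to genome position i + 1 on the 16,569-bp revised Cambridge Reference Sequence (rCRS); alignments containing insertions would need a coordinate conversion first. The names `per_position_precision` and `position_to_angle` are hypothetical.

```python
import math
from collections import defaultdict

RCRS_LENGTH = 16569  # length of the revised Cambridge Reference Sequence (rCRS), in bp


def per_position_precision(trials):
    """Aggregate precision per genome position over many imputation trials.

    `trials` yields (truth, imputed, masked_indices) tuples. Indices are
    0-based alignment columns, assumed to map to 1-based positions i + 1.
    Masked sites left as 'N' are not counted as calls.
    """
    called = defaultdict(int)
    correct = defaultdict(int)
    for truth, imputed, masked_indices in trials:
        for i in masked_indices:
            if imputed[i] == "N":
                continue
            called[i] += 1
            correct[i] += int(imputed[i] == truth[i])
    return {i + 1: correct[i] / called[i] for i in sorted(called)}


def position_to_angle(position, length=RCRS_LENGTH):
    """Angle in radians from position 1, as a fraction of a full turn."""
    return 2 * math.pi * (position - 1) / length
```

Pairing each position's precision with its angle gives polar coordinates for a circular scatter plot like the one described.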
Figure 4.
The relative relationships among individuals before and after imputation. (A) Results of the MDS analysis before (left) and after (right) the imputation procedure. Colors indicate macro-haplogroup lineages: B5 (blue), C7 (green), D4 (pink), F1 (light blue), M13 (cyan), M7 (red), M72 (light green), M74 (yellow), M8 (orange), and N.A. (dark gray). (B) Heatmaps based on the allele-sharing distance matrix among the empirical human mitochondrial genome sequences, before (left) and after (right) imputation. Color keys for the distance values are shown at the upper left of each heatmap. Clusters of macro-haplogroups B5 and M7 are outlined by white dashed lines.
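Panel B is based on an allele-sharing distance. For haploid data such as mtDNA, the per-site allele-sharing distance is 0 when two sequences carry the same nucleotide and 1 otherwise, so its average is the fraction of differing sites. The sketch below assumes that definition and skips any site missing in either sequence; how the study itself handles missing sites is not stated here. Under this assumption, two low-coverage sequences may be compared on only a few sites, which is one way relationships can look blurred before imputation. The names `allele_sharing_distance` and `distance_matrix` are hypothetical.

```python
# Hypothetical missing-data symbols; real inputs may use other conventions.
MISSING = {"N", "?"}


def allele_sharing_distance(a, b):
    """Haploid allele-sharing distance: share of jointly observed sites that differ.

    Sites missing in either sequence are skipped; returns NaN if none remain.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x in MISSING or y in MISSING:
            continue
        compared += 1
        differing += int(x != y)
    return differing / compared if compared else float("nan")


def distance_matrix(seqs):
    """Symmetric matrix of pairwise allele-sharing distances."""
    n = len(seqs)
    return [[allele_sharing_distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]
```

A matrix like this can be rendered directly as a heatmap or passed to an MDS routine, after deciding how to treat any NaN entries from pairs with no sites in common.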


LinkOut - more resources