BMC Genomics. 2014;15 Suppl 3(Suppl 3):S2.
doi: 10.1186/1471-2164-15-S3-S2. Epub 2014 May 6.

Extraction and annotation of human mitochondrial genomes from 1000 Genomes Whole Exome Sequencing data

Maria Angela Diroma et al. BMC Genomics. 2014.

Abstract

Background: Whole Exome Sequencing (WES) is one of the most widely used and cost-effective next-generation sequencing technologies, allowing all nuclear exons to be sequenced. Off-target regions may be captured if they share high sequence similarity with the baits. Bioinformatics tools have been optimized to retrieve a large amount of off-target mitochondrial DNA (mtDNA) from WES data by exploiting the non-specificity of probes that partially overlap Nuclear mitochondrial Sequences (NumtS). The 1000 Genomes project is one of the largest resources from which mtDNA sequences can be extracted from WES data, which is valuable given the effort the scientific community is devoting to reconstructing human population history using mtDNA as a marker, and given the involvement of mtDNA in pathology.

Results: A previously published pipeline for assembling mitochondrial genomes from off-target WES reads, further improved to detect insertions and deletions (indels) and heteroplasmy, was applied to a dataset of 1242 samples from the 1000 Genomes project and yielded a nearly complete mitochondrial genome for 943 samples (76% of the analyzed exomes). The robustness of our computational strategy was highlighted by the reduction, due to NumtS filtering, in the number of reads recognized as mitochondrial relative to the original annotation produced by the Consortium.

Conclusions: To the best of our knowledge, this is likely the most extensive population-scale mitochondrial genotyping in humans, enriched with the estimation of heteroplasmies.

Figures

Figure 1
1000 Genomes mitochondrial indel distribution. Analysis of 1242 Illumina samples identified 149 deleted and 66 inserted mitochondrial positions, mostly heteroplasmic. The number of homoplasmic and heteroplasmic indels divided by the length of each mitochondrial locus is reported (normalized indels). The distribution of insertions and deletions across the mitochondrial genome shows peaks in the indel ratio within the D-loop, while fewer indels fall within coding regions.
Figure 2
1000 Genomes mitochondrial mismatch distribution. The number of mismatches divided by the length of each mitochondrial locus (normalized mismatches) is reported. Variant distributions are fairly homogeneous across mitochondrial loci, with no significant differences between the two types and with high peaks in non-coding regions.
Figure 3
Enrichment of heteroplasmic fractions within the dataset. Variant frequency was estimated for eleven ranges of heteroplasmic fraction (HF). The ratio between the mean number of variants in each range and the mean number of variants per sample shows that homoplasmic variants make up the largest share of the alleles in an individual. With respect to the degree of heteroplasmy, homoplasmic variants (HF = 1.00), quasi-homoplasmies (0.90-0.99) and low-level heteroplasmies (0.01-0.10) substantially predominate.
Figure 4
Distribution of homoplasmic and heteroplasmic variants in LCL and blood subsets. Frequencies of homoplasmic and heteroplasmic variants were normalized to the length of mitochondrial loci to remove length bias. The distribution of variants across mitochondrial loci shows that heteroplasmic and homoplasmic mutations generally occur in the same loci, in both datasets. A statistically significant difference between the two datasets was found in the distribution of heteroplasmic variants (p < 0.01), but not of homoplasmic variants.
Figure 5
Variability profile of blood and LCL genomes. Variability values estimated on nearly 10,000 multi-aligned mitochondrial nucleotide sequences available on the HmtDB web site [33] are plotted against the variant positions observed in both blood and LCL samples. The trend is similar for both datasets: almost all variant positions display very low variability (<0.20). Positions showing high variability, i.e. those that were common among the genomes in HmtDB, were few and accounted for 25% of the variants shared between the two datasets, while the majority of non-shared variants (about 98%) displayed low variability.
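
The figure captions rely on two simple computations: normalizing per-locus variant counts by locus length (Figures 1, 2 and 4) and grouping variants into eleven heteroplasmic fraction ranges (Figure 3). The following Python sketch illustrates both. It is not the authors' pipeline: the function names are hypothetical, and the bin edges (ten equal-width ranges below 1.00 plus a separate homoplasmic class) are an assumption, since the captions name only three of the eleven ranges.

```python
from collections import Counter


def normalized_counts(variant_counts, locus_lengths):
    """Return variants per base pair for each locus.

    variant_counts: dict mapping locus name -> number of variants observed
    locus_lengths:  dict mapping locus name -> locus length in bp
    Loci with no observed variants get a normalized value of 0.0.
    """
    return {
        locus: variant_counts.get(locus, 0) / length
        for locus, length in locus_lengths.items()
    }


def hf_bin(hf):
    """Assign a heteroplasmic fraction (0 < hf <= 1) to one of eleven ranges.

    Assumption: ten half-open ranges of width 0.10 below 1.00, plus a
    separate class for homoplasmic variants (HF = 1.00).
    """
    if not 0.0 < hf <= 1.0:
        raise ValueError("heteroplasmic fraction must be in (0, 1]")
    if hf == 1.0:
        return "1.00"
    index = min(int(hf * 10), 9)  # 0..9, one index per 0.10-wide range
    return f"[{index / 10:.2f}, {(index + 1) / 10:.2f})"


def hf_distribution(fractions):
    """Count how many variants fall into each heteroplasmic fraction range."""
    return Counter(hf_bin(hf) for hf in fractions)
```

With made-up inputs, `normalized_counts({"D-loop": 10}, {"D-loop": 1000})` gives 0.01 variants per bp, and `hf_distribution([1.0, 1.0, 0.95, 0.05])` counts two homoplasmic variants, one quasi-homoplasmy and one low-level heteroplasmy.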

