Nat Commun. 2020 May 27;11(1):2539.
doi: 10.1038/s41467-019-12438-5.

Landscape of multi-nucleotide variants in 125,748 human exomes and 15,708 genomes


Qingbo Wang et al. Nat Commun.

Abstract

Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not accurately classify MNVs, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome with constituent variants falling within 2 bp distance of one another, including 18,756 variants with a novel combined effect on protein sequence. Finally, we estimate the relative impact of known mutational mechanisms - CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions - on the generation of MNVs. Our results demonstrate the value of haplotype-aware variant annotation, and refine our understanding of genome-wide mutational mechanisms of MNVs.
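The "novel combined effect on protein sequence" can be illustrated with a minimal sketch that translates a codon under each SNV alone and under both together. The function names and the two example codons are illustrative assumptions and are not taken from the gnomAD pipeline; only the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_snvs(codon, snvs):
    """Return the codon after applying (position, alt_base) substitutions."""
    bases = list(codon)
    for position, alt_base in snvs:
        bases[position] = alt_base
    return "".join(bases)


def annotate(ref_codon, snv_a, snv_b):
    """Translate the reference codon, each SNV alone, and both SNVs together.

    Hypothetical helper: snv_a and snv_b are (0-based codon position, alt base).
    """
    return {
        "ref": CODON_TABLE[ref_codon],
        "snv_a_alone": CODON_TABLE[apply_snvs(ref_codon, [snv_a])],
        "snv_b_alone": CODON_TABLE[apply_snvs(ref_codon, [snv_b])],
        "mnv": CODON_TABLE[apply_snvs(ref_codon, [snv_a, snv_b])],
    }


# Gained nonsense: each SNV alone is missense, together they form a stop codon.
gained = annotate("GAT", (0, "T"), (2, "A"))
# Rescued nonsense: the first SNV alone is a stop, the pair encodes tyrosine.
rescued = annotate("CAG", (0, "T"), (2, "C"))
print(gained)
print(rescued)
```

Annotating the two SNVs independently would report two missense changes in the first case and a nonsense change in the second, which is the kind of misclassification that haplotype-aware annotation is meant to avoid.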

Conflict of interest statement

D.G.M. is a founder with equity in Goldfinch Bio, and has received research support from AbbVie, Astellas, Biogen, BioMarin, Eisai, Merck, Pfizer, and Sanofi-Genzyme. K.J.K. owns stock in Personalis. E.V.M. has received research support in the form of charitable contributions from Charles River Laboratories and Ionis Pharmaceuticals, and has consulted for Deerfield Management. M.I.M.: The views expressed in this article are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health. He has served on advisory panels for Pfizer, NovoNordisk, Zoe Global; has received honoraria from Merck, Pfizer, NovoNordisk, and Eli Lilly; has stock options in Zoe Global and has received research funding from Abbvie, Astra Zeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, NovoNordisk, Pfizer, Roche, Sanofi Aventis, Servier, and Takeda. As of June 2019, M.I.M. is an employee of Genentech, and holds stock in Roche. R.K.W. has received unrestricted research grants from Takeda Pharmaceutical Company. M.J.D. is a founder of Maze Therapeutics. B.M.N. is a member of the scientific advisory board at Deep Genomics and consultant for Camp4 Therapeutics, Takeda Pharmaceutical, and Biogen. A.O.D.L. has received honoraria from ARUP and Chan Zuckerberg Initiative.

Figures

Fig. 1
Definition and an example of MNVs, and validation of phasing sensitivity. a Definition and an example of an MNV. In this paper, an MNV is defined as two or more nearby variants existing on the same haplotype in the same individual. b Impact of MNVs in coding regions. The amino acid change caused by an MNV can be different from either of the individual single-nucleotide variants, which creates the potential for misannotation of the functional consequence of variants. c Graphical overview of the analysis of phasing sensitivity and specificity using trio samples from our gnomAD callset. We identified all heterozygous variant pairs that pass quality control (see the Methods section) and compared the phase information assigned by read-based phasing with that of trio-based phasing.
Fig. 2
Functional impact of MNVs. a The number of MNVs in the gnomAD exome data set per MNV category. Of the 1821 rescued nonsense mutations, 1538 are rescued in all individuals that harbor the original nonsense mutation and are used for the analysis in (b) and (c). Gained and rescued nonsense MNVs were further filtered to HC pLoF in (b) and (c). b The number of gained/rescued nonsense mutations per gene, and examples of disease-associated genes with two or more gained/rescued nonsense mutations. c The fraction of each category of MNV found in a set of 3941 constrained genes (top two deciles of constraint)
Fig. 3
Mutational origins of MNVs. a Three major categories of the mutational origin of MNVs. (Left) A combination of single-nucleotide mutational events. Since the baseline global mutation rate is highly different between transversions and CpG and non-CpG transitions, even a simple combination of single-nucleotide mutational events could result in a highly skewed distribution of MNVs. (Center) One-step mutation caused by error-prone DNA polymerases. For this class of MNVs, since the two mutations occur at once during DNA replication, the allele frequency of the two constituent SNVs of the MNV is more likely to be equal. (Right) Polymerase slippage at repeat junctions. Mutation rates are highly elevated in repeat regions, and are therefore likely to cause various complex patterns of mutations, occasionally resulting in MNVs. b The log-scaled number of MNVs per substitution pattern. c The fraction of one-step MNVs per substitution pattern. Error bars represent standard error of the mean (often smaller than the dot size). d The fraction of MNVs that are in repetitive contexts, and bits representation of sequence contexts. Error bars represent standard error of the mean. Colors in the bars in panels b–d represent the predicted major mechanism of MNVs for each substitution pattern.
Fig. 4
Distribution of MNVs across the genome. a The number and the fraction of MNVs per origin, per substitution pattern. Gray shows the estimated fraction of MNVs originating from two single-nucleotide substitution events, brown shows polymerase slippage at repeat contexts, and purple shows the others (presumably mainly replication error by pol-zeta). The colors along the bottom represent the estimated biological origins that dominate MNVs of that specific substitution pattern. b, c MNV density, defined as the number of MNVs per functional annotation divided by the base pair length in the annotation (relative to the whole-genome region), ordered by the methylation level of the functional category. d Estimated fraction of MNVs by different origins, per functional category around the coding region.
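The MNV density described for Fig. 4b, c can be read as a ratio of two per-base rates. The sketch below is one plausible reading of that definition, not the authors' code; the function name and all counts and lengths in the example are made up for illustration.

```python
def relative_mnv_density(n_annotation, length_annotation, n_genome, length_genome):
    """MNVs per base pair in an annotation, relative to the whole-genome rate.

    Equivalent to (n_annotation / length_annotation) / (n_genome / length_genome),
    written as a single division to limit floating-point rounding.
    """
    if length_annotation <= 0 or length_genome <= 0 or n_genome <= 0:
        raise ValueError("lengths and the genome-wide count must be positive")
    return (n_annotation * length_genome) / (length_annotation * n_genome)


# Hypothetical numbers: 50 MNVs in a 1,000 bp annotation,
# against 1,000 MNVs across a 100,000 bp region.
density = relative_mnv_density(50, 1_000, 1_000, 100_000)
print(density)  # values above 1 mean the annotation is enriched for MNVs
```

A value of 1 would mean the annotation carries MNVs at the genome-wide rate, so categories can be compared directly regardless of their total length.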
