BMC Genomics. 2015 Aug 1;16(1):569.
doi: 10.1186/s12864-015-1779-7.

MAC: identifying and correcting annotation for multi-nucleotide variations

Lei Wei et al. BMC Genomics.

Abstract

Background: Next-Generation Sequencing (NGS) technologies have rapidly advanced our understanding of human variation in cancer. To accurately translate raw sequencing data into practical knowledge, annotation tools, algorithms and pipelines must keep pace with the rapidly evolving technology. One current challenge is the accurate annotation of multi-nucleotide variants (MNVs). When these tandem substitutions affect multiple nucleotides within a single protein codon of a gene, the translated amino acid depends on all nucleotides in that codon. Most existing variant callers report an MNV as individual single-nucleotide variants (SNVs), often resulting in multiple triplet codon sequences and incorrect amino acid predictions. Correcting potentially mis-annotated MNVs among reported SNVs depends primarily on haplotype phasing, that is, determining whether neighboring SNVs are located on the same chromosome.

Results: Here we describe MAC (Multi-Nucleotide Variant Annotation Corrector), an integrative pipeline developed to correct potentially mis-annotated MNVs. MAC was designed as an application that requires only an SNV file and the matching BAM file as data inputs. Using an example data set containing 3024 SNVs and the corresponding whole-genome sequencing BAM files, we show that MAC identified eight potentially mis-annotated SNVs and accurately updated the amino acid predictions for seven of the variant calls.

Conclusions: MAC can identify and correct amino acid predictions that result from MNVs affecting multiple nucleotides within a single protein codon, a case that most existing SNV-based variant pipelines cannot handle. The MAC software is freely available and represents a useful tool for the accurate translation of genomic sequence to protein function.

Figures

Fig. 1
Amino acid predictions for two scenarios involving neighboring SNVs. (A1) Two consecutive SNVs in codon 285 of gene TP53. The presence of both SNVs on the same read suggests that they originated from the same chromosome. (A2) Incorrect annotation based on prediction of individual SNVs: the first and second SNVs were predicted to introduce E285V and E285Q, respectively. (A3) The correct amino acid change based on the MNV is E285L. (B1) Two SNVs located in codon 252 of gene OR6Y1 but on different reads, suggesting that they originated from separate chromosomes. (B2) The two SNVs in B1 were correctly predicted to introduce V252V and V252I based on individual SNVs. The sequencing reads are displayed in the IGV viewer [14].
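
The TP53 codon 285 case can be reproduced with a few lines of Python. This is only an illustrative sketch, not MAC's code: the reference codon GAG and the two substitutions (position 2 A>T, position 1 G>C) are assumptions chosen to be consistent with the reported E285V, E285Q and E285L changes, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_substitutions(codon, substitutions):
    """Return the codon after applying (0-based position, alt base) pairs."""
    bases = list(codon)
    for position, alt_base in substitutions:
        bases[position] = alt_base
    return "".join(bases)


def annotate(codon, substitutions, codon_number):
    """Format an amino acid change such as 'E285L'."""
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[apply_substitutions(codon, substitutions)]
    return f"{ref_aa}{codon_number}{alt_aa}"


REF_CODON = "GAG"   # assumed reference codon (Glu)
snv_a = (1, "T")    # assumed substitution: GAG -> GTG
snv_b = (0, "C")    # assumed substitution: GAG -> CAG

# Annotating each SNV on its own yields two different, misleading predictions.
per_snv = [annotate(REF_CODON, [snv], 285) for snv in (snv_a, snv_b)]

# Annotating both substitutions together (same haplotype) yields one change.
as_mnv = annotate(REF_CODON, [snv_a, snv_b], 285)

print(per_snv, as_mnv)
```

Joint annotation is only appropriate when the two SNVs lie on the same chromosome, as in panel A; for SNVs on different reads, as in panel B, the per-SNV predictions stand.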
Fig. 2
Depiction of the MAC workflow (left panel) and a MAC test run (right panel). Left: (A1) A list of SNVs identified by any variant caller; (A2) reads are extracted from the BAM file for all SNVs to identify Blocks of Mutations (BMs); (A3) Blocks of Mutations within Codon (BMCs) are identified within each subgraph using an annotation tool. Right: a MAC test run using 3024 input SNVs from a breast cancer data set identified 56 BMs and 4 BMCs containing 8 SNVs. After re-annotation, 7 of the 8 SNVs were classified as MNVs with amino acid changes different from the original SNV-based annotation.
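
One way to picture the two grouping steps in the workflow is as a small graph problem: SNVs seen together on a read are linked, connected groups form blocks, and blocks are then narrowed to SNVs sharing a codon. The sketch below is a simplified illustration under stated assumptions, not MAC's implementation: the read-to-SNV mapping is toy data standing in for reads extracted from a BAM file, positions are on a single plus-strand gene, and the function names and `cds_start` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def blocks_of_mutations(read_to_snvs):
    """Group SNV positions that co-occur on at least one read.

    Builds a graph with an edge between every pair of SNVs carried by the
    same read and returns its connected components as sorted lists.
    """
    adjacency = defaultdict(set)
    for snvs in read_to_snvs.values():
        for a, b in combinations(sorted(snvs), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, blocks = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        blocks.append(sorted(component))
    return blocks


def blocks_within_codon(blocks, cds_start):
    """Keep groups of two or more SNVs from one block that share a codon.

    Assumes a plus-strand coding sequence starting at cds_start with no
    introns, so the codon index is simply (position - cds_start) // 3.
    """
    result = []
    for block in blocks:
        by_codon = defaultdict(list)
        for position in block:
            by_codon[(position - cds_start) // 3].append(position)
        result.extend(group for group in by_codon.values() if len(group) > 1)
    return result


# Toy data: read name -> SNV positions at which the read carries the variant.
reads = {
    "read1": {100, 101},   # same read, same codon
    "read2": {100, 101},
    "read3": {199},        # same codon as 201, but on a different read
    "read4": {201},
    "read5": {300, 310},   # same read, different codons
}

bms = blocks_of_mutations(reads)
bmcs = blocks_within_codon(bms, cds_start=100)
print(bms, bmcs)
```

In this toy input, positions 199 and 201 fall in the same codon but never share a read, so they are left as independent SNVs, mirroring the OR6Y1 scenario in Fig. 1.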

References

    1. Ding L, Wendl MC, McMichael JF, Raphael BJ. Expanding the computational toolbox for mining cancer genomes. Nat Rev Genet. 2014;15:556–70. doi: 10.1038/nrg3767.
    2. Rosenfeld JA, Malhotra AK, Lencz T. Novel multi-nucleotide polymorphisms in the human genome characterized by whole genome and exome sequencing. Nucleic Acids Res. 2010;38(18):6102–11. doi: 10.1093/nar/gkq408.
    3. Pleasance ED, Stephens PJ, O’Meara S, McBride DJ, Meynert A, Jones D, Lin ML, Beare D, Lau KW, Greenman C, et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature. 2010;463(7278):184–90. doi: 10.1038/nature08629.
    4. Nedelko T, Arlt VM, Phillips DH, Hollstein M. TP53 mutation signature supports involvement of aristolochic acid in the aetiology of endemic nephropathy-associated tumours. Int J Cancer. 2009;124(4):987–90. doi: 10.1002/ijc.24006.
    5. Mace K, Aguilar F, Wang JS, Vautravers P, Gomez-Lechon M, Gonzalez FJ, Groopman J, Harris CC, Pfeifer AM. Aflatoxin B1-induced DNA adduct formation and p53 mutations in CYP450-expressing human liver cell lines. Carcinogenesis. 1997;18(7):1291–7. doi: 10.1093/carcin/18.7.1291.
