ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data

Kai Wang¹, Mingyao Li, Hakon Hakonarson

PMID: 20601685
PMCID: PMC2938201
DOI: 10.1093/nar/gkq603

Nucleic Acids Res. 2010 Sep;38(16):e164. doi: 10.1093/nar/gkq603. Epub 2010 Jul 3.

Affiliation

¹ Center for Applied Genomics, Children's Hospital of Philadelphia, PA 19104, USA. kai@openbioinformatics.org

Abstract

High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
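
The region-based annotation against a GFF3 data set mentioned above can be illustrated with a small sketch. This is not ANNOVAR's code or interface: the function names, the toy feature IDs, and the linear-scan overlap test are assumptions made for illustration only. The only parts taken from the GFF3 format itself are the nine tab-separated columns, the 1-based inclusive coordinates in columns 4 and 5, and the `key=value` attributes in column 9.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int    # 1-based, inclusive
    name: str


def parse_gff3(text: str) -> list[Feature]:
    """Parse GFF3 text into features, skipping comments and malformed rows."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # Fall back to the feature type (column 3) when no ID is present.
        name = attrs.get("ID", cols[2])
        features.append(Feature(cols[0], int(cols[3]), int(cols[4]), name))
    return features


def overlapping_features(chrom: str, start: int, end: int,
                         features: list[Feature]) -> list[str]:
    """Return names of features that overlap the variant interval."""
    return [f.name for f in features
            if f.seqid == chrom and f.start <= end and start <= f.end]


# Hypothetical two-feature annotation set, built with explicit tabs.
rows = [
    ["chr1", "toy", "enhancer", "100", "200", ".", "+", ".", "ID=enh1"],
    ["chr1", "toy", "enhancer", "300", "400", ".", "+", ".", "ID=enh2"],
]
gff_text = "##gff-version 3\n" + "\n".join("\t".join(r) for r in rows)
features = parse_gff3(gff_text)

print(overlapping_features("chr1", 150, 150, features))  # → ['enh1']
```

A linear scan is adequate for a toy example; for millions of variants an indexed structure (for example, binning or an interval tree) would be the more realistic choice.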

Figures

**Figure 1.**
Identification of genes responsible for Miller syndrome using a synthetic data set. The input data set includes all SNVs and indels in subject NA18107 generated by Illumina, as well as two variants known to cause Miller syndrome. The variants reduction method can be implemented by an automation script (auto_annovar.pl) in the ANNOVAR package.
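
The stepwise idea behind variants reduction can be sketched as a chain of filters that records how many variants survive each step. This is a minimal illustration, not the logic of auto_annovar.pl: the abstract does not list the individual filters or their order, so the three filters below, the "at least two remaining variants per gene" rule used as a simplified stand-in for a recessive model, and all gene names and positions are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    pos: int
    effect: str      # e.g. "nonsynonymous", "synonymous", "splicing"
    in_1000g: bool   # reported in the 1000 Genomes Project
    in_dbsnp: bool   # reported in dbSNP


def reduce_variants(variants):
    """Apply filters in order; return per-step counts and candidate genes."""
    # Assumed filter chain; each step keeps variants for which the test is True.
    steps = [
        ("protein-altering",
         lambda v: v.effect in {"nonsynonymous", "splicing", "frameshift"}),
        ("absent from 1000 Genomes", lambda v: not v.in_1000g),
        ("absent from dbSNP", lambda v: not v.in_dbsnp),
    ]
    remaining = list(variants)
    counts = [("input", len(remaining))]
    for label, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((label, len(remaining)))

    # Simplified recessive model (assumption): a gene is a candidate
    # when at least two variants in it survive all filters.
    per_gene = defaultdict(int)
    for v in remaining:
        per_gene[v.gene] += 1
    candidates = sorted(g for g, n in per_gene.items() if n >= 2)
    return counts, candidates


# Hypothetical toy input: GENE_A carries two rare protein-altering variants.
toy = [
    Variant("GENE_A", 101, "nonsynonymous", False, False),
    Variant("GENE_A", 202, "nonsynonymous", False, False),
    Variant("GENE_B", 303, "nonsynonymous", True, True),
    Variant("GENE_C", 404, "synonymous", False, False),
    Variant("GENE_D", 505, "nonsynonymous", False, False),
    Variant("GENE_E", 606, "splicing", False, True),
]

counts, candidates = reduce_variants(toy)
for label, n in counts:
    print(f"{label}: {n}")
print("candidate genes:", candidates)  # → candidate genes: ['GENE_A']
```

Recording the count after each step mirrors the way a reduction protocol is usually reported, and makes it easy to see which filter removes the most variants.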
