BMC Bioinformatics. 2020 May 24;21(1):211.
doi: 10.1186/s12859-020-3537-3.

VADR: validation and annotation of virus sequence submissions to GenBank

Alejandro A Schäffer et al. BMC Bioinformatics.

Abstract

Background: GenBank contains over 3 million viral sequences. The National Center for Biotechnology Information (NCBI) previously made available a tool for validating and annotating influenza virus sequences that is used to check submissions to GenBank. Before this project, there was no analogous tool in use for non-influenza viral sequence submissions.

Results: We developed a system called VADR (Viral Annotation DefineR) that validates and annotates viral sequences in GenBank submissions. The annotation system is based on the analysis of the input nucleotide sequence using models built from curated RefSeqs. Hidden Markov models are used to classify sequences by determining the RefSeq they are most similar to, and feature annotation from the RefSeq is mapped based on a nucleotide alignment of the full sequence to a covariance model. Predicted proteins encoded by the sequence are validated with nucleotide-to-protein alignments using BLAST. The system identifies 43 types of "alerts" that (unlike the previous BLAST-based system) provide deterministic and rigorous feedback to researchers who submit sequences with unexpected characteristics. VADR has been integrated into GenBank's submission processing pipeline, so viral submissions that pass all tests are accepted and annotated automatically, without intervention by a human GenBank indexer. Unlike the previous submission-checking system, VADR is freely available (https://github.com/nawrockie/vadr) for local installation and use. VADR has been used for Norovirus submissions since May 2018 and for Dengue virus submissions since January 2019. Since March 2020, VADR has also been used to check SARS-CoV-2 sequence submissions. Other viruses with high numbers of submissions will be added incrementally.
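
The idea of mapping feature annotation from a RefSeq through a nucleotide alignment can be illustrated with a toy example. The sketch below is not VADR's implementation: it assumes a simple gapped pairwise alignment given as two equal-length strings (VADR aligns to a covariance model), and the function name `map_feature` and its parameters are hypothetical. It shows only how 1-based reference coordinates of a feature might be carried over to the input sequence.

```python
from typing import Optional, Tuple


def map_feature(
    ref_aln: str, qry_aln: str, ref_start: int, ref_end: int
) -> Optional[Tuple[int, int]]:
    """Map a 1-based, inclusive reference feature onto the query.

    ref_aln and qry_aln are the two rows of a pairwise alignment,
    with '-' marking gaps. Hypothetical helper for illustration only.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned rows must have equal length")

    ref_pos = 0  # residues consumed so far in the ungapped reference
    qry_pos = 0  # residues consumed so far in the ungapped query
    qry_start: Optional[int] = None
    qry_end: Optional[int] = None

    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            ref_pos += 1
        if q != "-":
            qry_pos += 1
        # Record only columns where both rows have a residue and the
        # reference position lies inside the feature.
        if r != "-" and q != "-" and ref_start <= ref_pos <= ref_end:
            if qry_start is None:
                qry_start = qry_pos
            qry_end = qry_pos

    if qry_start is None or qry_end is None:
        return None  # the whole feature is deleted in the query
    return qry_start, qry_end


# Reference feature at positions 4..9; the query has one deletion and
# one insertion relative to the reference inside that region.
print(map_feature("ATGAAA-CCCTAA", "ATG-AAGCCCTAA", 4, 9))
```

A real pipeline would also need to decide how to report features whose ends fall in a gap or beyond the end of the input sequence; in VADR, unexpected outcomes of this kind are what the alerts are meant to flag.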

Conclusion: VADR improves the speed with which non-flu virus submissions to GenBank can be checked and improves the content and quality of the GenBank annotations. The availability and portability of the software allow researchers to run the GenBank checks prior to submitting their viral sequences, and thereby gain confidence that their submissions will be accepted immediately without the need to correspond with GenBank staff. Reciprocally, the adoption of VADR frees GenBank staff to spend more time on services other than checking routine viral sequence submissions.

Keywords: Alignment; Annotation; Virus; ncRNA.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
VADR workflow schematic illustrating uses of the two main VADR scripts. v-build.pl can be used once to build a single model or repeatedly to build a library of models. v-annotate.pl can be used with a model or model library to validate and annotate input sequences.
