[Preprint]. 2024 Mar 25:2024.03.21.585980.
doi: 10.1101/2024.03.21.585980.

Influenza sequence validation and annotation using VADR

Vincent C Calhoun et al. bioRxiv. 2024.

Abstract

Tens of thousands of influenza sequences are deposited into the GenBank database each year. The software tool FLAN has been used by GenBank since 2007 to validate and annotate incoming influenza sequence submissions, and has been publicly available as a webserver but not as a standalone tool. VADR is a general sequence validation and annotation software package used by GenBank for Norovirus, Dengue virus and SARS-CoV-2 virus sequence processing that is available as a standalone tool. We have created VADR influenza models based on the FLAN reference sequences and adapted VADR to accurately annotate influenza sequences. VADR and FLAN show consistent results on the vast majority of influenza sequences, and when they disagree VADR is usually correct. VADR can also accurately process influenza D sequences as well as influenza A H17, H18, H19, N10 and N11 subtype sequences, which FLAN cannot. VADR 1.6.3 and the associated influenza models are now freely available for users to download and use.

Keywords: annotation; influenza.

Conflict of interest statement

None declared.

Figures

Fig. 1. Example potential frameshift detected by VADR but not by FLAN.
VADR alignment of the first 99 nucleotides of CY009539.1 to the CY002079 influenza A segment 1 model reference sequence, showing a single deletion relative to the model at model position 37. The polymerase PB2 CDS is encoded by positions 28 to 2307 of CY002079, so the first three nucleotides of the alignment correspond to the start codon. VADR reports a potential frameshift (fsthicft alert) covering all nucleotides after the deletion (sequence positions 10 to 2279). Nucleotides that are identical between the sequence and the model are marked by * at the top of the alignment. Some of the information reported in the VADR output file with suffix “.alt” is included below the alignment. FLAN passes CY009539.1 without a frameshift error or any other error, possibly because the 9-nucleotide stretch before the frameshift is so short.
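
The coordinates in this caption can be checked with a little arithmetic. The sketch below is illustrative only and is not VADR's implementation: the function and variable names are hypothetical, and it assumes that sequence position 1 aligns to model position 28 (the start codon) and that the single deletion is the only insertion or deletion in the CDS.

```python
# Coordinates taken from the Fig. 1 caption; all names here are hypothetical.
CDS_START = 28      # model position of the first nucleotide of the PB2 CDS
CDS_END = 2307      # model position of the last nucleotide of the PB2 CDS
DELETION_POS = 37   # model position that is deleted in the sequence
SEQ_OFFSET = 28     # assumption: sequence position 1 aligns to model position 28


def frameshift_region(cds_start, cds_end, deletion_pos, seq_offset):
    """Return (unaffected_len, shifted_start, shifted_end).

    unaffected_len is the number of CDS nucleotides before the deletion;
    shifted_start and shifted_end are sequence coordinates of the region
    whose reading frame changes. Assumes one single-nucleotide deletion
    inside the CDS and no other insertions or deletions.
    """
    if not cds_start <= deletion_pos <= cds_end:
        raise ValueError("deletion must fall inside the CDS")
    # CDS nucleotides upstream of the deletion keep their original frame.
    unaffected_len = deletion_pos - cds_start
    # Sequence coordinate of the last nucleotide before the deletion.
    last_before = (deletion_pos - 1) - seq_offset + 1
    shifted_start = last_before + 1
    # One model position is missing, so the CDS ends one position earlier
    # in sequence coordinates than it would without the deletion.
    shifted_end = (cds_end - seq_offset + 1) - 1
    return unaffected_len, shifted_start, shifted_end


result = frameshift_region(CDS_START, CDS_END, DELETION_POS, SEQ_OFFSET)
print(result)
```

Under these assumptions the result agrees with the caption: 9 nucleotides precede the deletion, and the shifted region spans sequence positions 10 to 2279. A one-nucleotide deletion is not a multiple of three, which is why every downstream codon is read out of frame.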

References

    1. Prosplign. https://www.ncbi.nlm.nih.gov/sutils/static/prosplign/prosplign.html. Accessed: 2024-02-01.
    1. Who website: Influenza(seasonal). https://www.who.int/news-room/fact-sheets/detail/influenza-(seasonal). Accessed: 2024-02-01.
    1. Altschul S. F., Madden T. L., Schäffer A. A., Zhang J., Zhang Z., Miller W., and Lipman D. J.. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25:3389–3402, 1997. - PMC - PubMed
    1. Arita M., Karsch-Mizrachi I., and Cochrane G.. The international nucleotide sequence database collaboration. Nucleic Acids Res., 49:10, 2021. - PMC - PubMed
    1. Bao Y., Bolotov P., Dernovoy D., Kiryutin B., and Tatusova T.. FLAN: a web server for influenza virus genome annotation. Nucleic Acids Res., 35:W280–W284, 2007. - PMC - PubMed