Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 1999 Feb;15(2):106-10.
doi: 10.1093/bioinformatics/15.2.106.

Establishing a method of vector contamination identification in database sequences

Affiliations

Establishing a method of vector contamination identification in database sequences

G A Seluja et al. Bioinformatics. 1999 Feb.

Abstract

Motivation: The nucleotide sequence databases are invaluable tools both for the private and the academic research communities, from the retrieval of sequences to homology searching. Several issues related to data quality, such as the existence of sequencing artifacts and errors, are facing the databases. We investigated a major source of these errors, i.e. the presence of vector-contaminated sequences.

Results: Using a panel of 180 vector polylinker sequences, we found 0.36% or 3029 vector-matching sequences in GenBank Release 95-96, with an average vector-matching length of 72 nucleotides. The number of vector-contaminated sequences has been growing with the database; however, the percent contamination has remained approximately constant at an average of 0.28% from 1982 to 1996.

Availability: Access to the database of vector polylinker sequences via sequence similarity searching is available at http://seqsim.ncgr.org/vector/

Contact: gas@molinfo.com

PubMed Disclaimer

Publication types

LinkOut - more resources