PLoS Comput Biol. 2007 May;3(5):e98.
doi: 10.1371/journal.pcbi.0030098. Epub 2007 Apr 20.

Comprehensive DNA signature discovery and validation

Adam M Phillippy et al. PLoS Comput Biol. 2007 May.

Abstract

DNA signatures are nucleotide sequences that can be used to detect the presence of an organism and to distinguish that organism from all other species. Here we describe Insignia, a new, comprehensive system for the rapid identification of signatures in the genomes of bacteria and viruses. With the availability of hundreds of complete bacterial and viral genome sequences, it is now possible to use computational methods to identify signature sequences in all of these species, and to use these signatures as the basis for diagnostic assays to detect and genotype microbes in both environmental and clinical samples. The success of such assays critically depends on the methods used to identify signatures that properly differentiate between the target genomes and the sample background. We have used Insignia to compute accurate signatures for most bacterial genomes and made them available through our Web site. A sample of these signatures has been successfully tested on a set of 46 Vibrio cholerae strains, and the results indicate that the signatures are highly sensitive for detection as well as specific for discrimination between these strains and their near relatives. Our approach, whereby the entire genomic complement of organisms is compared to identify probe targets, is a promising method for diagnostic assay development, and it provides assay designers with the flexibility to choose probes from the most relevant genes or genomic regions. The Insignia system is freely accessible via a Web interface and has been released as open source software at http://insignia.cbcb.umd.edu.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Inclusive TaqMan Assay Displaying Increased Fluorescence due to Target Amplification for All 46 V. cholerae Strains Tested, and No Fluorescent Activity among the E. coli Negative Controls
Relative fluorescence intensity for 40 PCR cycles is shown.
Figure 2. Exclusive TaqMan Assay Displaying Increased Fluorescent Activity for the Reference Strain of V. cholerae and No Fluorescent Activity among the 23 Non-Cholera Strains
Relative fluorescence intensity for 50 PCR cycles is shown.
Figure 3. TaqMan Validation Results for the 50 Assay Designs Tested on 46 V. cholerae, 22 Near Neighbors, and One E. coli Control
Organisms are grouped vertically, and assays are sorted horizontally by effectiveness. Each colored box represents the Ct value for one of the 3,450 validation experiments. For example, assays 1–5 show strong amplification for all V. cholerae strains and heavily delayed or failed amplification for all other organisms.
Figure 4. A Match Cover (M_tb) Constructed from the Exact Matches between a Target (t) and Background (b) Genome
M_tb intervals (red boxes) represent regions of the target with a contiguous match to the background (gray boxes).
Figure 5. Shared k-mers (Red Lines) Obtained from an Intersection (I_r) of Three Match Covers M_rs, M_rt, M_ru
I_r intervals (red boxes) represent regions of the reference r shared with all other target genomes s, t, u as derived from the match covers between the reference and each target (gray boxes). k-mers not contained by I_r are not shared by all targets (dotted gray lines).
Figure 6. Unique k-mers (Red Lines) Obtained from a Union (U_r) of Three Match Covers M_ra, M_rb, M_rc
U_r intervals (red boxes) represent regions of the reference r matching some background genome a, b, c as derived from the match covers between the reference and each background (gray boxes). k-mers contained by U_r match the background and are not unique (dotted gray lines).
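
The interval logic described in the captions for Figures 4 to 6 can be illustrated with a small Python sketch. It is not the Insignia implementation: the function names, the half-open coordinate convention, and all interval coordinates below are hypothetical and chosen only for illustration. The sketch assumes that a k-mer counts as shared when it lies inside a single interval of every target match cover, and as unique when it lies inside no single interval of any background match cover.

```python
from typing import List, Tuple

# Half-open interval [start, end) on the reference genome r (assumed convention).
Interval = Tuple[int, int]
Cover = List[Interval]


def intersect_covers(a: Cover, b: Cover) -> Cover:
    """Regions of the reference matched in both covers (pairwise overlaps)."""
    out = set()
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s < e:
                out.add((s, e))
    return sorted(out)


def contains_kmer(cover: Cover, start: int, k: int) -> bool:
    """True if the k-mer [start, start + k) lies inside one interval of the cover."""
    return any(s <= start and start + k <= e for s, e in cover)


def signature_starts(ref_len: int, k: int,
                     target_covers: List[Cover],
                     background_covers: List[Cover]) -> List[int]:
    """Start positions of k-mers shared by all targets and absent from all backgrounds."""
    # Intersection of the target covers: regions shared with every target.
    shared: Cover = [(0, ref_len)]
    for cover in target_covers:
        shared = intersect_covers(shared, cover)
    # Union of the background covers. Overlapping intervals are deliberately
    # not merged: a k-mer spanning two different matches is not itself a match.
    union: Cover = [iv for cover in background_covers for iv in cover]
    return [p for p in range(ref_len - k + 1)
            if contains_kmer(shared, p, k) and not contains_kmer(union, p, k)]


# Hypothetical match covers for a 30-base reference and 5-mers.
targets = [[(0, 20)], [(5, 25)], [(2, 18), (22, 30)]]
backgrounds = [[(0, 8)], [(6, 12)], [(16, 30)]]

shared_regions = intersect_covers(intersect_covers(targets[0], targets[1]), targets[2])
starts = signature_starts(30, 5, targets, backgrounds)
print(shared_regions)  # [(5, 18)]
print(starts)          # [5, 8, 9, 10, 11, 12, 13]
```

In this toy input, the 5-mers starting at positions 6 and 7 fall inside the shared region but also inside the background interval (6, 12), so they are excluded from the candidate signatures.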
