Genome Biol. 2008;9(3):R56.
doi: 10.1186/gb-2008-9-3-r56. Epub 2008 Mar 18.

DNA signatures for detecting genetic engineering in bacteria

Jonathan E Allen et al. Genome Biol. 2008.

Abstract

Using newly designed computational tools, we show that, despite substantial sequence shared between natural plasmids and artificial vectors, a robust set of DNA oligomers can be identified that differentiates artificial vector sequences from all available background viral and bacterial genomes and natural plasmids. We predict that these tools can achieve very high sensitivity and specificity for detecting new unsequenced vectors in microarray-based bioassays. Such DNA signatures could be important in detecting genetically engineered bacteria in environmental samples.
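The core idea of a candidate signature, a k-mer that occurs in artificial vectors but nowhere in the background sequences, can be illustrated with a minimal sketch. This is not the authors' tool: the function names and the toy sequences are hypothetical, and the sketch ignores reverse complements and the scale of real genome collections.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_signatures(vectors, background, k):
    """k-mers found in at least one vector and in no background sequence."""
    background_kmers = set()
    for seq in background:
        background_kmers |= kmers(seq, k)
    vector_kmers = set()
    for seq in vectors:
        vector_kmers |= kmers(seq, k)
    return vector_kmers - background_kmers


def signature_percentage(vectors, background, k):
    """Percentage of observed vector k-mers that are candidate signatures."""
    vector_kmers = set()
    for seq in vectors:
        vector_kmers |= kmers(seq, k)
    if not vector_kmers:
        return 0.0
    signatures = candidate_signatures(vectors, background, k)
    return 100.0 * len(signatures) / len(vector_kmers)


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    toy_vectors = ["ACGTAC"]
    toy_background = ["ACGTTT"]
    print(sorted(candidate_signatures(toy_vectors, toy_background, 3)))
    print(signature_percentage(toy_vectors, toy_background, 3))
```

Plain Python sets keep the sketch short; a collection of whole genomes would need a more memory-efficient index.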

Figures

Figure 1
Percentage of k-mers that are candidate signatures. The red line plots the percentage of candidate vector signatures as a function of k (100% for a given k would mean all observed k-mers are signatures). The blue and green lines plot the percentage of artificial vector derived k-mers shared exclusively with natural plasmids and chromosomes, respectively.
Figure 2
Signature sets. Plots of the number of k-mer sets containing signatures for k = 15 to 100.
Figure 3
Example artificial vector sequence mapped to two natural plasmids. The vector sequence (phagemid cloning vector pTZ19R) is shown in the middle; it shares sequence with both the E. coli plasmid pCA4 and the Erwinia amylovora plasmid pEA2.8. Lines connecting the three sequences mark the beginning of exact matches between the artificial sequence and the two respective plasmids. The number next to each line is the length of the exact match (for matches of 100 or more bases). Functional annotation for the artificial vector sequence is given above the sequence (RS denotes recombination site). Position 614 marks the starting point of the shortest signature found (k = 23). (Not drawn to scale.)
Figure 4
Artificial vector sequence detection. The percentage of correctly rejected background sequences (y-axis) versus correctly accepted artificial vector sequences (x-axis) using bit score thresholds. Each point is the percentage of background sequences (y-axis) with bit scores below a fixed bit score threshold versus the percentage of artificial vector sequences (x-axis) above the same bit score threshold. We examined 20 bit-score threshold values. Only the points with a rejection/acceptance percentage above 85% are shown. The six different signature sets are shown in the legend and are described by their k-mer size (30 and 60) and the signature set origin (large, small and MCS-only). The large and small sets are k-mer derived signature sets and MCS-only are signature sets derived exclusively from the multiple cloning site regions.
Figure 5
Artificial vector sequence detection with a modified signature set. The percentage of correctly rejected background sequences (y-axis) versus correctly accepted artificial vector sequences (x-axis) using bit score thresholds after filtering out signatures with high bit score matches to the background sequence.
Figure 6
Signature set percentages for select functional annotation categories. Functional categories are protein coding genes (CDS), multiple cloning sites (MCS), no annotation and recombination sites.
Figure 7
Vector/plasmid shared k-mer sets for select functional annotation categories. Percentage of shared k-mer sets is shown for different k-mer sizes.
Figure 8
Hash tables and k-mer set clusters. The left panel shows a schematic of an example hash table (Hash table 1). Each key is a k-mer (k-mer-1, k-mer-2,..., k-mer-7) with an entry storing a list of numeric identifiers for the sequences that contain the k-mer as a substring. The upper right panel shows the second hash table (Hash table 2), where each key is a set of vectors and each entry is the set of k-mers common among those vectors. The bottom right panel shows the graph representation of the four k-mer sets (numbered 1 to 4), with k-mer sets as nodes and labeled edges between nodes representing the vectors shared between nodes.
Figure 9
k-mer set cluster. Graph of cluster 1 from Table 2. Each node shows the number of k-mers in the set (left number), the number of artificial vectors sharing the k-mer substrings (right number) and the functional annotation. Edges denote common vectors between two nodes. Abbreviations are as follows: DHPS, dihydropteroate synthase; STRA, streptomycin resistance; Kanamycin/Neomycin, Kanamycin/Neomycin resistance; Recombsite, recombination site; Transterm, transcription termination.
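The two hash tables and the k-mer set graph described for Figures 8 and 9 can be sketched roughly as follows. This is one reading of the captions, not the authors' implementation: it assumes Hash table 2 is keyed by the set of vectors that share a group of k-mers, it stores identifiers in sets rather than lists, and all function names and toy sequences are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_kmer_index(sequences, k):
    """Hash table 1: map each k-mer to the ids of sequences containing it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index


def group_by_vector_set(index):
    """Hash table 2: map each set of sequence ids to the k-mers they share."""
    groups = defaultdict(set)
    for kmer, seq_ids in index.items():
        groups[frozenset(seq_ids)].add(kmer)
    return groups


def shared_vector_edges(groups):
    """Graph edges: link two k-mer sets whose vector sets overlap.

    Each edge is labeled with the vectors the two nodes have in common.
    """
    edges = {}
    for a, b in combinations(groups, 2):
        shared = a & b
        if shared:
            edges[(a, b)] = shared
    return edges


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    toy = {1: "AAAC", 2: "AACG"}
    table1 = build_kmer_index(toy, 3)
    table2 = group_by_vector_set(table1)
    for vector_set, kmer_set in table2.items():
        print(sorted(vector_set), sorted(kmer_set))
    print(len(shared_vector_edges(table2)))
```

Connected components of the resulting graph would then correspond to clusters of k-mer sets such as the one drawn in Figure 9.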
