2005 Jul 26;33(13):e115.
doi: 10.1093/nar/gni110.

Genome-wide selection of unique and valid oligonucleotides

Heikki Hyyrö et al. Nucleic Acids Res.

Abstract

Functional genomics methods are used to investigate the huge amount of information contained in genomes. Numerous experimental methods rely on the use of oligo- or polynucleotides, and nucleotide strand hybridization forms the underlying principle for these methods. For all these techniques, the probes should be unique for the analyzed genes. In addition to being unique, the probes should fulfill a large number of criteria to be usable and valid. The criteria include, for example, avoidance of self-annealing, a suitable melting temperature and a suitable nucleotide composition. We developed a method for searching for unique and valid oligonucleotides, or probes, for genes so that there is not even a similar (approximate) occurrence in any other location of the whole genome. Using a probe size of 25, we analyzed 17 complete genomes representing a wide range of both prokaryotic and eukaryotic organisms. More than 92% of all the genes in the investigated genomes contained valid oligonucleotides. Extensive statistical tests were performed to characterize the properties of unique and valid oligonucleotides. Unique and valid oligonucleotides were relatively evenly distributed in genes except for the beginning and end, which were somewhat overrepresented. The flanking regions in eukaryotes were clearly underrepresented among suitable oligonucleotides. In addition to distributions within genes, the effects on codon and amino acid usage were also studied.
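
The abstract names the validity criteria but not their thresholds. As a rough illustration only, the sketch below screens a candidate oligo on GC fraction, an estimated melting temperature and self-complementarity. The function names, the use of the Wallace rule of thumb for melting temperature and every cutoff value are assumptions made for this example, not the authors' criteria.

```python
# Hypothetical screening helpers; thresholds are illustrative assumptions,
# not values taken from the paper.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(oligo):
    """Fraction of G and C bases in the oligo."""
    return sum(base in "GC" for base in oligo) / len(oligo)


def wallace_tm(oligo):
    """Rule-of-thumb melting temperature: 2 * (A + T) + 4 * (G + C), in degrees C."""
    at = sum(base in "AT" for base in oligo)
    gc = sum(base in "GC" for base in oligo)
    return 2 * at + 4 * gc


def reverse_complement(oligo):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return oligo.translate(COMPLEMENT)[::-1]


def longest_self_complementary_stem(oligo):
    """Length of the longest substring whose reverse complement also occurs
    in the oligo, used here as a crude proxy for self-annealing potential."""
    rc = reverse_complement(oligo)
    best = 0
    for length in range(1, len(oligo) + 1):
        found = any(
            oligo[i:i + length] in rc
            for i in range(len(oligo) - length + 1)
        )
        if not found:
            break  # no match at this length implies none at longer lengths
        best = length
    return best


def passes_screen(oligo, gc_range=(0.4, 0.6), tm_range=(60, 80), max_stem=4):
    """Return True if the oligo satisfies all three (assumed) cutoffs."""
    return (
        gc_range[0] <= gc_fraction(oligo) <= gc_range[1]
        and tm_range[0] <= wallace_tm(oligo) <= tm_range[1]
        and longest_self_complementary_stem(oligo) <= max_stem
    )
```

A production screen would normally replace the Wallace rule with a nearest-neighbor thermodynamic model and the substring check with a proper secondary-structure estimate; the simple versions are used here only to keep the example self-contained.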

Figures

Figure 1
Principle of the oligonucleotide analysis program. The oligos are searched by sliding a window of 25 positions along the analyzed sequence. Each 25mer is partitioned into three 8mers and a single nucleotide. 1-neighborhoods (a difference of one character allowed) are constructed for each piece and compared to the precomputed index of the locations of all 8mers in the investigated data (coding regions or complete genome). A two-phase filtering program and a fast bit-parallel approximate string matching algorithm are used to determine the uniqueness of the 25mers.
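
The filtering idea in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' program: the function names are hypothetical, the 1-neighborhood is restricted to single substitutions (a full edit-distance neighborhood would also include insertions and deletions), and the verification phase with bit-parallel approximate matching is omitted.

```python
from collections import defaultdict

ALPHABET = "ACGT"
PIECE_LEN = 8


def build_kmer_index(sequence, k=PIECE_LEN):
    """Map every k-mer in the sequence to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def split_25mer(oligo):
    """Split a 25mer into three 8mers plus the one remaining nucleotide."""
    if len(oligo) != 25:
        raise ValueError("expected a 25mer")
    pieces = [oligo[0:8], oligo[8:16], oligo[16:24]]
    return pieces, oligo[24]


def substitution_neighborhood(piece):
    """The piece itself plus every string differing from it by one substitution
    (a simplification of the 1-neighborhood; assumption of this sketch)."""
    neighbors = {piece}
    for i, original in enumerate(piece):
        for base in ALPHABET:
            if base != original:
                neighbors.add(piece[:i] + base + piece[i + 1:])
    return neighbors


def candidate_positions(oligo, index):
    """Start positions in the indexed sequence where some 8mer piece of the
    oligo matches with at most one substitution. These are candidates only;
    each would still need full approximate-match verification."""
    pieces, _last_base = split_25mer(oligo)
    hits = set()
    for offset, piece in zip((0, 8, 16), pieces):
        for variant in substitution_neighborhood(piece):
            for pos in index.get(variant, ()):
                hits.add(pos - offset)  # align hit back to the 25mer start
    return hits
```

In this sketch an oligo would be treated as a uniqueness candidate when `candidate_positions` returns only its own location; any additional position would be passed on to a slower, exact verification step.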
Figure 2
Effects of edit distance and the use of criteria on the number of unique and valid oligonucleotides in A. thaliana data. The analysis was done for unique (black) and valid (red) oligos in the coding region as well as for unique (green) and valid (blue) oligos in the whole genome.
Figure 3
Nucleotide distribution within oligonucleotides. The ratio of nucleotides in (A) unique oligos in coding region and (B) valid oligos in genome. Z-values for the distribution of nucleotides in (C) unique oligos in coding region and (D) valid oligos in genome. The difference between the nucleotide usage and (E) unique oligos in coding regions and (F) all oligos in genome data.
Figure 4
Distribution of nucleotide numbers in unique oligonucleotides in the coding region (left panels) and in valid oligos in genome data (right panels).
Figure 5
Distribution of oligonucleotides in different sections of genes for (A) unique oligos in CDS regions and (B) valid oligos in the genome. The ratio of (C) unique versus invalid oligos in coding regions and (D) valid versus invalid oligos in genome data.
Figure 6
Distribution of codons in oligonucleotides. Data are shown only for valid oligonucleotides in genome data. Note that the yeast, C. elegans and A. thaliana data also contain the flanking 5′ and 3′ regions.
Figure 7
Distribution of codons in different sections of genes. The figures (A–D) are for valid oligos in genome data.
Figure 8
Distribution of amino acids within the valid oligonucleotides in genome data.
Figure 9
Distribution of the amino acids within eight sections of proteins. Data are for valid oligonucleotides in genome data.
