Genome Res. 1997 May;7(5):541-50.
doi: 10.1101/gr.7.5.541.

Sequence mapping by electronic PCR

Gregory D Schuler. Genome Res. 1997 May.

Abstract

The highly specific and sensitive PCR provides the basis for sequence-tagged sites (STSs), unique landmarks that have been used widely in the construction of genetic and physical maps of the human genome. Electronic PCR (e-PCR) refers to the process of recovering these unique sites in DNA sequences by searching for subsequences that closely match the PCR primers and have the correct order, orientation, and spacing, such that they could plausibly prime the amplification of a PCR product of the correct molecular weight. A software tool was developed to provide an efficient implementation of this search strategy and allow the sort of en masse searching that is required for modern genome analysis. Some sample searches were performed to demonstrate a number of factors that can affect the likelihood of obtaining a match. Analysis of one large sequence database record revealed the presence of several microsatellite and gene-based markers and allowed the exact base-pair distances among them to be calculated. This example demonstrates how e-PCR can be used to integrate the growing body of genomic sequence data with existing maps, reveal relationships among markers that existed previously on different maps, and correlate genetic distances with physical distances.
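The search described in the abstract can be illustrated with a minimal Python sketch. It is not the published tool: it uses a naive linear scan, not an efficient implementation, and the names `epcr_search`, `Hit`, `margin`, and `max_mismatch` are hypothetical, as are the default values. It only shows the three conditions named above: primer match, order and orientation, and spacing close to the expected product size.

```python
from typing import List, NamedTuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


class Hit(NamedTuple):
    start: int   # 0-based start of the predicted product in the query
    end: int     # exclusive end of the predicted product
    strand: str  # '+' if the query has the same sense as the STS record


def epcr_search(query: str, forward: str, reverse: str, product_size: int,
                margin: int = 50, max_mismatch: int = 0) -> List[Hit]:
    """Naive scan for primer pairs with plausible order, orientation, spacing.

    `margin` (allowed deviation from the expected product size) and
    `max_mismatch` (allowed mismatches per primer) are illustrative
    parameters, not values taken from the paper.
    """
    query = query.upper()
    forward, reverse = forward.upper(), reverse.upper()
    # Same sense: forward primer, then reverse complement of reverse primer.
    # Opposite sense: reverse primer, then reverse complement of forward primer.
    orientations = (
        ("+", forward, reverse_complement(reverse)),
        ("-", reverse, reverse_complement(forward)),
    )
    hits = []
    for strand, left, right in orientations:
        smallest = max(product_size - margin, len(left) + len(right))
        largest = product_size + margin
        for i in range(len(query) - len(left) + 1):
            if mismatches(query[i:i + len(left)], left) > max_mismatch:
                continue
            for size in range(smallest, largest + 1):
                end = i + size
                if end > len(query):
                    break
                if mismatches(query[end - len(right):end], right) <= max_mismatch:
                    hits.append(Hit(i, end, strand))
    return hits
```

A call such as `epcr_search(sequence, fwd_primer, rev_primer, product_size=150)` returns one `Hit` per candidate product on either strand. A practical tool would index the primers or the query instead of scanning every position.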


Figures

Figure 1
PCR primer sequences from a typical dbSTS record and their relationship to a query sequence that might be searched by e-PCR. (a) A few selected fields are shown from dbSTS record 16273 (GenBank accession no. G09892), including various names and identifiers, the sequences of the forward and reverse primers (both in 5′ → 3′ orientation), the size of the PCR product, and the sequence of the amplicon and flanking regions. (b) For a query sequence that is the same sense as the sequence of the dbSTS record, a successful match will include the forward primer followed by the inverse (i.e., reverse-complement) of the reverse primer. On the other hand, if the query sequence is of the opposite sense (imagine the lower strand reversed), it will be the reverse primer followed by the inverse of the forward primer.
Figure 2
BLAST search with a microsatellite-containing query sequence. The BLASTN program was used to search the dbSTS database with the sequence corresponding to GenBank entry L33477 (Br-cadherin) as the query sequence using a match score (M parameter) of 1 and a mismatch score (N parameter) of −2. The query sequence was not filtered for low-complexity sequences. (a) The first 20 sequences listed on the resulting “hit list” are shown, sorted by statistical significance. Altogether, >8000 sequence matches were found (by default, only the first 500 are shown, but the complete list can be obtained by setting the V parameter to a very large number). The best match was observed against the sequence corresponding to GenBank entry Z16831, which contains the Généthon marker D5S411. When low-complexity filtering is used, the problem is reduced dramatically, but 10 false positives remain, so manual inspection of the results is still required. (b) The sequence alignment generated by BLAST for the second-best hit reported, the sequence corresponding to GenBank entry Z24204, which contains the Généthon marker D15S206. The alignment includes only the (CA)n microsatellite repeats.
Figure 3
Alignment of the sequence corresponding to GenBank entry U47924 to the Généthon map. (a) A schematic representation of the 223-kb sequence is shown, with solid boxes showing the extent of the coding sequences for each gene (for clarity, the exon/intron structure is not indicated) and arrows showing the direction of transcription. One partial gene, which spans the right boundary of the sequence and contributes its 3′ UTR but no coding sequence, is not shown. In addition, a pseudogene and an snRNA gene documented in the region are not shown. STSs identified by e-PCR analysis with microsatellite and gene-based markers are indicated. Eight of the nine gene-based markers were found within the 3′ UTR, which is consistent with the strategy that was used in their development. (b) A portion of the Généthon genetic map of chromosome 12 (from 10 to 26 cM) is reproduced, with lines drawn to show how the sequence (GenBank entry U47924) can be aligned to it based on the presence of markers D12S1623 (at 17.1 cM) and D12S1625 (at 17.9 cM).
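Once two genetically mapped markers have been placed on the same sequence, relating genetic to physical distance is simple arithmetic. The sketch below takes the two genetic positions from the caption (17.1 and 17.9 cM). The base-pair coordinates are invented placeholders, since the caption does not give the marker positions within the sequence, and the function name `cm_per_mb` is hypothetical.

```python
def cm_per_mb(cm_a: float, cm_b: float, bp_a: int, bp_b: int) -> float:
    """Ratio of genetic distance (cM) to physical distance (Mb) for two markers."""
    genetic_cm = abs(cm_b - cm_a)
    physical_mb = abs(bp_b - bp_a) / 1_000_000
    if physical_mb == 0:
        raise ValueError("markers must lie at different base-pair positions")
    return genetic_cm / physical_mb


# Genetic positions are from the caption; the base-pair positions are
# HYPOTHETICAL placeholders chosen only to fall inside a 223-kb sequence.
D12S1623_CM, D12S1625_CM = 17.1, 17.9
D12S1623_BP, D12S1625_BP = 20_000, 180_000

ratio = cm_per_mb(D12S1623_CM, D12S1625_CM, D12S1623_BP, D12S1625_BP)
print(f"{ratio:.2f} cM/Mb (illustrative coordinates only)")
```

With real e-PCR hit coordinates substituted for the placeholders, the same calculation gives the local ratio of genetic to physical distance between the two markers.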
