Nucleic Acids Res. 2004 Feb 18;32(3):e35.
doi: 10.1093/nar/gnh022.

Unlocking hidden genomic sequence


Jonathan M Keith et al. Nucleic Acids Res. 2004.

Abstract

Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by the introduction of random mutations. The introduction of mutations does not destroy the original sequence information, but distributes it amongst multiple variants. Some of these variants lack the problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels of up to an average of 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs.
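
The idea that random substitutions scatter, rather than destroy, the original sequence can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the mutants are already aligned and of equal length (substitutions only, no insertions or deletions), a uniform prior over A, C, G and T, and a symmetric model in which each base is independently replaced, with probability p, by one of the other three bases chosen uniformly. The function names posterior_at_column and infer_original are hypothetical.

```python
from math import prod

BASES = "ACGT"


def posterior_at_column(column: str, p: float) -> dict[str, float]:
    """Posterior P(original base = b | bases observed in one alignment column).

    Assumed model (not from the paper): uniform prior over A/C/G/T; each mutant
    independently keeps the original base with probability 1 - p, or shows one
    specific other base with probability p / 3.
    """
    likelihood = {
        b: prod((1 - p) if obs == b else p / 3 for obs in column)
        for b in BASES
    }
    total = sum(likelihood.values())
    return {b: lik / total for b, lik in likelihood.items()}


def infer_original(mutants: list[str], p: float) -> tuple[str, list[float]]:
    """Call the most probable original base in each column of pre-aligned mutants."""
    if len({len(m) for m in mutants}) != 1:
        raise ValueError("this sketch assumes pre-aligned mutants of equal length")
    inferred, confidence = [], []
    for column in zip(*mutants):
        post = posterior_at_column("".join(column), p)
        best = max(post, key=post.get)  # maximum a posteriori base for this column
        inferred.append(best)
        confidence.append(post[best])
    return "".join(inferred), confidence
```

For example, infer_original(["ACGTTA", "ACTTTA", "GCGTTA", "ACGTCA"], 0.18) recovers "ACGTTA": every mutant differs from it somewhere, but in each column the original base still outnumbers the substitutions.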


Figures

Figure 1
The error probability versus the number of mutants for three different intensities of mutation: low (3%), medium (7%) and high (18%). These graphs are used prior to sequencing to estimate the number of mutants required. The assumed substitution probabilities are shown in Supplementary tables 1–3, respectively.
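
The curves in Figure 1 are computed from the substitution probabilities in Supplementary tables 1–3, which are not reproduced here. The sketch below swaps in a simpler, assumed model (each base substituted with probability p, uniformly to one of the other three bases, positions independent, no indels) and computes the per-base probability that the call is wrong when the original base is taken as the plurality vote over n mutants. Under this symmetric model with a uniform prior and p < 0.75, plurality vote coincides with the maximum a posteriori call; ties are broken uniformly at random. The function names error_probability and mutants_required are hypothetical.

```python
from math import comb
from typing import Optional


def error_probability(n_mutants: int, p: float) -> float:
    """P(plurality vote over n_mutants copies miscalls one base).

    Assumed model: each copy keeps the original base with probability 1 - p,
    otherwise shows one specific wrong base with probability p / 3.
    Ties between the highest counts are broken uniformly at random.
    """
    q = p / 3
    p_correct = 0.0
    # k0 copies keep the original base; k1, k2, k3 show each of the three wrong bases.
    for k0 in range(n_mutants + 1):
        for k1 in range(n_mutants - k0 + 1):
            for k2 in range(n_mutants - k0 - k1 + 1):
                k3 = n_mutants - k0 - k1 - k2
                ways = (comb(n_mutants, k0)
                        * comb(n_mutants - k0, k1)
                        * comb(n_mutants - k0 - k1, k2))
                prob = ways * (1 - p) ** k0 * q ** (k1 + k2 + k3)
                counts = (k0, k1, k2, k3)
                top = max(counts)
                if k0 == top:
                    # Correct with probability 1/(number of tied leaders).
                    p_correct += prob / counts.count(top)
    return 1.0 - p_correct


def mutants_required(p: float, target_error: float, max_mutants: int = 50) -> Optional[int]:
    """Smallest number of mutants whose per-base error is at most target_error."""
    for n in range(1, max_mutants + 1):
        if error_probability(n, p) <= target_error:
            return n
    return None
```

Looping mutants_required over p = 0.03, 0.07 and 0.18 gives a rough planning estimate for the three intensities in Figure 1. Real dPTP substitution spectra are biased rather than uniform, so the figure's curves will differ from this simplified model.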
Figure 2
Sequence chromatograms of a D.discoideum shotgun clone (JC1a86h11) containing a homopolymer tract sequenced with BigDye v2.0 and M13-21 universal primer. (A) The sequence of the wild-type plasmid DNA showing the consequences of polymerase slippage within the homopolymer and resulting harmonic stutter peaks in the trace. (B) Introducing 12% random substitutions using dPTP reduced the uniformity of the problem motifs. The mutated variant of JC1a86h11 can then be readily sequenced.
Figure 3
Sequence chromatograms of a human clone (A9A05J2) containing a GC-rich tract sequenced with BigDye v3.1 and M13-21 universal primer on an ABI 3730xl capillary sequencer. (A) The sequence of the wild-type plasmid DNA showing the consequences of polymerase slippage within the homopolymer and resulting harmonic stutter peaks in the trace. (B) Introducing 11% random substitutions using 5-Br-dUTP reduced the uniformity of the problem motifs. The mutated variant of A9A05J2 can then be readily sequenced.
Figure 4
Inferred original sequence of the Dictyostelium fragment JC1a86h11 obtained using simulated annealing consensus (SAC) and probabilistic Bayesian (Bay) approaches. Sequences were inferred using four mutants with 3% mutation intensity (4×Low), six mutant sequences with 7% mutation intensity (6×Med), and the collection of all 10 mutant sequences (10×All). Bases marked with a period are identical to the base at the bottom of that column. Mutations were induced using dPTP at low (4×Low) or medium (6×Med) concentration.
Figure 5
SAM reconstruction of a known sequence (pTEST), using 14 mutant copies of the sequence. The dPTP-induced mutants were found to differ from the original sequence on average in ∼18% of bases. (A) Inferred original sequence based on alignment using ClustalW (ClustalW) and the Bayesian approach (Bayesian). The known original sequence (Original) is also shown. Bases marked with a period are identical to the base at the bottom of that column. (B) Quality values (vertical axis) for the Bayesian reconstruction. Quality values are assigned to each base and to the hypotheses that there are no additional bases between each pair of adjacent characters, or at the ends of the inferred sequence. On the horizontal axis, odd numbers represent positions between characters and at the ends of the sequence, whereas even numbers represent base positions. The same convention is used in Figure 6B. Quality values for odd-numbered positions are not shown; all are 99.
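
The caption does not state how the quality values are scaled. Assuming the Phred convention Q = -10 log10(P(error)) with values capped at 99 (consistent with, but not confirmed by, the odd-numbered positions all reading 99), a posterior probability can be converted to a quality value and laid out on the caption's horizontal axis as sketched below. The function names quality_value and axis_positions are hypothetical.

```python
import math


def quality_value(p_correct: float, cap: int = 99) -> int:
    """Phred-style quality Q = -10 * log10(P(call is wrong)), capped at `cap` (assumed scaling)."""
    p_error = 1.0 - p_correct
    if p_error <= 0.0:
        return cap
    return min(cap, round(-10 * math.log10(p_error)))


def axis_positions(base_q: list[int], gap_q: list[int]) -> dict[int, int]:
    """Place quality values on the Figure 5B axis convention.

    gap_q[i] scores the hypothesis 'no additional base' before base i (the last
    entry covers the end of the sequence) and goes at odd position 2*i + 1;
    base_q[i] scores base i and goes at even position 2*i + 2.
    """
    if len(gap_q) != len(base_q) + 1:
        raise ValueError("need one gap hypothesis per base plus one for the end")
    axis = {2 * i + 1: q for i, q in enumerate(gap_q)}
    axis.update({2 * i + 2: q for i, q in enumerate(base_q)})
    return axis
```

For a two-base sequence, axis_positions([quality_value(0.999), quality_value(0.9999)], [99, 99, 99]) puts the base qualities 30 and 40 at positions 2 and 4 and the three gap hypotheses at positions 1, 3 and 5.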
Figure 6
Alignment of DNA sequences of 16 individual clones of the ‘unclonable’ human mitochondrial tRNAThr gene (42) to the inferred original sequence (Bayesian). (A) The putative ‘mutation hotspots’ necessary for clone stability in E.coli are outlined (large boxes). Thirteen mutants (1-1–1-13) were generated using 8-oxo-dGTP (24) and three mutants (2-1–2-3) were generated using dPTP at a high concentration. The inferred sequence agreed with known mitochondrial gene sequence (accession no. HUMMTCG) across both the bulk (0.7% mutated) and hotspot (12% mutated) regions except in one base. Bases marked with a period are identical to the base at the bottom of that column. (B) Quality values for the Bayesian reconstruction using the first six mutants only from (A). The inferred sequence is correct.


References

    1. Lander,E.S., Linton,L.M., Birren,B., Nusbaum,C., Zody,M.C., Baldwin,J., Devon,K., Dewar,K., Doyle,M., FitzHugh,W. et al. (International Human Genome Sequencing Consortium) (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
    2. Glöckner,G., Eichinger,L., Szafranski,K., Pachebat,J.A., Bankier,A.T., Dear,P.H., Lehmann,R., Baumgart,C., Parra,G., Abril,J.F. et al. (2002) Sequence and analysis of chromosome 2 of Dictyostelium discoideum. Nature, 418, 79–85.
    3. Ji,J., Clegg,N.J., Peterson,K.R., Jackson,A.L., Laird,C.D. and Loeb,L.A. (1996) In vitro expansion of GGC:GCC repeats: identification of the preferred strand of expansion. Nucleic Acids Res., 24, 2835–2840.
    4. Tabor,S. and Richardson,C.C. (1987) DNA sequence analysis with a modified bacteriophage T7 DNA polymerase. Proc. Natl Acad. Sci. USA, 84, 4767–4771.
    5. Donlin,M.J. and Johnson,K.A. (1994) Mutants affecting nucleotide recognition by T7 DNA polymerase. Biochemistry, 33, 14908–14917.
