Nucleic Acids Res. 2004 Feb 18;32(3):e35.

doi: 10.1093/nar/gnh022.

Unlocking hidden genomic sequence

Jonathan M Keith¹, Duncan A E Cochran, Gita H Lala, Peter Adams, Darryn Bryant, Keith R Mitchelson

Affiliation

¹ Department of Mathematics, University of Queensland, St Lucia, Queensland 4072, Australia. j.keith1@mailbox.uq.edu.au

PMID: 14973330
PMCID: PMC373418
DOI: 10.1093/nar/gnh022

Abstract

Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs.
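The claim that random mutation distributes the original sequence information across variants, rather than destroying it, can be illustrated with a small simulation. The sketch below is not the authors' method: it assumes the variants are already aligned and of equal length, uses a uniform substitution model, and recovers the target by a simple per-column majority vote instead of the Bayesian and simulated annealing approaches named in the figure captions. The target sequence and the function names `mutate` and `majority_consensus` are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Return a copy of seq with each base substituted with probability `rate`.

    Assumption: substitutions are uniform over the three other bases,
    which is simpler than the behaviour of real mutagenic nucleotide analogues.
    """
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def majority_consensus(variants):
    """Per-column majority vote over equal-length, pre-aligned variants."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*variants))

rng = random.Random(0)
# Hypothetical target with a poly(A) run, a GC-rich run and an AT-rich run.
target = "A" * 30 + "GC" * 10 + "AT" * 10
# 14 variants at 18% substitution, mirroring the highest level quoted in the abstract.
variants = [mutate(target, 0.18, rng) for _ in range(14)]
recovered = majority_consensus(variants)
mismatches = sum(a != b for a, b in zip(target, recovered))
print(f"{mismatches} mismatches over {len(target)} bases")
```

Each individual variant differs from the target at roughly one base in five, yet each column is still dominated by the original base, which is why a pooled estimate can recover the target. Real data also contain insertions, deletions and sequencing errors, which this sketch ignores.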

Figures

**Figure 1**
The error probability versus the number of mutants for three different intensities of mutation: low (3%), medium (7%) and high (18%). These graphs are used prior to sequencing to estimate the number of mutants required. The assumed substitution probabilities are shown in Supplementary tables 1–3, respectively.
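The relationship described in this caption, error probability falling as more mutants are sequenced, can be approximated with a simple binomial calculation. The sketch below is an assumption-laden stand-in, not the model behind Figure 1: it ignores the substitution probabilities in the Supplementary tables, treats sites and mutants as independent, assumes every substitution at a site yields the same wrong base (a pessimistic case), and counts ties as errors. The function name `majority_error` is hypothetical.

```python
from math import comb

def majority_error(n_mutants, p):
    """Probability that a per-site majority vote over n_mutants reads is wrong.

    Assumptions: each read is independently substituted with probability p,
    all substitutions give the same wrong base, and a tie counts as an error.
    """
    # The vote fails when wrong reads k satisfy 2k >= n, i.e. k >= ceil(n / 2).
    threshold = (n_mutants + 1) // 2
    return sum(
        comb(n_mutants, k) * p**k * (1 - p) ** (n_mutants - k)
        for k in range(threshold, n_mutants + 1)
    )

# Tabulate the per-site error for the three mutation intensities in the caption.
for p in (0.03, 0.07, 0.18):
    row = ", ".join(f"n={n}: {majority_error(n, p):.2e}" for n in (3, 5, 9, 15))
    print(f"p={p:.2f} -> {row}")
```

Under these assumptions, a higher mutation intensity needs more mutants to reach the same per-site error, which matches the qualitative use of the graphs for choosing the number of mutants before sequencing.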

**Figure 2**
Sequence chromatograms of a *D.discoideum* shotgun clone (JC1a86h11) containing a homopolymer tract sequenced with BigDye v2.0 and M13-21 universal primer. (A) The sequence of the wild-type plasmid DNA showing the consequences of polymerase slippage within the homopolymer and resulting harmonic stutter peaks in the trace. (B) Introducing 12% random substitutions using dPTP reduced the uniformity of the problem motifs. The mutated variant of JC1a86h11 can then be readily sequenced.

**Figure 3**
Sequence chromatograms of a human clone (A9A05J2) containing a GC-rich tract sequenced with BigDye v3.1 and M13-21 universal primer on an ABI 3730xl capillary sequencer. (A) The sequence of the wild-type plasmid DNA showing the consequences of polymerase slippage within the homopolymer and resulting harmonic stutter peaks in the trace. (B) Introducing 11% random substitutions using 5-Br-dUTP reduced the uniformity of the problem motifs. The mutated variant of A9A05J2 can then be readily sequenced.

**Figure 4**
Inferred original sequence of the *Dictyostelium* fragment JC1a86h11 obtained using simulated annealing consensus (SAC) and probabilistic Bayesian (Bay) approaches. Sequences were inferred using four mutants with 3% mutation intensity (4×Low), six mutant sequences with 7% mutation intensity (6×Med), and the collection of all 10 mutant sequences (10×All). Bases marked with a period are identical to the base at the bottom of that column. Mutations were induced using dPTP at low (4×Low) or medium (6×Med) concentration.

**Figure 5**
SAM reconstruction of a known sequence (pTEST), using 14 mutant copies of the sequence. The dPTP-induced mutants were found to differ from the original sequence on average in ∼18% of bases. (A) Inferred original sequence based on alignment using ClustalW (ClustalW) and the Bayesian approach (Bayesian). The known original sequence (Original) is also shown. Bases marked with a period are identical to the base at the bottom of that column. (B) Quality values (vertical axis) for the Bayesian reconstruction. Quality values are assigned to each base and to the hypotheses that there are no additional bases between each pair of adjacent characters, or at the ends of the inferred sequence. On the horizontal axis, odd numbers represent positions between characters and at the ends of the sequence, whereas even numbers represent base positions. The same convention is used in Figure 6B. Quality values for odd-numbered positions are not shown; all are 99.

**Figure 6**
Alignment of DNA sequences of 16 individual clones of the ‘unclonable’ human mitochondrial tRNA^Thr gene (42) to the inferred original sequence (Bayesian). (A) The putative ‘mutation hotspots’ necessary for clone stability in *E.coli* are outlined (large boxes). Thirteen mutants (1-1–1-13) were generated using 8-oxo-dGTP (24) and three mutants (2-1–2-3) were generated using dPTP at a high concentration. The inferred sequence agreed with known mitochondrial gene sequence (accession no. HUMMTCG) across both the bulk (0.7% mutated) and hotspot (12% mutated) regions except in one base. Bases marked with a period are identical to the base at the bottom of that column. (B) Quality values for the Bayesian reconstruction using the first six mutants only from (A). The inferred sequence is correct.

Cited by

Trial and error: how the unclonable human mitochondrial genome was cloned in yeast.
Bigger BW, Liao AY, Sergijenko A, Coutelle C. Pharm Res. 2011 Nov;28(11):2863-70. doi: 10.1007/s11095-011-0527-1. Epub 2011 Jul 9. PMID: 21739320
Mastering DNA chromatogram analysis in Sanger sequencing for reliable clinical analysis.
Al-Shuhaib MBS, Hashim HO. J Genet Eng Biotechnol. 2023 Nov 13;21(1):115. doi: 10.1186/s43141-023-00587-6. PMID: 37955813 Free PMC article. Review.
An intermediate grade of finished genomic sequence suitable for comparative analyses.
Blakesley RW, Hansen NF, Mullikin JC, Thomas PJ, McDowell JC, Maskeri B, Young AC, Benjamin B, Brooks SY, Coleman BI, Gupta J, Ho SL, Karlins EM, Maduro QL, Stantripop S, Tsurgeon C, Vogt JL, Walker MA, Masiello CA, Guan X; NISC Comparative Sequencing Program; Bouffard GG, Green ED. Genome Res. 2004 Nov;14(11):2235-44. doi: 10.1101/gr.2648404. Epub 2004 Oct 12. PMID: 15479945 Free PMC article.
An improved protocol for sequencing of repetitive genomic regions and structural variations using mutagenesis and next generation sequencing.
Sipos B, Massingham T, Stütz AM, Goldman N. PLoS One. 2012;7(8):e43359. doi: 10.1371/journal.pone.0043359. Epub 2012 Aug 17. PMID: 22912860 Free PMC article.
Characterizing and measuring bias in sequence data.
Ross MG, Russ C, Costello M, Hollinger A, Lennon NJ, Hegarty R, Nusbaum C, Jaffe DB. Genome Biol. 2013 May 29;14(5):R51. doi: 10.1186/gb-2013-14-5-r51. PMID: 23718773 Free PMC article.
