Review

Mutat Res. 2012 Jan 3;729(1-2):1-15.

doi: 10.1016/j.mrfmmm.2011.10.001. Epub 2011 Oct 12.

Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants

Michael Gundry¹, Jan Vijg

PMID: 22016070
PMCID: PMC3237897
DOI: 10.1016/j.mrfmmm.2011.10.001

Michael Gundry et al. Mutat Res. 2012.

Affiliation

¹ Albert Einstein College of Medicine, Department of Genetics, New York, NY 10461, United States.

Abstract

DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans, mutations in the germ line can cause genetic disease. In somatic cells, multiple rounds of mutation and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative of the genome overall, current platforms allow whole genome sequencing for less than $5000. This has already given rise to direct estimates of germline mutation rates in multiple organisms, including humans, by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied, and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a brief overview of new sequencing platforms that are currently waiting in the wings to advance this exploding field even further.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

**Figure 1**
Somatic mutation frequencies in the aging mouse. Spontaneous *lacZ* mutant frequencies increase at different rates during aging in the brain, testis, spleen, liver, heart and small intestine of *lacZ* transgenic mice. The lines represent the mean mutant frequencies in different age groups. The gray fading area represents the survival curve of the mice, with 50% survival at 26.5 months. (Taken from Jan Vijg and Martijn E. T. Dollé. Large genome rearrangements as a primary cause of aging. Mechanisms of Ageing and Development 123, 907–915, 2002.)

**Figure 2**
Sequencing cost per megabase since 1971. The sequencing cost per megabase has decreased rapidly since the first 12 bp were sequenced in 1971. The cost is displayed using a logarithmic scale, with key events in the history of sequencing plotted on the curve. Of note, the cost of mutation discovery is dependent on the platform error rate and therefore reductions in cost may be less significant than they appear.

**Figure 3**
Massively parallel sequencing technologies. A schematic showing sample preparation and sequencing technologies for the four major commercially available sequencers: the GS FLX Titanium by 454 (Roche), the HiSeq by Illumina, the SOLiD system by Life Technologies and the PacBio RS by Pacific Biosciences.

**Figure 4**
Bioinformatics formats and tools. a. FASTQ format, which uses four lines per read, is the preferred output format for MPS data. The example shown is Illumina FASTQ format, with the read identifier occupying the first and third lines, and the sequence and associated base qualities occupying the second and fourth lines, respectively. b. Spaced seeds are used by modern alignment algorithms in order to improve alignment performance around variants and sequencing errors. c. The Burrows-Wheeler transform, used by the aligners BWA and bowtie, rearranges the order of a sequence in a programmed fashion in order to cluster similar sequence patterns and thereby improve data compression. d. The SAM format is the alignment output for BWA, as well as other programs. It is the preferred format for downstream variant analysis tools. The flag field is a decimal number that has to be interpreted as a 16-bit binary number. It contains information on the read and its alignment that can be used to filter or select for a subset of reads. The CIGAR field gives the location of insertions/deletions as well as the location of clipped bases. e. Reads aligning across a 1-bp deletion are shown before (bottom half) and after (top half) the local realignment step performed by GATK. The realignment helps to reduce false positive SNP calls.
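The flag and CIGAR fields described in panel d, and the transform described in panel c, can be illustrated with a small Python sketch. This is not code from the review: the function names are hypothetical, the flag table is a subset of the bits defined in the SAM specification, and the Burrows-Wheeler function is a naive rotation sort for illustration only (aligners such as BWA build the transform through an index, not by sorting every rotation).

```python
import re

# Subset of the flag bits defined by the SAM specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary_alignment",
    0x200: "fails_quality_checks",
    0x400: "duplicate",
}

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def decode_flag(flag):
    """Interpret the decimal flag as a bit field and list the bits that are set."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) tuples."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def indels_and_clips(cigar, ref_start):
    """Return (operation, reference position, length) for indels and clips.

    ref_start is the 1-based leftmost mapped position (the POS field).
    For a deletion the position is the first deleted reference base; for an
    insertion or clip it is the next reference base that the read aligns to.
    """
    events = []
    ref_pos = ref_start
    for length, op in parse_cigar(cigar):
        if op in "IDSH":
            events.append((op, ref_pos, length))
        if op in "MDN=X":  # only these operations consume the reference
            ref_pos += length
    return events


def naive_bwt(text):
    """Burrows-Wheeler transform by sorting all rotations (illustration only)."""
    text += "$"  # end marker, sorts before the letters used here
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)
```

For example, `decode_flag(99)` reports a paired, properly paired first read whose mate is on the reverse strand, and `indels_and_clips("5S40M1D55M", 1000)` reports a 5-base soft clip and a 1-bp deletion at reference position 1040. The clustering of identical characters in `naive_bwt("banana")` is what makes the transformed sequence easier to compress.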

**Figure 5**
Schematic depiction of the Illumina protocol for structural variation detection. DNA is extracted from a tissue or cell population, randomly fragmented and gel size-selected to approximately 500 bp. Adapters are ligated to both ends of the fragments and an enrichment PCR is used to select for fragments with adapters on both ends. The completed library is then diluted and applied to the Illumina flow cell for cluster generation and sequencing. Both ends of the fragments are sequenced in succession, producing paired sequencing reads. The paired reads are compared to a reference sequence to identify genome loci where clusters of read pairs provide evidence of a deletion or a translocation.
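The final step of this protocol, finding clusters of read pairs that disagree with the reference, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method used in the review: the record layout (`chrom1`, `pos1`, `chrom2`, `pos2`), the function names, and all thresholds (a 500 bp mean fragment size with a hypothetical standard deviation of 50 bp, three supporting pairs, 1 kb bins) are assumptions chosen for the example. The span is measured between leftmost mapped coordinates, which ignores read length, and fixed bins can split a cluster that straddles a bin boundary.

```python
from collections import defaultdict


def classify_pair(pair, mean_insert=500, sd_insert=50, n_sd=3):
    """Label one read pair as concordant or as evidence of a rearrangement.

    pair is a dict with the chromosome and leftmost mapped position of each
    read: chrom1, pos1, chrom2, pos2 (hypothetical layout).
    """
    if pair["chrom1"] != pair["chrom2"]:
        return "translocation"
    span = abs(pair["pos2"] - pair["pos1"])
    # Reads mapping much farther apart than the library fragment size
    # suggest that sequence between them is missing from the sample.
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"
    return "concordant"


def candidate_events(pairs, min_support=3, window=1000, **thresholds):
    """Group discordant pairs into coarse bins and keep well-supported bins."""
    clusters = defaultdict(list)
    for pair in pairs:
        kind = classify_pair(pair, **thresholds)
        if kind == "concordant":
            continue
        key = (kind, pair["chrom1"], pair["pos1"] // window, pair["chrom2"])
        clusters[key].append(pair)
    # A single discordant pair is more likely a chimeric fragment or a
    # mapping error than a real event, so require several independent pairs.
    return {key: group for key, group in clusters.items() if len(group) >= min_support}
```

Requiring a cluster of pairs, as the caption describes, is the design choice that separates real rearrangements from isolated library or alignment artifacts.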

**Figure 6**
Somatic mutation detection using single cell sequencing. a. Somatic mutations in tissues are rare and therefore found only in single sequencing reads from which they are routinely filtered out as sequencing errors during post-alignment processing. Adopting a single cell approach overcomes this limitation by transforming each somatic event into a consensus variant call. b. An example of a somatically acquired G->A point mutation identified using single cell sequencing. The top panel shows sequencing reads obtained from a single cell, a fraction of which contain the mutant allele. The bottom panel shows sequencing reads obtained from the unamplified population, which do not show evidence of the mutant base. A homozygous SNP specific to the cell-line (C->G) is also shown, and as expected, is found in all reads in both the single cell and the cell population samples. c. An example of a somatically acquired deletion identified using single cell sequencing. The top panel shows sequencing reads from the single cell, with shaded box-arrows representing reads that map across the deleted segment. The bottom panel shows sequencing reads obtained from the unamplified population, which do not show evidence of any deletion.
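The contrast drawn in panels a and b, a variant carried by a substantial fraction of reads from one cell but absent from the bulk population, can be expressed as a simple comparison of allele fractions. The Python sketch below is only an illustration of that idea: the function names and every threshold (minimum depth of 10, at least 25% of single-cell reads, at most 2% of bulk reads) are hypothetical values chosen for the example, and a real analysis would also have to account for amplification artifacts and allele dropout.

```python
from collections import Counter


def allele_fraction(bases, allele):
    """Fraction of the reads covering one position that carry the allele."""
    if not bases:
        return 0.0
    return Counter(bases)[allele] / len(bases)


def somatic_candidate(cell_bases, bulk_bases, alt,
                      min_depth=10, min_cell_fraction=0.25,
                      max_bulk_fraction=0.02):
    """Flag a position where the single cell supports alt but the bulk does not.

    cell_bases and bulk_bases are the base calls observed at one position in
    the single-cell and unamplified population samples, respectively.
    """
    if len(cell_bases) < min_depth or len(bulk_bases) < min_depth:
        return False  # too little coverage to say anything either way
    in_cell = allele_fraction(cell_bases, alt) >= min_cell_fraction
    absent_in_bulk = allele_fraction(bulk_bases, alt) <= max_bulk_fraction
    return in_cell and absent_in_bulk
```

With these assumed thresholds, a G to A change seen in half of the single-cell reads and in none of the bulk reads is flagged, whereas a homozygous cell-line SNP present in all reads of both samples is not, matching the two cases shown in panel b.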
