PLoS One. 2007 Nov 14;2(11):e1172.
doi: 10.1371/journal.pone.0001172.

Complexity reduction of polymorphic sequences (CRoPS): a novel approach for large-scale polymorphism discovery in complex genomes


Nathalie J van Orsouw et al. PLoS One.

Abstract

Application of single nucleotide polymorphisms (SNPs) is revolutionizing human biomedical research. However, polymorphism discovery in species with low levels of polymorphism remains challenging and costly, despite the widespread availability of Sanger sequencing technology. We present CRoPS, a novel approach to polymorphism discovery that combines the reproducible genome complexity reduction of AFLP with Genome Sequencer (GS) 20/GS FLX next-generation sequencing technology. With CRoPS, hundreds of thousands of sequence reads derived from complexity-reduced genome sequences of two or more samples are processed and mined for SNPs using a fully automated bioinformatics pipeline. We show that over 75% of putative maize SNPs discovered using CRoPS are successfully converted to SNPWave assays, confirming them to be true SNPs derived from unique (single-copy) genome sequences. By using CRoPS, polymorphism discovery will become affordable, without the need for prior sequence information, in organisms with high levels of repetitive DNA in the genome and/or low levels of polymorphism in the (breeding) germplasm.
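The core mining idea (comparing reads of two samples at the same locus) can be illustrated with a minimal Python sketch. It assumes the reads have already been grouped per locus, aligned without gaps, and assigned to a sample via their sample identification tag; the function name `call_snps`, the `min_depth` threshold, and the example reads are hypothetical, and the actual CRoPS pipeline (Figure 1) includes clustering and alignment steps not modelled here. Because the samples are assumed to be inbred lines, a column is reported only when each sample shows a single allele and the two alleles differ.

```python
from collections import Counter

# Hypothetical input: gapless reads aligned to the same locus, each labelled
# with the sample it came from (e.g. decoded from its sample tag).
def call_snps(reads, min_depth=2):
    """Return (position, allele_sample1, allele_sample2) for columns where each
    sample is internally consistent but the two samples differ."""
    length = min(len(seq) for _, seq in reads)
    snps = []
    for pos in range(length):
        alleles = {}
        for sample, seq in reads:
            alleles.setdefault(sample, Counter())[seq[pos]] += 1
        if len(alleles) != 2:
            continue  # need reads from exactly two samples
        (_, c1), (_, c2) = sorted(alleles.items())
        # One allele per sample (inbred lines assumed) and enough reads each.
        if len(c1) == 1 and len(c2) == 1 and \
                sum(c1.values()) >= min_depth and sum(c2.values()) >= min_depth:
            a1, a2 = next(iter(c1)), next(iter(c2))
            if a1 != a2:
                snps.append((pos, a1, a2))
    return snps

reads = [
    ("B73", "ACGTACGTAC"),
    ("B73", "ACGTACGTAC"),
    ("Mo17", "ACGTTCGTAC"),
    ("Mo17", "ACGTTCGTAC"),
]
print(call_snps(reads))  # [(4, 'A', 'T')]
```

Raising `min_depth` trades sensitivity for confidence: with `min_depth=3` the toy locus above yields no SNP because each sample contributes only two reads.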


Conflict of interest statement

Competing Interests: Paid employment: All authors except S. Snoeijers are employees of Keygene N.V. S. Snoeijers was a Keygene N.V. employee during the execution of this project but left Keygene N.V. on December 31, 2006. Patent application: The CRoPS technology is subject to patent applications owned by Keygene N.V. See also acknowledgements.

Figures

Figure 1. Bioinformatics pipeline for high-throughput analysis of CRoPS sequence runs.
Figure 2. Example of a multiple sequence alignment (MSA) with SNP and sample related properties.
SNP properties include sequence depth (sd), the number of reads at the polymorphic position, the relative position of the SNP on the consensus sequence, the distance to the neighboring SNP, flanking sequence size and homopolymeric region information. Sample-related properties were derived from the Oracle database. The ratio of sample sequence depth to MSA sequence depth is calculated.
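A minimal sketch of how properties like those in Figure 2 could be computed from an MSA and a list of SNP columns. The data layout (equal-length aligned rows with `-` for gaps), the field names, and the definition of `homopolymer_run` as the longest run of one base immediately adjacent to the SNP are assumptions for illustration, not the definitions used in the CRoPS pipeline; flank sizes here are simply the distances to the ends of the consensus.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class SnpProperties:
    position: int                         # 0-based column in the consensus
    sequence_depth: int                   # reads with a base (not a gap) in this column
    relative_position: float              # position / consensus length
    distance_to_neighbor: Optional[int]   # nearest other SNP; None if there is none
    left_flank: int                       # consensus bases 5' of the SNP
    right_flank: int                      # consensus bases 3' of the SNP
    homopolymer_run: int                  # longest single-base run adjacent to the SNP
    sample_depth_ratio: Dict[str, float]  # sample depth / MSA depth in this column

def _run_length(seq: str, start: int, step: int) -> int:
    """Length of the run of identical bases beginning at seq[start], walking by step."""
    if not 0 <= start < len(seq):
        return 0
    base, n, i = seq[start], 0, start
    while 0 <= i < len(seq) and seq[i] == base:
        n += 1
        i += step
    return n

def snp_properties(consensus: str, msa: List[Tuple[str, str]],
                   snps: List[int]) -> List[SnpProperties]:
    """msa: (sample, aligned row) pairs; rows are as long as the consensus, '-' = gap."""
    result = []
    for pos in snps:
        covering = [sample for sample, row in msa if row[pos] != "-"]
        depth = len(covering)
        others = [abs(pos - o) for o in snps if o != pos]
        result.append(SnpProperties(
            position=pos,
            sequence_depth=depth,
            relative_position=pos / len(consensus),
            distance_to_neighbor=min(others) if others else None,
            left_flank=pos,
            right_flank=len(consensus) - pos - 1,
            homopolymer_run=max(_run_length(consensus, pos - 1, -1),
                                _run_length(consensus, pos + 1, 1)),
            sample_depth_ratio={s: covering.count(s) / depth
                                for s in sorted(set(covering))},
        ))
    return result

consensus = "ACCCAGTTG"
msa = [("B73", "ACCCAGTTG"), ("B73", "ACCCAGTTG"),
       ("Mo17", "ACCCTGTCG"), ("Mo17", "--CCTGTCG")]
for p in snp_properties(consensus, msa, [4, 7]):
    print(p)
```

In this toy alignment the SNP in column 4 sits next to a `CCC` run, so `homopolymer_run` is 3; such runs matter for GS 20/GS FLX data, where homopolymer lengths are a known source of read errors.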
Figure 3. Number of putative SNPs and indels as a function of the minimal length of flanking sequences surrounding the SNP and the minimal interval devoid of additional SNPs/indels.
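The filter behind Figure 3 can be approximated as follows: a variant is kept if the consensus provides at least `min_flank` bases on each side and no other SNP or indel lies within `min_interval` bases. This is a hypothetical sketch; the input format, the function name `count_passing`, the toy data, and the exact treatment of the interval (distance strictly greater than `min_interval`) are assumptions. Sweeping both thresholds yields a grid of counts analogous to the figure.

```python
from typing import List, Tuple

# Hypothetical input: one (consensus length, variant positions) pair per
# consensus sequence; positions are 0-based and include SNPs and indels.
Contig = Tuple[int, List[int]]

def count_passing(contigs: List[Contig], min_flank: int, min_interval: int) -> int:
    """Count variants with >= min_flank bases on both sides and no other
    variant within min_interval bases."""
    passing = 0
    for length, variants in contigs:
        for pos in variants:
            # Enough flanking sequence on both sides (e.g. for assay design).
            has_flanks = pos >= min_flank and length - pos - 1 >= min_flank
            # No additional SNP/indel closer than min_interval bases.
            isolated = all(abs(pos - other) > min_interval
                           for other in variants if other != pos)
            if has_flanks and isolated:
                passing += 1
    return passing

contigs = [(100, [10, 15, 60]), (50, [25])]
for flank in (5, 20, 30):
    print(flank, [count_passing(contigs, flank, interval) for interval in (0, 4, 10)])
```

As in the figure, the number of usable variants can only fall as either threshold is raised, since each threshold adds a condition a variant must meet.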
Figure 4. Pseudo-gel image visualizations of two SNPWave assays in maize detected by capillary electrophoresis.
Left panel: 13-plex SNPWave assay; right panel: 10-plex SNPWave assay. Numbers 1-9 represent different recombinant inbred line offspring of B73 and Mo17.
Figure 5. Composition and hypothesized cause of “mixed fragments”.
“Mixed fragments” are characterized by the occurrence of the sample identification tag of sample 1 on one side and the sample identification tag of sample 2 on the other side. (A) Schematic representation of observed homoduplex and heteroduplex fragment types containing expected tags and “mixed fragments”. (B) “Mixed fragments” are formed when (1) a heteroduplex is formed between complementary strands of samples 1 and 2, (2) 3′-5′ exonuclease activity of T4 DNA polymerase removes the sequence tags at the 3′ ends, (3) polymerase activity of T4 DNA polymerase extends the 3′ ends using the opposite strand as template, resulting in incorporation of the “wrong” sequence tag, i.e. the observation of “mixed fragments”.
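The hypothesized mechanism in panel (B) can be checked with a small string model in Python. It assumes each sample uses one tag on both primers, that steps (2) and (3) remove and re-synthesize exactly the tag region, and that the insert sequences of both samples are identical; the tag sequences, the insert, and all function names are hypothetical. `end_tags` shows the kind of check that would flag a read as a “mixed fragment” when its two ends carry different sample tags.

```python
from typing import Dict, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def strand(tag: str, insert: str) -> str:
    """One strand of a tagged fragment: 5' tag, insert, then the complement of
    the tag carried at the 5' end of the opposite strand (same sample)."""
    return tag + insert + revcomp(tag)

def exo_and_fill(top: str, bottom: str, tag_len: int) -> Tuple[str, str]:
    """Model steps (2) and (3) on a heteroduplex of two strands."""
    # (2) 3'-5' exonuclease trims the 3'-terminal tag complement of each strand.
    top_trim, bottom_trim = top[:-tag_len], bottom[:-tag_len]
    # (3) Polymerase extends each 3' end, copying the opposite strand's 5' tag.
    return (top_trim + revcomp(bottom[:tag_len]),
            bottom_trim + revcomp(top[:tag_len]))

def end_tags(read: str, tags: Dict[str, str], tag_len: int) -> Tuple[str, str]:
    """Name the sample tag found at the 5' and 3' end of a read."""
    lookup = {seq: name for name, seq in tags.items()}
    return (lookup.get(read[:tag_len], "unknown"),
            lookup.get(revcomp(read[-tag_len:]), "unknown"))

tags = {"sample1": "AACC", "sample2": "TTAG"}   # hypothetical 4-nt tags
insert = "GATTACAGGCT"
# (1) Heteroduplex: top strand from sample 1, complementary strand from sample 2.
top = strand(tags["sample1"], insert)
bottom = strand(tags["sample2"], revcomp(insert))
print(end_tags(top, tags, 4))         # ('sample1', 'sample1')
new_top, new_bottom = exo_and_fill(top, bottom, 4)
print(end_tags(new_top, tags, 4))     # ('sample1', 'sample2')
print(end_tags(new_bottom, tags, 4))  # ('sample2', 'sample1')
```

Before end polishing both strands carry matching tags; after the modelled exonuclease and fill-in steps each strand carries one tag from each sample, which is the “mixed fragment” signature described above.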
Figure 6. Protocol modification to avoid “mixed fragments”.
(A) Blunt-end adapter ligation as in the original GS 20 library preparation protocol. (B) T/A ligation as applied in the CRoPS protocol. Amplification with a polymerase lacking 3′-5′ exonuclease (proofreading) activity adds an A to the 3′ ends of the AFLP fragments, after which the T-adapters can be ligated. (C) Flowcharts of the original GS 20 library preparation protocol and the CRoPS library preparation protocol.
