G3 (Bethesda). 2015 Jul 6;5(9):1797-803.
doi: 10.1534/g3.115.019703.

SWEEP: A Tool for Filtering High-Quality SNPs in Polyploid Crops

Josh P Clevenger et al. G3 (Bethesda). 2015.

Abstract

High-throughput, next-generation sequencing-based genotyping and single nucleotide polymorphism (SNP) detection open the door to emerging genomics-based breeding strategies such as genome-wide association analysis and genomic selection. In polyploids, SNP detection is confounded by highly similar homeologous sequences: a polymorphism between subgenomes must be distinguished from a true SNP between genotypes. We have developed and implemented a novel tool called SWEEP: Sliding Window Extraction of Explicit Polymorphisms. SWEEP uses subgenome polymorphism haplotypes as a contrast to identify true SNPs between genotypes. The tool is a single-command script that calls a series of modules based on user-defined options and takes sorted, indexed BAM files or VCF files as input. Filtering options are highly flexible and, on top of the SWEEP filtering procedure, include filtering by sequence depth, alternate allele ratio, and SNP quality. Using real and simulated data, we show that SWEEP outperforms current SNP filtering methods for polyploids. SWEEP can be used for high-quality SNP discovery in polyploid crops.
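
The abstract lists depth, alternate allele ratio, and SNP quality as filters applied alongside the SWEEP procedure. A minimal sketch of such threshold filtering follows, assuming a simplified per-site record; the `SiteCall` class, the `passes_basic_filters` function, the chromosome names, and the default thresholds are all hypothetical placeholders, not SWEEP's option names or defaults.

```python
from dataclasses import dataclass


@dataclass
class SiteCall:
    """Hypothetical, simplified view of one called site (not SWEEP's data model)."""
    chrom: str
    pos: int
    depth: int       # total reads covering the site
    alt_depth: int   # reads supporting the alternate allele
    qual: float      # Phred-scaled SNP quality

    @property
    def alt_ratio(self) -> float:
        # Fraction of reads carrying the alternate allele; 0.0 when uncovered.
        return self.alt_depth / self.depth if self.depth else 0.0


def passes_basic_filters(site: SiteCall, min_depth: int = 10,
                         min_alt_ratio: float = 0.2, min_qual: float = 30.0) -> bool:
    """Depth, alternate-allele-ratio, and quality thresholds (placeholder defaults)."""
    return (site.depth >= min_depth
            and site.alt_ratio >= min_alt_ratio
            and site.qual >= min_qual)


sites = [
    SiteCall("A01", 1200, depth=25, alt_depth=12, qual=60.0),
    SiteCall("A01", 1350, depth=4, alt_depth=2, qual=55.0),   # too shallow
    SiteCall("B03", 880, depth=30, alt_depth=3, qual=45.0),   # alt ratio 0.1
]
kept = [s.pos for s in sites if passes_basic_filters(s)]
print(kept)  # [1200]
```

Filters like these act on each site independently; on their own they cannot tell a homeologous difference from a between-genotype SNP, which is the gap the SWEEP contrast is meant to close.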

Keywords: SNP; peanut; polyploidy.

Figures

Figure 1
(A) Logic of the SWEEP pipeline. (B) Example of detecting a SNP between genotypes. The blue bar represents the reference consensus sequence. Green bars represent sequence derived from one subgenome. Orange bars represent sequence derived from the other subgenome. Bases in red are within-genome polymorphisms (differences between subgenomes) and in this instance are the anchor SNPs. Bases in yellow are the true between-genotype SNPs.
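
The contrast in panel B can be illustrated with a deliberately simplified model. This is not SWEEP's algorithm, which works on subgenome haplotypes in the reads; it is a sketch assuming only the set of alleles observed at each position in two genotypes. A position showing the same multi-allele pattern in both genotypes is treated as a shared subgenome difference (an anchor); a position whose pattern differs between genotypes is a candidate SNP, kept only if an anchor lies within a sliding window. The functions `classify_site` and `sweep_like_filter` and the `window` parameter are hypothetical.

```python
from bisect import bisect_left


def classify_site(alleles_a: set[str], alleles_b: set[str]) -> str:
    """Label one position from the alleles observed in two genotypes."""
    if alleles_a == alleles_b:
        # Same pattern in both genotypes: a shared difference between
        # subgenomes if more than one allele, otherwise nothing to call.
        return "anchor" if len(alleles_a) > 1 else "monomorphic"
    # Pattern differs between genotypes: candidate between-genotype SNP.
    return "candidate"


def sweep_like_filter(sites: dict[int, tuple[set[str], set[str]]],
                      window: int = 50) -> list[int]:
    """Keep candidate SNPs that have at least one anchor within +/- window bp."""
    labels = {pos: classify_site(a, b) for pos, (a, b) in sites.items()}
    anchors = sorted(p for p, lab in labels.items() if lab == "anchor")
    kept = []
    for pos in sorted(p for p, lab in labels.items() if lab == "candidate"):
        i = bisect_left(anchors, pos - window)  # first anchor at or after pos - window
        if i < len(anchors) and anchors[i] <= pos + window:
            kept.append(pos)
    return kept


sites = {
    100: ({"A", "G"}, {"A", "G"}),  # subgenome difference seen in both genotypes
    120: ({"C", "T"}, {"C"}),       # differs between genotypes, anchor 20 bp away
    130: ({"T"}, {"T"}),            # monomorphic
    400: ({"G"}, {"A", "G"}),       # differs, but no anchor within 50 bp
}
print(sweep_like_filter(sites))  # [120]
```

Sorting the anchors and using `bisect_left` keeps each window lookup logarithmic, so the sketch scales to many candidate sites per contig.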
Figure 2
SWEEP filtering vs. traditional filtering methods. (A) Samtools-called SNPs were filtered using vcftools and SWEEP and evaluated for computational time using combinations of all five, four, three, and two genotypes. (B) SWEEP filtering and traditional filtering were evaluated for false-positive rate using combinations of all five, four, three, and two genotypes.
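
Evaluating "combinations of all five, four, three, and two genotypes" means running each method on every subset of those sizes. The sketch below enumerates them, assuming the five genotypes are the ones abbreviated in Figure 3; the function name `genotype_subsets` is hypothetical.

```python
from itertools import combinations

# Abbreviations from Figure 3; assumed here to be the five genotypes used in Figure 2.
GENOTYPES = ["TR", "NC", "C76", "F07", "SPT"]


def genotype_subsets(genotypes: list[str], sizes=(5, 4, 3, 2)) -> list[tuple[str, ...]]:
    """All combinations of the given genotypes at each subset size."""
    return [combo for k in sizes for combo in combinations(genotypes, k)]


subsets = genotype_subsets(GENOTYPES)
print(len(subsets))  # 26  (1 + 5 + 10 + 10)
```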
Figure 3
Pairwise polymorphism between genotypes. The upper diagonal shows the pairwise fraction of polymorphic SNPs relative to the total number of SNPs called. The heatmap in the lower diagonal reflects the range of pairwise polymorphic SNPs. Genotypes are Tifrunner (TR), NC3033 (NC), C76-16 (C76), Florida-07 (F07), and SPT06-06 (SPT).
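
The upper-diagonal values in Figure 3 are fractions of called SNPs at which two genotypes differ. A minimal sketch of that calculation follows, assuming every genotype has a call at every site (no missing data); the function `pairwise_polymorphism` and the toy calls are hypothetical, not data from the study.

```python
from itertools import combinations


def pairwise_polymorphism(calls: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Fraction of called SNP sites at which each pair of genotypes differs.

    `calls` maps a genotype name to its allele call at every SNP site, in the
    same site order for every genotype.
    """
    names = list(calls)
    n_sites = len(calls[names[0]])
    assert all(len(v) == n_sites for v in calls.values()), "unequal site counts"
    result = {}
    for a, b in combinations(names, 2):
        differing = sum(x != y for x, y in zip(calls[a], calls[b]))
        result[(a, b)] = differing / n_sites
    return result


# Toy calls at four SNP sites (hypothetical data).
calls = {
    "TR":  ["A", "C", "G", "T"],
    "NC":  ["A", "T", "G", "T"],
    "C76": ["G", "T", "G", "T"],
}
print(pairwise_polymorphism(calls))
# {('TR', 'NC'): 0.25, ('TR', 'C76'): 0.5, ('NC', 'C76'): 0.25}
```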
Figure 4
Percentage of true SNPs compared with false-positive (homeologous) SNPs called by each filtering method. Samtools and GATK with traditional filtering were compared with SWEEP filtering in simulations at 5×, 10×, 15×, and 20× coverage. (A) True SNPs recovered as a percentage of all SNPs retained after SWEEP filtering and after traditional filtering using GATK and Samtools. (B) True SNPs recovered as a percentage of all simulated true SNPs.
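
Panels A and B use the same numerator with different denominators, corresponding to what are often called precision and recall. A minimal sketch, assuming sites are identified by (chromosome, position) pairs; the function `recovery_percentages` and the example sites are hypothetical.

```python
def recovery_percentages(retained: set[tuple[str, int]],
                         simulated_true: set[tuple[str, int]]) -> tuple[float, float]:
    """Return (panel A, panel B) style percentages for one filtering method.

    A: true SNPs as a percentage of all SNPs retained after filtering.
    B: true SNPs as a percentage of all simulated true SNPs.
    """
    true_retained = len(retained & simulated_true)
    pct_of_retained = 100 * true_retained / len(retained) if retained else 0.0
    pct_of_simulated = 100 * true_retained / len(simulated_true) if simulated_true else 0.0
    return pct_of_retained, pct_of_simulated


simulated_true = {("A01", 100), ("A01", 250), ("B02", 40), ("B02", 90)}
retained = {("A01", 100), ("A01", 250), ("A01", 300)}  # ("A01", 300) is a false positive
a, b = recovery_percentages(retained, simulated_true)
print(round(a, 1), b)  # 66.7 50.0
```

A filter can score well on A by retaining few sites, or on B by retaining many; reporting both, as the figure does, exposes that trade-off.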