Comparative Study
PLoS One. 2016 Aug 22;11(8):e0161333.
doi: 10.1371/journal.pone.0161333. eCollection 2016.

Genome-Wide SNP Calling from Genotyping by Sequencing (GBS) Data: A Comparison of Seven Pipelines and Two Sequencing Technologies

Davoud Torkamaneh et al. PLoS One. 2016.

Abstract

Next-generation sequencing (NGS) has revolutionized plant and animal research in many ways, including new methods of high-throughput genotyping. Genotyping-by-sequencing (GBS) has been demonstrated to be a robust and cost-effective genotyping method capable of producing thousands to millions of SNPs across a wide range of species. Undoubtedly, the greatest barrier to its broader use is the challenge of data analysis. Herein we describe a comprehensive comparison of seven GBS bioinformatics pipelines developed to process raw GBS sequence data into SNP genotypes. We compared five pipelines requiring a reference genome (TASSEL-GBS v1 and v2, Stacks, IGST, and Fast-GBS) and two de novo pipelines that do not require a reference genome (UNEAK and Stacks). Using Illumina sequence data from a set of 24 re-sequenced soybean lines, we performed SNP calling with these pipelines and compared the GBS SNP calls with the re-sequencing data to assess their accuracy. The number of SNPs called without a reference genome was lower (13k to 24k) than with a reference genome (25k to 54k), while accuracy was high (92.3 to 98.7%) for all but one pipeline (TASSEL-GBS v1, 76.1%). Among pipelines offering high accuracy (>95%), Fast-GBS called the greatest number of polymorphisms (close to 35,000 SNPs and indels) and yielded the highest accuracy (98.7%). Using Ion Torrent sequence data for the same 24 lines, we compared the performance of Fast-GBS with that of TASSEL-GBS v2. Fast-GBS again called more polymorphisms (25.8k vs 22.9k), and these proved more accurate (95.2 vs 91.1%). Typically, SNP catalogues called from the same sequencing data using different pipelines overlapped extensively (79-92% overlap). In contrast, overlap between SNP catalogues obtained using the same pipeline but different sequencing technologies was less extensive (~50-70%).
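
The two summary metrics reported above, accuracy against re-sequencing data and overlap between SNP catalogues, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: accuracy is taken as the percentage of GBS genotypes that agree with the re-sequencing genotype at calls present in both datasets, and overlap as the percentage of the smaller catalogue that is shared. The function names, data layout, and genotype strings are hypothetical and are not taken from any of the pipelines compared.

```python
from typing import Dict, Optional, Set, Tuple

Locus = Tuple[str, int]        # (chromosome, position); hypothetical layout
Call = Tuple[str, int, str]    # (chromosome, position, sample); hypothetical layout


def genotype_accuracy(gbs: Dict[Call, str], truth: Dict[Call, str]) -> Optional[float]:
    """Percent of GBS genotypes matching the re-sequencing genotype.

    Assumed definition: only calls present in both datasets are compared.
    Returns None when the two datasets share no calls.
    """
    shared = gbs.keys() & truth.keys()
    if not shared:
        return None
    matches = sum(1 for key in shared if gbs[key] == truth[key])
    return 100.0 * matches / len(shared)


def catalogue_overlap(a: Set[Locus], b: Set[Locus]) -> float:
    """Percent of loci in the smaller catalogue that also occur in the other.

    Assumed definition; other denominators (e.g. the union) are equally
    plausible and would give different percentages.
    """
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return 100.0 * len(a & b) / smaller
```

With real data, the dictionaries would be populated from each pipeline's genotype output and from the re-sequencing calls, and the locus sets from each catalogue's chromosome and position columns. Restricting the comparison to shared calls means missing genotypes lower neither metric, which is one reason a pipeline can be accurate while still calling fewer SNPs.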

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Venn diagram representing the degree of overlap among SNP loci called using seven bioinformatics pipelines. The percentages indicate the estimated accuracy for all groups of SNPs (unique or shared).
Fig 2. Systematic approach used to investigate the possible causes of unique inaccurate SNP calls.
Fig 3. Venn diagram for overlap of the SNPs called using two different bioinformatics pipelines. (a) Overlap of SNPs called with Fast-GBS using Illumina and Ion Torrent reads. (b) Overlap of SNPs called with TASSEL-GBS v2 using Illumina and Ion Torrent reads. The percentages indicate the estimated accuracy for all groups of SNPs (unique or shared).
