Should we have blind faith in bioinformatics software? Illustrations from the SNAP web-based tool
- PMID: 25742008
- PMCID: PMC4351168
- DOI: 10.1371/journal.pone.0118925
Abstract
Bioinformatics tools have gained popularity in biology, but little is known about their validity. We aimed to assess, in the Family Atherosclerosis Monitoring In earLY Life (FAMILY) birth cohort, the early contribution of 415 single nucleotide polymorphisms (SNPs) associated with eight cardio-metabolic traits at the genome-wide significance level in adults. We used the popular web-based tool SNAP to assess the availability of the 415 SNPs on the Illumina Cardio-Metabochip array used to genotype the FAMILY study participants. We then compared the SNAP output with the Cardio-Metabochip file provided by Illumina, matching SNPs by chromosome and chromosomal position from the NCBI Human Genome Browser (Genome Reference Consortium Human Build 37). With the HapMap 3 release 2 reference, the SNAP output reported 201 of the 415 SNPs as missing from the Cardio-Metabochip. However, the Cardio-Metabochip file revealed that 152 of these 201 SNPs were in fact present on the array (false-negative rate of 36.6%, i.e. 152/415). With the more recent 1000 Genomes Project release, we found a false-negative rate of 17.6% by comparing the outputs of SNAP and the Illumina product file. We did not find any 'false-positive' SNPs (SNPs specified as available on the Cardio-Metabochip by SNAP, but not by the Cardio-Metabochip Illumina file). Cohen's kappa coefficient, which measures agreement between the two methods beyond what chance alone would produce, indicated that the validity of SNAP was fair to moderate depending on the reference used (HapMap 3 or 1000 Genomes). In conclusion, we demonstrate that the SNAP outputs for the Cardio-Metabochip are invalid. This study illustrates the importance of systematically and independently assessing the validity of bioinformatics tools. We propose a series of guidelines to improve practices in the fast-moving field of bioinformatics software implementation.
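The agreement figures in the abstract can be reproduced approximately from the counts it reports. The sketch below is illustrative only and is not the authors' code: the class and function names are hypothetical, the two derived cells (214 SNPs listed by both sources, 49 listed by neither) are inferred by subtraction from the reported totals, and the false-negative rate is assumed to use all 415 queried SNPs as its denominator, since 152/415 matches the reported 36.6%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    """2x2 agreement table between a tool's output and a reference manifest."""
    both_present: int    # SNP listed as on the array by tool and manifest
    tool_only: int       # tool says present, manifest says absent ("false positive")
    manifest_only: int   # tool says missing, manifest says present ("false negative")
    both_absent: int     # SNP listed as missing by both

    @property
    def total(self) -> int:
        return (self.both_present + self.tool_only
                + self.manifest_only + self.both_absent)


def false_negative_rate(t: Agreement) -> float:
    # Assumption: denominator is all queried SNPs, as implied by 152/415 = 36.6%.
    return t.manifest_only / t.total


def cohen_kappa(t: Agreement) -> float:
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = t.total
    observed = (t.both_present + t.both_absent) / n
    tool_present = (t.both_present + t.tool_only) / n
    manifest_present = (t.both_present + t.manifest_only) / n
    expected = (tool_present * manifest_present
                + (1 - tool_present) * (1 - manifest_present))
    # Undefined when expected == 1 (both raters always give the same single label).
    return (observed - expected) / (1 - expected)


# HapMap 3 release 2 counts from the abstract; 214 and 49 are derived by subtraction.
hapmap3 = Agreement(
    both_present=415 - 201,   # SNPs SNAP reported as available, none contradicted
    tool_only=0,              # no false positives were found
    manifest_only=152,        # reported missing by SNAP but present in the manifest
    both_absent=201 - 152,    # reported missing by SNAP and absent from the manifest
)

print(f"false-negative rate: {false_negative_rate(hapmap3):.1%}")  # 36.6%
print(f"Cohen's kappa: {cohen_kappa(hapmap3):.2f}")                # 0.25
```

Under these assumptions the HapMap 3 kappa comes out near 0.25, which falls in the range conventionally labelled "fair" and is consistent with the abstract's "fair to moderate" description. The 1000 Genomes counts are not given in the abstract, so that comparison is not reproduced here.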