BMC Bioinformatics. 2006 Oct 23;7:468.
doi: 10.1186/1471-2105-7-468.

SNP-PHAGE--High throughput SNP discovery pipeline

Lakshmi K Matukumalli et al. BMC Bioinformatics. 2006.

Abstract

Background: Single nucleotide polymorphisms (SNPs), as defined here, are single base sequence changes or short insertions/deletions between or within individuals of a given species. Because of their abundance and the availability of high throughput analysis technologies, SNP markers have begun to replace traditional markers such as restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs) and simple sequence repeats (SSRs, or microsatellites) for fine mapping and association studies in several species. SNP discovery from chromatogram data requires combining several bioinformatics programs into an analysis pipeline. Results have to be stored in a relational database to facilitate interrogation through queries or to generate data for further analyses such as determination of linkage disequilibrium and identification of common haplotypes. Although these tasks are routinely performed by several groups, an integrated open source SNP discovery pipeline that can be easily adapted by new groups interested in SNP marker development is currently unavailable.

Results: We developed SNP-PHAGE (SNP discovery Pipeline with additional features for identification of common haplotypes within a sequence tagged site (Haplotype Analysis) and GenBank (dbSNP) submissions). This tool was applied to sequence traces from diverse soybean genotypes to discover over 10,000 SNPs. The package was developed on a UNIX/Linux platform, is written in Perl and uses a MySQL database. Scripts to generate a user-friendly web interface are also provided, with common queries for preliminary data analysis. A machine learning tool developed by this group for increasing the efficiency of SNP discovery is integrated into the package as an optional feature. The SNP-PHAGE package is being made available open source at http://bfgl.anri.barc.usda.gov/ML/snp-phage/.
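
As an illustration of the kind of query a relational store makes possible, the following is a minimal sketch, not the SNP-PHAGE schema. It is written in Python rather than Perl, and the standard-library sqlite3 module stands in for MySQL. The table name base_call, its columns, the min_phred threshold and the toy rows are all hypothetical and are not taken from the paper.

```python
import sqlite3

def polymorphic_sites(rows, min_phred=20):
    """Return (sts, position, n_alleles) for sites where more than one
    distinct allele is seen among base calls with phred >= min_phred."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a MySQL database
    conn.execute(
        "CREATE TABLE base_call ("
        "sts TEXT, position INTEGER, genotype TEXT, allele TEXT, phred INTEGER)"
    )
    conn.executemany("INSERT INTO base_call VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT sts, position, COUNT(DISTINCT allele) AS n_alleles
        FROM base_call
        WHERE phred >= ?
        GROUP BY sts, position
        HAVING COUNT(DISTINCT allele) > 1
        ORDER BY sts, position
        """,
        (min_phred,),
    ).fetchall()
    conn.close()
    return result

# Made-up toy data: (STS, position, genotype, allele, phred quality).
rows = [
    ("STS001", 101, "G1", "A", 40),
    ("STS001", 101, "G2", "G", 35),
    ("STS001", 101, "G3", "A", 12),
    ("STS001", 150, "G1", "C", 45),
    ("STS001", 150, "G2", "C", 50),
    ("STS002", 77, "G1", "T", 30),
    ("STS002", 77, "G2", "C", 10),
]

print(polymorphic_sites(rows))
```

With the default threshold, only position 101 of STS001 is reported; the second allele at STS002 position 77 is supported only by a low-quality call and is filtered out.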

Conclusion: SNP-PHAGE provides a bioinformatics solution for high throughput SNP discovery, identification of common haplotypes within an amplicon, and GenBank (dbSNP) submissions. SNP selection and visualization are aided through a user-friendly web interface. This tool is useful for analyzing sequence tagged sites (STSs) of genomic sequences, and this software can serve as a starting point for groups interested in developing SNP markers.
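
Identification of common haplotypes within an amplicon can be sketched as follows. This is a simplified illustration and not the algorithm used by SNP-PHAGE: it assumes each genotype is homozygous at every SNP, so that a haplotype is simply the string of alleles across the SNP positions of one amplicon and no phasing is needed. The function name, the min_count parameter and the toy calls are hypothetical.

```python
from collections import Counter

def common_haplotypes(calls, min_count=2):
    """calls maps genotype name -> {SNP position: allele} for one amplicon.
    Returns (haplotype, count) pairs seen in at least min_count genotypes,
    most frequent first. Genotypes missing a call at any position are skipped."""
    positions = sorted({pos for alleles in calls.values() for pos in alleles})
    counts = Counter()
    for alleles in calls.values():
        if all(pos in alleles for pos in positions):
            counts["".join(alleles[pos] for pos in positions)] += 1
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(hap, n) for hap, n in ranked if n >= min_count]

# Made-up toy calls for five genotypes at two SNP positions.
calls = {
    "G1": {101: "A", 150: "C"},
    "G2": {101: "G", 150: "C"},
    "G3": {101: "A", 150: "C"},
    "G4": {101: "G", 150: "T"},
    "G5": {101: "A"},  # incomplete, ignored
}

print(common_haplotypes(calls))
```

Here the haplotype AC is shared by two genotypes and is the only one reported at the default threshold. Heterozygous calls would require a phasing step that this sketch does not attempt.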


Figures

Figure 1
Flow chart of SNP-PHAGE. Polymorphism analysis of multiple sequence tagged sites using SNP-PHAGE is effectively a three-stage process in which the first two stages have to be performed from a UNIX/Linux command line interface. The tasks in the second stage are executed in sequence by running a single script. The subsequent analysis steps can be performed from a user-friendly web interface.
Figure 2
Screenshot of the SNP-PHAGE graphical interface. For making SNP validation decisions, this interface provides links for viewing global and local sequence alignments, genotypes along with phred quality scores, and machine learning (ML) inferences and ML feature values, as well as a checkbox/drop-down menu to mark an individual SNP as a good or poor call.
