BMC Bioinformatics. 2016 Jan 19:17:41.
doi: 10.1186/s12859-016-0892-1.

4Pipe4--A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information

Francisco Pina-Martins et al. BMC Bioinformatics. 2016.

Abstract

Background: Next-generation sequencing datasets are becoming more frequent, and their use in population studies is becoming widespread. For non-model species without a reference genome, a set of SNPs can be identified from a panel of individuals and then used for further population genotyping. However, the lack of a reference genome against which the sequenced data can be compared makes finding SNPs more difficult. Additionally, when the data sources (strains) are not identified (e.g. in datasets of pooled individuals), finding reliable variation becomes much harder because there is no specialized software for this specific task.

Results: Here we describe 4Pipe4, a 454 data analysis pipeline particularly focused on SNP detection when no reference or strain information is available. It uses a command line interface to automatically call other programs, parse their outputs and summarize the results. The variation detection routine is built into the program itself. Although optimized for SNP mining in 454 EST data, it is flexible enough to automate the analysis of genomic data or even data from other NGS technologies. 4Pipe4 outputs several HTML-formatted reports with metrics on many of the most common assembly values, as well as on all the variation found. A module for finding putative SSRs in the analysed datasets is also available.
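
As an illustration of what reference-free variation detection can look like, the following minimal Python sketch scans the columns of reads already aligned within one assembled contig and flags positions where a second base is supported by several reads. It is not 4Pipe4's routine: the function name `putative_snps`, the thresholds `min_coverage` and `min_minor_count`, and the toy reads are all assumptions made for this example.

```python
from collections import Counter


def putative_snps(aligned_reads, min_coverage=4, min_minor_count=2):
    """Flag alignment columns where a second base is seen often enough.

    Hypothetical helper, not 4Pipe4 code. `aligned_reads` are equal-length
    strings from one contig alignment, with '-' marking gaps or padding.
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be padded to the same aligned length")

    hits = []
    for pos in range(length):
        # Count only unambiguous bases; gaps and Ns do not add coverage.
        counts = Counter(
            read[pos].upper() for read in aligned_reads
            if read[pos].upper() in "ACGT"
        )
        if sum(counts.values()) < min_coverage:
            continue
        ranked = counts.most_common(2)
        # Require the second most frequent base to appear in several reads,
        # so that a single sequencing error is not reported as a SNP.
        if len(ranked) == 2 and ranked[1][1] >= min_minor_count:
            hits.append((pos, dict(counts)))
    return hits


# Toy alignment (assumed data): column 2 has G in three reads and A in two.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACATACGT",
    "ACATACG-",
    "ACGTTCGT",
]
print(putative_snps(reads))  # [(2, {'G': 3, 'A': 2})]
```

The lone T at column 4 is ignored because only one read supports it. With pooled samples and no strain labels, a minimum count of this kind is one simple way to trade recall for a lower false positive rate.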

Conclusions: This program can be especially useful for researchers who have 454 datasets from a panel of pooled individuals and want to discover and characterize SNPs for subsequent individual genotyping with customized genotyping arrays. In comparison with other SNP detection approaches, 4Pipe4 showed the best validation ratio, retrieving fewer SNPs but with a considerably lower false positive rate than the other methods. 4Pipe4's source code is available at https://github.com/StuntsPT/4Pipe4.

Figures

Fig. 1
4Pipe4 flowchart. Rectangles represent processes and rhomboids represent input/output files. Dashed arrows represent optional steps. The names inside square brackets are those of the external programs used. The digits in the top right corner of each rectangle give the step number of that process.

