SRAdb: query and use public next-generation sequencing data from within R
- PMID: 23323543
- PMCID: PMC3560148
- DOI: 10.1186/1471-2105-14-19
Abstract
Background: The Sequence Read Archive (SRA) is the largest public repository of sequencing data from next-generation sequencing platforms, including Illumina (Genome Analyzer, HiSeq, MiSeq, etc.), Roche 454 GS System, Applied Biosystems SOLiD System, Helicos Heliscope, PacBio RS, and others.
Results: SRAdb aims to make queries of the metadata associated with SRA submissions, studies, samples, experiments and runs more robust and precise, and to make the sequencing data in the SRA easier to access. We have parsed all the SRA metadata into a SQLite database that is routinely updated and can be easily distributed. The SRAdb R/Bioconductor package uses this SQLite database to query and access the metadata. Full-text search makes metadata queries flexible and powerful. Fastq files associated with query results can be downloaded for local analysis. The package also includes an interface from R to a popular genome browser, the Integrative Genomics Viewer (IGV).
Conclusions: The SRAdb Bioconductor package provides a convenient, integrated framework for querying and accessing SRA metadata quickly and powerfully from within R.
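The idea described in the Results, metadata parsed into a SQLite database and queried with full-text search, can be illustrated with a minimal sketch. This is not the SRAdb API or its actual schema: the table name `run_ft`, the column names, and the accessions below are hypothetical, the example uses Python's standard `sqlite3` module rather than R, and it assumes the local SQLite build includes the FTS5 extension.

```python
import sqlite3

# Toy metadata rows; accessions and descriptions are made up for illustration.
ROWS = [
    ("SRR0000001", "SRP000001", "Illumina HiSeq", "RNA sequencing of breast cancer cell lines"),
    ("SRR0000002", "SRP000001", "Illumina MiSeq", "Amplicon sequencing of gut microbiome"),
    ("SRR0000003", "SRP000002", "PacBio RS", "Whole genome sequencing of breast tumor"),
]


def build_db():
    """Create an in-memory SQLite database with a full-text-indexed table."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table; assumes SQLite was compiled with FTS5 support.
    con.execute(
        "CREATE VIRTUAL TABLE run_ft USING fts5(run, study, platform, description)"
    )
    con.executemany("INSERT INTO run_ft VALUES (?, ?, ?, ?)", ROWS)
    return con


def search(con, query):
    """Return (run, study) pairs whose indexed text matches the FTS query."""
    cur = con.execute(
        "SELECT run, study FROM run_ft WHERE run_ft MATCH ? ORDER BY run",
        (query,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    con = build_db()
    # Both terms must appear somewhere in the row's indexed columns.
    print(search(con, "breast AND cancer"))
    # A single term matches across all columns, case-insensitively.
    print(search(con, "illumina"))
```

Because every column is indexed, one query string can match on platform, study, or free-text description at once, which is the kind of flexibility the abstract attributes to full-text search over the metadata.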