GenoSurf: metadata driven semantic search system for integrated genomic datasets
- PMID: 31820804
- PMCID: PMC6902006
- DOI: 10.1093/database/baz132
Abstract
Many valuable resources developed by worldwide research institutions and consortia describe genomic datasets that are both open and available for secondary research, but their metadata search interfaces are heterogeneous, not interoperable and sometimes very limited in capability. We implemented GenoSurf, a multi-ontology semantic search system providing access to a consolidated collection of metadata attributes found in the most relevant genomic datasets; values of 10 attributes are semantically enriched using the most suitable available ontologies. The user of GenoSurf provides search terms as input, sets the desired level of ontological enrichment and obtains as output the identities of the matching data files at the various sources. Search is facilitated by drop-down lists of matching values; aggregate counts describing the resulting files are updated in real time as search terms are progressively added. In addition to the consolidated attributes, users can perform keyword-based searches on the original (raw) metadata, which are also imported; GenoSurf supports the interplay of attribute-based and keyword-based search through well-defined interfaces. Currently, GenoSurf integrates about 40 million metadata entries from several major data sources, including three providers of clinical and experimental data (TCGA, ENCODE and Roadmap Epigenomics) and two sources of annotation data (GENCODE and RefSeq); it can be used as a standalone resource for targeting the genomic datasets at their original sources (identified by their accession IDs and URLs), or as part of an integrated query-answering system for performing complex queries over genomic regions and metadata.
© The Author(s) 2019. Published by Oxford University Press.
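The search flow described in the abstract (search terms in, a chosen level of ontological enrichment, matching file identities and aggregate counts out) can be illustrated with a minimal sketch. Everything below is hypothetical and for illustration only: the record layout, the `cell` attribute, the three numeric enrichment levels, the toy synonym and narrower-term tables, and the function names are assumptions, not GenoSurf's actual data model or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FileRecord:
    accession: str    # hypothetical file identifier
    source: str       # hypothetical data source name
    attributes: dict  # consolidated metadata attributes (attribute -> value)


# Toy ontology tables (assumptions, not taken from any real ontology).
SYNONYMS = {"k562": {"k-562"}}
NARROWER = {"blood": {"k562", "gm12878"}}

FILES = [
    FileRecord("F1", "SourceA", {"cell": "K562"}),
    FileRecord("F2", "SourceA", {"cell": "K-562"}),
    FileRecord("F3", "SourceB", {"cell": "GM12878"}),
    FileRecord("F4", "SourceB", {"cell": "HeLa"}),
]


def expand(term, level):
    """Return the set of accepted values for a term.

    level 0: the term itself; level 1: plus synonyms;
    level 2: plus all narrower terms and their synonyms.
    """
    term = term.lower()
    accepted = {term}
    if level >= 2:
        # Walk the narrower-term table to collect all descendants.
        frontier = [term]
        while frontier:
            current = frontier.pop()
            for child in NARROWER.get(current, ()):
                if child not in accepted:
                    accepted.add(child)
                    frontier.append(child)
    if level >= 1:
        # Add synonyms of every term collected so far.
        for t in list(accepted):
            accepted |= SYNONYMS.get(t, set())
    return accepted


def search(files, attribute, term, level=0):
    """Return the files whose attribute value matches the expanded term."""
    accepted = expand(term, level)
    return [f for f in files
            if f.attributes.get(attribute, "").lower() in accepted]


def counts_by_source(files):
    """Aggregate count of matching files per source."""
    return Counter(f.source for f in files)


if __name__ == "__main__":
    for level in (0, 1, 2):
        hits = search(FILES, "cell", "blood" if level == 2 else "K562", level)
        print(level, [f.accession for f in hits], dict(counts_by_source(hits)))
```

In this sketch, raising the enrichment level only widens the set of accepted values, so the result at a higher level always contains the result at a lower level for the same term; recomputing `counts_by_source` after each added term mimics the running aggregate counts mentioned above.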