Methods for visual mining of genomic and proteomic data atlases
- PMID: 22524279
- PMCID: PMC3352268
- DOI: 10.1186/1471-2105-13-58
Abstract
Background: As the volume, complexity and diversity of the information that scientists work with on a daily basis continue to rise, so too does the requirement for new analytic software. This software must resolve the tension between the need to support a high level of scientific reasoning and the requirement for an intuitive, easy-to-use tool that does not demand specialist, and often arduous, training. Information visualization provides a solution to this problem, as it allows for direct manipulation of, and interaction with, diverse and complex data. The challenge facing bioinformatics researchers is how to apply this knowledge to data sets that are continually growing in a field that is rapidly changing.
Results: This paper discusses an approach to developing visual mining tools capable of supporting the mining of massive data collections used in systems biology research, and the lessons learned in providing tools for both local researchers and the wider community. Example tools were developed to enable the exploration and analysis of both proteomics- and genomics-based atlases. These atlases are large repositories of raw and processed experimental data generated to support the identification of biomarkers through mass spectrometry (the PeptideAtlas) and the genomic characterization of cancer (The Cancer Genome Atlas). Specifically, the tools are designed to allow for the visual mining of thousands of mass spectrometry experiments, to assist in designing informed targeted protein assays, and the interactive analysis of hundreds of genomes, to explore the variations across different cancer genomes and cancer types.
Conclusions: The mining of massive repositories of biological data requires the development of new tools and techniques. Visual exploration of the large-scale atlas data sets allows researchers to mine data to find new meaning and make sense of it at scales from single samples to entire populations. Providing linked, task-specific views that allow a user to start from points of interest (from diseases to single genes) enables targeted exploration of thousands of spectra and genomes. As the composition of the atlases changes, and our understanding of the biology increases, new tasks will continually arise. It is therefore important to make the data available in a suitable form in as short a time as possible. We have done this through the use of common visualization workflows, into which we rapidly deploy visual tools. These visualizations follow common metaphors where possible to help users understand the displayed data. Rapid development of tools and task-specific views allows researchers to mine large-scale data almost as quickly as it is produced. Ultimately, these visual tools enable new inferences, new analyses and further refinement of the large-scale data being provided in atlases such as PeptideAtlas and The Cancer Genome Atlas.