Discovering and linking public omics data sets using the Omics Discovery Index

Yasset Perez-Riverol¹, Mingze Bai¹ ² ³, Felipe da Veiga Leprevost⁴, Silvano Squizzato¹, Young Mi Park¹, Kenneth Haug¹, Adam J Carroll⁵, Dylan Spalding¹, Justin Paschall¹, Mingxun Wang⁶, Noemi Del-Toro¹, Tobias Ternent¹, Peng Zhang⁴ ⁷, Nicola Buso¹, Nuno Bandeira⁶, Eric W Deutsch⁸, David S Campbell⁸, Ronald C Beavis⁹, Reza M Salek¹, Ugis Sarkans¹, Robert Petryszak¹, Maria Keays¹, Eoin Fahy¹⁰, Manish Sud¹⁰, Shankar Subramaniam¹⁰, Ariana Barbera¹¹, Rafael C Jiménez¹², Alexey I Nesvizhskii⁴, Susanna-Assunta Sansone¹³, Christoph Steinbeck¹, Rodrigo Lopez¹, Juan A Vizcaíno¹, Peipei Ping¹⁴ ¹⁵, Henning Hermjakob¹ ³

Affiliations

¹ European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, United Kingdom.
² Institute of Bioinformatics, Chongqing University of Posts and Telecommunications, Chongqing, China.
³ Beijing Proteome Research Center, National Center for Protein Sciences Beijing, Beijing, China.
⁴ Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA.
⁵ Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia.
⁶ Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, USA.
⁷ Commonwealth Scientific and Industrial Research Organization, Canberra, Australian Capital Territory, Australia.
⁸ Institute for Systems Biology, Seattle, Washington, USA.
⁹ Biochemistry & Medical Genetics, University of Manitoba, Winnipeg, Manitoba, Canada.
¹⁰ Department of Bioengineering, University of California, San Diego, La Jolla, California, USA.
¹¹ Department of Medicine, University of Cambridge, Cambridge, United Kingdom.
¹² ELIXIR Hub, Wellcome Genome Campus, Hinxton, United Kingdom.
¹³ Oxford e-Research Centre, University of Oxford, Oxford, United Kingdom.
¹⁴ Department of Physiology and Department of Medicine, Division of Cardiology, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, California, USA.
¹⁵ Department of Medicine, Division of Cardiology, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, California, USA.

PMID: 28486464
PMCID: PMC5831141
DOI: 10.1038/nbt.3790


Yasset Perez-Riverol et al. Nat Biotechnol. 2017 May 9;35(5):406-409. doi: 10.1038/nbt.3790.


No abstract available


Figures

**Figure 1**
Omics Discovery Index: data standardization, annotation, indexing and presentation. **(a)** The datasets stored in public repositories are converted to a common data representation that includes all metadata and biological entities. The OmicsDI XML files are validated using the OmicsDI XML validator. **(b)** The OmicsDI XML files are then annotated using public services and databases such as UniProt, ChEBI and PubMed, and the metadata is enriched using the Annotator service. The EBI search engine generates the indexes, including other related resources such as PubMed, UniProt, Ensembl and ChEBI. **(c)** Different clients, including the web interface and the ddiR package, can use the OmicsDI API to retrieve data from the resource.

**Figure 2**
Distributions of OmicsDI datasets. **(a)** Distribution of datasets per omics type and organism category including model organisms, non-model organisms (excluding human) and human. **(b)** The dataset view showing the *other related omics datasets*, including the ontology highlighting option to extract the most relevant terms in the metadata. **(c)** Pearson-correlation plot between the metadata similarity score and the biological similarity score, across transcriptomics (T), proteomics (P) and metabolomics (M) datasets. **(d)** The shared molecules box shows all datasets with a biological similarity score of more than 0.5, with a slider allowing a user to increase the cutoff value (here set to 0.81).
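
The cutoff behaviour described for panel (d) can be illustrated with a minimal Python sketch. It is not the OmicsDI implementation: the function name `filter_by_similarity`, the `(accession, score)` pair layout, and the placeholder accessions and scores are all assumptions made up for illustration. Only the default cutoff of 0.5 and the example cutoff of 0.81 come from the caption.

```python
from typing import Iterable, List, Tuple


def filter_by_similarity(
    scored: Iterable[Tuple[str, float]], cutoff: float = 0.5
) -> List[Tuple[str, float]]:
    """Keep datasets whose score is strictly greater than the cutoff.

    The caption says "more than 0.5", so the comparison is strict.
    Results are ordered from most to least similar.
    """
    kept = [(accession, score) for accession, score in scored if score > cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical placeholder data, not real OmicsDI accessions or scores.
related = [
    ("DATASET_A", 0.92),
    ("DATASET_B", 0.55),
    ("DATASET_C", 0.81),
    ("DATASET_D", 0.30),
]

default_view = filter_by_similarity(related)              # cutoff 0.5
strict_view = filter_by_similarity(related, cutoff=0.81)  # slider moved up
```

Raising the cutoff only shrinks the result list, which matches the slider interaction the caption describes: a higher cutoff leaves fewer, more closely related datasets.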


