J Am Med Inform Assoc. 2015 Nov;22(6):1148-52.
doi: 10.1093/jamia/ocv048. Epub 2015 Jun 25.

The center for expanded data annotation and retrieval

Mark A Musen et al. J Am Med Inform Assoc. 2015 Nov.

Abstract

The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments.

Keywords: biological ontologies; data collection; data curation; datasets as topic; standards.

Figures

Figure 1:
The CEDAR ecosystem for metadata management. Communities of biomedical scientists author metadata templates (and template components), which are stored in an online template repository (left panel). Investigators annotate their experimental data by assembling composite templates and by filling in the templates using metadata-acquisition forms to create collections of experimental metadata (center panel). The metadata are both stored in a CEDAR metadata repository (right panel) and exported along with the primary data to archives such as ImmPort, GEO, and the Stanford Digital Repository. Analysis of the CEDAR metadata repository (right panel) will reveal patterns in the metadata that will enable the tools for metadata acquisition (center panel) to use predictive data entry to ease the task of filling out the templates.
Figure 2:
HIPC metadata template. The Human Immunology Project Consortium creates templates such as this one (for annotating the results of multiplex bead array assays) to standardize all its experimental metadata. HIPC templates are providing the initial test of the CEDAR template-management technology.
Figure 3:
Prototype user interface for template selection and instantiation. Here, the end user has selected the “ImmPort Basic Study Design” template, and she has filled in values for the template’s slots for brief title, description, study type, and condition studied. The enumerated value sets for slots such as “study type” are taken from ontologies stored in the NCBO BioPortal repository.
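
The Figure 1 caption describes predictive data entry driven by patterns in the metadata repository. The paper's abstract does not specify the technique, so the following is only a minimal sketch under stated assumptions: it ranks candidate values for one template slot by how often they co-occur with the values already entered. The function name `suggest_values`, the slot names, and the sample records are all hypothetical and are not taken from CEDAR or ImmPort.

```python
from collections import Counter

def suggest_values(records, filled, target_field, top_n=3):
    """Rank candidate values for target_field by frequency among matching records.

    records: list of dicts, each a previously stored metadata instance (assumed shape).
    filled: dict of slot values the user has already entered.
    """
    counts = Counter()
    for record in records:
        if target_field not in record:
            continue
        # Only count records that agree with every value entered so far.
        if all(record.get(k) == v for k, v in filled.items()):
            counts[record[target_field]] += 1
    return [value for value, _ in counts.most_common(top_n)]

# Hypothetical repository contents; not real CEDAR or ImmPort data.
repository = [
    {"study_type": "interventional", "condition_studied": "influenza"},
    {"study_type": "interventional", "condition_studied": "influenza"},
    {"study_type": "observational", "condition_studied": "influenza"},
    {"study_type": "observational", "condition_studied": "asthma"},
]

suggestions = suggest_values(
    repository, {"condition_studied": "influenza"}, "study_type"
)
```

A simple frequency count is used here only because it is easy to inspect; a real system could swap in any pattern-mining or statistical model without changing the calling form.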

Cited by

  • An ontology-driven tool for structured data acquisition using Web forms.
    Gonçalves RS, Tu SW, Nyulas CI, Tierney MJ, Musen MA. J Biomed Semantics. 2017 Aug 1;8(1):26. doi: 10.1186/s13326-017-0133-1. PMID: 28764813. Free PMC article.
  • Developing a healthcare dataset information resource (DIR) based on Semantic Web.
    Shi J, Zheng M, Yao L, Ge Y. BMC Med Genomics. 2018 Nov 20;11(Suppl 5):102. doi: 10.1186/s12920-018-0411-5. PMID: 30453940. Free PMC article.
  • Unleashing the value of Common Data Elements through the CEDAR Workbench.
    O'Connor MJ, Warzel DB, Martínez-Romero M, Hardi J, Willrett D, Egyedi AL, Eftekhari A, Graybeal J, Musen MA. AMIA Annu Symp Proc. 2020 Mar 4;2019:681-690. eCollection 2019. PMID: 32308863. Free PMC article.
  • BioHackathon 2015: Semantics of data for life sciences and reproducible research.
    Vos RA, Katayama T, Mishima H, Kawano S, Kawashima S, Kim JD, Moriya Y, Tokimatsu T, Yamaguchi A, Yamamoto Y, Wu H, Amstutz P, Antezana E, Aoki NP, Arakawa K, Bolleman JT, Bolton E, Bonnal RJP, Bono H, Burger K, Chiba H, Cohen KB, Deutsch EW, Fernández-Breis JT, Fu G, Fujisawa T, Fukushima A, García A, Goto N, Groza T, Hercus C, Hoehndorf R, Itaya K, Juty N, Kawashima T, Kim JH, Kinjo AR, Kotera M, Kozaki K, Kumagai S, Kushida T, Lütteke T, Matsubara M, Miyamoto J, Mohsen A, Mori H, Naito Y, Nakazato T, Nguyen-Xuan J, Nishida K, Nishida N, Nishide H, Ogishima S, Ohta T, Okuda S, Paten B, Perret JL, Prathipati P, Prins P, Queralt-Rosinach N, Shinmachi D, Suzuki S, Tabata T, Takatsuki T, Taylor K, Thompson M, Uchiyama I, Vieira B, Wei CH, Wilkinson M, Yamada I, Yamanaka R, Yoshitake K, Yoshizawa AC, Dumontier M, Kosaki K, Takagi T. F1000Res. 2020 Feb 24;9:136. doi: 10.12688/f1000research.18236.1. eCollection 2020. PMID: 32308977. Free PMC article.
  • FAIR-EuMon: a FAIR-enabling resource for biodiversity monitoring schemes.
    Menger J, Magagna B, Henle K, Harpke A, Frenzel M, Rick J, Wiltshire K, Grimm-Seyfarth A. Biodivers Data J. 2024 Aug 1;12:e125132. doi: 10.3897/BDJ.12.e125132. eCollection 2024. PMID: 39131439. Free PMC article.
