CADRE: A Collaborative, Cloud-Based Solution for Big Bibliographic Data Research in Academic Libraries
- PMID: 33693415
- PMCID: PMC7931882
- DOI: 10.3389/fdata.2020.556282
Abstract
Big bibliographic datasets hold promise for revolutionizing the scientific enterprise when combined with state-of-the-science computational capabilities. Yet hosting proprietary and open big bibliographic datasets poses significant difficulties for libraries, both large and small. Libraries face significant barriers to hosting such assets, including cost and expertise, which have limited their ability to provide stewardship for big datasets and thus hampered researchers' access to them. What is needed is a solution that addresses the joint needs of libraries and researchers. This article outlines the theoretical framework that underpins the Collaborative Archive and Data Research Environment (CADRE) project. We recommend a shared cloud-based infrastructure to address this need, built on five pillars: 1) Community: a community of libraries and industry partners who support and maintain the platform and a community of researchers who use it; 2) Access: the sharing platform should be accessible and affordable to both proprietary data customers and the general public; 3) Data-Centric: the platform is optimized for efficient and high-quality bibliographic data services, satisfying diverse data needs; 4) Reproducibility: the platform should be designed to foster and encourage reproducible research; 5) Empowerment: the platform should empower researchers to perform big data analytics on the hosted datasets. In this article, we describe the many facets of the problem faced by American academic libraries and researchers wanting to work with big datasets. We propose a practical solution based on the five pillars: the Collaborative Archive and Data Research Environment. Finally, we address potential barriers to implementing this solution and strategies for overcoming them.
Keywords: bibliographic big data; bibliographic research resource; libraries; open access; platform-as-a-service; reproducibility.
Copyright © 2020 Mabry, Yan, Pentchev, Van Rennes, McGavin and Wittenberg.
Conflict of interest statement
The authors wish to declare a potential conflict of interest with one of the journal Associate Editors, Kuansan Wang. Kuansan helped secure funding from his employer, Microsoft Research. The funds were used to provide travel scholarships to CADRE Fellows to attend and present on their CADRE-related work at scientific meetings.