Int J Epidemiol. 2020 Aug 1;49(4):1067-1074.
doi: 10.1093/ije/dyaa087.

Recognizing, reporting and reducing the data curation debt of cohort studies

Oliver W Butters et al. Int J Epidemiol.

Abstract

Good data curation is integral to cohort studies, but it is not always done to the level necessary to ensure the longevity of the data a study holds. In this opinion paper, we introduce the concept of data curation debt: the data curation equivalent of the software engineering principle of technical debt. Using the context of UK cohort studies, we define data curation debt, describing examples and their potential impact. We highlight that accruing this debt can make it more difficult to use the data in the future. Additionally, the long-running nature of cohort studies means that interest is accrued on this debt and compounded over time, increasing the impact a debt could have on a study and its stakeholders. Primary causes of data curation debt are discussed across three categories: longevity of hardware, software and data formats; funding; and skills shortages. Based on cross-domain best practice, strategies to reduce the debt and preventive measures are proposed, with importance given to the recognition and transparent reporting of data curation debt. By describing the debt in this way, we encapsulate a multi-faceted issue in simple terms understandable by all cohort study stakeholders. Data curation debt is not confined to the UK; it is an issue the international community must be aware of and address. This paper aims to stimulate a discussion between cohort studies and their stakeholders on how to address the issue of data curation debt. If data curation debt is left unchecked, it could become impossible to use highly valued cohort study data, and the debt ultimately represents an existential risk to the studies themselves.

Keywords: Data curation; cohort studies; data management.
