The SAIL databank: linking multiple health and social care datasets
- PMID: 19149883
- PMCID: PMC2648953
- DOI: 10.1186/1472-6947-9-3
Abstract
Background: Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. To achieve this, the ability to link data at the individual record level must be retained while adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress.
Methods: Having established the infrastructure of the databank, the aim of this work was to develop and implement an accurate matching process to enable the assignment of a unique Anonymous Linking Field (ALF) to person-based records, making the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage) was developed for this purpose. Firstly, the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Secondly, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), to assess the efficacy of this process and the optimum matching technique.
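The abstract does not say how a "valid NHS number" was determined. As a minimal sketch, assuming validity means passing the standard Modulus 11 check-digit test for 10-digit NHS numbers, the check could look like the following. The function name `is_valid_nhs_number` is hypothetical and is not part of MACRAL, which the abstract describes as SQL-based.

```python
def is_valid_nhs_number(value: str) -> bool:
    """Return True if `value` passes the Modulus 11 check for NHS numbers.

    Assumption: validity here means only the check-digit test; the abstract
    does not state which validity rules the authors applied.
    """
    # Allow the common "943 476 5919" display format.
    compact = value.replace(" ", "").replace("-", "")
    if len(compact) != 10 or any(ch not in "0123456789" for ch in compact):
        return False

    digits = [int(ch) for ch in compact]
    # Weight the first nine digits by 10, 9, ..., 2 and sum the products.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    if check == 10:
        # A computed check digit of 10 means the number cannot be valid.
        return False
    return check == digits[9]


if __name__ == "__main__":
    # 943 476 5919 is a widely used illustrative number that passes the check.
    print(is_valid_nhs_number("943 476 5919"))
    print(is_valid_nhs_number("9434765918"))
```

A check of this kind only confirms that a number is well formed; it says nothing about whether the number belongs to the person in the record, which is the question the matching against the NHSAR addresses.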
Results: The validation of using the NHS number yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL) at the 50% threshold, and error rates were < 0.2%. A range of techniques for matching datasets to the NHSAR were applied, and the optimum technique resulted in sensitivity values of 99.9% for a GP dataset from primary care, 99.3% for a PEDW dataset from secondary care and 95.2% for the PARIS database from social care.
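For readers less familiar with these measures, the sketch below shows how sensitivity and specificity are conventionally computed from match outcomes. The counts are hypothetical, chosen only to illustrate the arithmetic, and are not the study's data; the abstract also does not define its error rate, so none is computed here.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of true matches that the linkage correctly identified."""
    if true_pos + false_neg == 0:
        raise ValueError("no true matches to evaluate")
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of true non-matches that the linkage correctly rejected."""
    if true_neg + false_pos == 0:
        raise ValueError("no true non-matches to evaluate")
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not taken from the paper).
    print(f"sensitivity = {sensitivity(true_pos=946, false_neg=54):.3f}")
    print(f"specificity = {specificity(true_neg=998, false_pos=2):.3f}")
```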
Conclusion: With the infrastructure that has been put in place, the reliable matching process that has been developed enables an ALF to be consistently allocated to records in the databank. The SAIL databank represents a research-ready platform for record-linkage studies.