NPJ Digit Med. 2022 Nov 4;5(1):170.
doi: 10.1038/s41746-022-00716-4.

Representational ethical model calibration

Robert Carruthers et al. NPJ Digit Med.

Abstract

Equity is widely held to be fundamental to the ethics of healthcare. In the context of clinical decision-making, it rests on the comparative fidelity of the intelligence - evidence-based or intuitive - guiding the management of each individual patient. Though brought to recent attention by the individuating power of contemporary machine learning, such epistemic equity arises in the context of any decision guidance, whether traditional or innovative. Yet no general framework for its quantification, let alone assurance, currently exists. Here we formulate epistemic equity in terms of model fidelity evaluated over learnt multidimensional representations of identity crafted to maximise the captured diversity of the population, introducing a comprehensive framework for Representational Ethical Model Calibration. We demonstrate the use of the framework on large-scale multimodal data from UK Biobank to derive diverse representations of the population, quantify model performance, and institute responsive remediation. We offer our approach as a principled solution to quantifying and assuring epistemic equity in healthcare, with applications across the research, clinical, and regulatory domains.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. The Representational Ethical Model Calibration Framework.
The fidelity of a candidate model with respect to subpopulations identified by representation learning (performed on either primary or secondary data) is quantified in an ethical calibration step that informs appropriate remedial action, within an iterative process repeated until an agreed criterion of model equity is reached.
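The iterative process in this caption can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: the function `calibrate` and its arguments `evaluate`, `remediate`, `is_equitable`, and `max_rounds` are all hypothetical names, and the toy "model" below is just a dictionary of per-subpopulation errors.

```python
def calibrate(model, evaluate, remediate, is_equitable, max_rounds=10):
    """Repeat evaluate -> check -> remediate until the criterion is met.

    All callables are hypothetical placeholders for this sketch:
      evaluate(model)          -> {subpopulation: error}
      is_equitable(errors)     -> bool, the agreed equity criterion
      remediate(model, errors) -> a new candidate model
    Returns the final model, its per-group errors, and the number of
    remediation rounds that were applied.
    """
    for rounds in range(max_rounds + 1):
        errors = evaluate(model)
        if is_equitable(errors) or rounds == max_rounds:
            return model, errors, rounds
        model = remediate(model, errors)


# Toy stand-ins (assumptions): the "model" is its own error table.
def toy_evaluate(model):
    return dict(model)


def toy_is_equitable(errors):
    # Criterion assumed here: gap between worst and best group <= 0.05.
    return max(errors.values()) - min(errors.values()) <= 0.05


def toy_remediate(model, errors):
    # Halve the error of the worst-served group; leave the others alone.
    worst = max(errors, key=errors.get)
    updated = dict(model)
    updated[worst] = model[worst] * 0.5
    return updated


final_model, final_errors, rounds_used = calibrate(
    {"A": 0.10, "B": 0.40}, toy_evaluate, toy_remediate, toy_is_equitable
)
```

In practice the remediation step can also degrade other groups, as the later figures report, so a real criterion would likely weigh overall performance alongside equity.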
Fig. 2. Diabetes prevalence by variable.
Higher prevalence was seen in males (a), smokers (b), those with high blood pressure (c), certain ethnicities (d), those with higher body fat % (e), and the more deprived (f).
Fig. 3. Relationship of diabetes and glycated haemoglobin (HbA1c).
Those without diabetes tended to have HbA1c below the diagnosis threshold of 48 mmol/mol, while those with diabetes had a wide range of HbA1c both above and below the threshold.
Fig. 4. Two-dimensional latent space.
The space is coloured by data density (a), model error (b, c), and the values of selected variables (d–i). The space appears to be dominantly clustered by sex (d). The largest groups in the worst 25th performance percentile are shown in (c), associated with higher levels of HbA1c (i).
Fig. 5. Model performance by GMM group in the latent space.
Performance is shown before (a) and after (c) remediation. The top panel of (a) shows model performance by group, while the bottom panel shows the group counts. The model showed mostly even performance across groups. The top panel of (b) shows the effect of remediation. Lower-performing (higher NRMSE) groups show improvements, but the better-performing groups got significantly worse. The bottom panel of (b) shows group counts in descending order of the original NRMSE. It can be seen that performance decreases occurred in high-volume groups. The performance distribution worsened overall, shown in (c), and this would likely offset any gain in equity.
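To make the per-group metric concrete, the following sketch computes NRMSE separately for each group label. It rests on assumptions: the caption does not say how the RMSE is normalised, so the range of the targets is used here, and the group labels are taken as given (for example, component assignments from a fitted Gaussian mixture model). The function name `nrmse_by_group` and the toy values are hypothetical.

```python
import math
from collections import defaultdict


def nrmse_by_group(y_true, y_pred, groups):
    """Return {group: NRMSE}, with RMSE divided by the range of y_true.

    Normalising by the overall target range is an assumption of this
    sketch; other normalisers (mean, standard deviation) are also common.
    """
    scale = max(y_true) - min(y_true)
    if scale == 0:
        raise ValueError("y_true has zero range; NRMSE is undefined")
    squared_errors = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        squared_errors[g].append((t - p) ** 2)
    return {
        g: math.sqrt(sum(errs) / len(errs)) / scale
        for g, errs in squared_errors.items()
    }


# Toy data (made up): targets, predictions, and two group labels.
y_true = [30.0, 35.0, 40.0, 50.0, 60.0, 70.0]
y_pred = [31.0, 34.0, 40.0, 54.0, 56.0, 70.0]
groups = [0, 0, 0, 1, 1, 1]
per_group = nrmse_by_group(y_true, y_pred, groups)
```

In this toy example group 1 has the higher NRMSE, which is the sense in which a group would be called lower-performing.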
Fig. 6. Effect of remediation.
NRMSE is shown for the whole dataset, the base group and the under-served group, before and after remediation, over n = 10 trials, on training and validation data. Performance was worse on the under-served group, and this improved after rebalancing. However, there was a high cost in base group performance. See differences in Table 3. The boxplots show the median (centre line), 25th and 75th centiles (box), 1.5 times the interquartile range (whiskers), and outliers (diamonds).
Fig. 7. Effect of upsampling multiplier on performance metrics.
Panel (a) shows NRMSE for the entire dataset, the base group and the under-served group, for training and validation sets. Panel (b) shows the Gini coefficient. In both training and validation sets, increasing the upsampling multiplier improved model performance on the under-served group, while negatively affecting performance on the base group and overall. The Gini coefficient tended to drop as upsampling increased, mostly indicating increased equity in the distribution for higher levels of upsampling.
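The two quantities in this caption can be illustrated with a short sketch. It assumes the Gini coefficient is taken over a list of non-negative per-group error values (the caption does not specify which values enter the calculation) and that upsampling repeats each record of the under-served group an integer number of times. The function names `gini` and `upsample` are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly even."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula over values sorted in ascending order.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def upsample(rows, groups, target_group, multiplier):
    """Repeat each row of target_group `multiplier` times; keep others once."""
    out = []
    for row, g in zip(rows, groups):
        out.extend([row] * (multiplier if g == target_group else 1))
    return out


even = gini([0.2, 0.2, 0.2, 0.2])        # identical errors across groups
uneven = gini([0.0, 0.0, 0.0, 1.0])      # all error concentrated in one group
resampled = upsample(["a", "b", "c"], [0, 1, 0], target_group=1, multiplier=3)
```

A lower Gini value over per-group errors indicates a more even spread of performance, but it says nothing about the overall level, which is why the caption reports NRMSE alongside it.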
