Med Educ. 2024 Jan;58(1):105-117.
doi: 10.1111/medu.15190. Epub 2023 Aug 24.

Validity evidence supporting clinical skills assessment by artificial intelligence compared with trained clinician raters


Vilma Johnsson et al. Med Educ. 2024 Jan.

Abstract

Background: Artificial intelligence (AI) is increasingly used in medical education, but our understanding of the validity of AI-based assessments (AIBA) compared with traditional clinical expert-based assessments (EBA) is limited. In this study, the authors aimed to compare and contrast the validity evidence for the assessment of a complex clinical skill based on scores generated by an AI and by trained clinical experts, respectively.

Methods: The study was conducted between September 2020 and October 2022. The authors used Kane's validity framework to prioritise and organise their evidence according to the four inferences: scoring, generalisation, extrapolation and implications. The context of the study was chorionic villus sampling performed in a simulated setting. AIBA and EBA were used to evaluate the performances of experts, intermediates and novices based on video recordings. The clinical experts used a scoring instrument developed in a previous international consensus study. The AI used convolutional neural networks to capture features in the video recordings, motion tracking and eye movements, and combined them into a final composite score.
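The abstract does not specify how the AI's sub-scores were combined; a minimal illustrative sketch of one way to fuse CNN-derived video features, motion tracking and gaze sub-scores into a single composite score is shown below. The feature extractors, score ranges and weights are hypothetical placeholders, not the authors' actual model.

```python
import numpy as np

def composite_score(video_features, motion_score, gaze_score,
                    weights=(0.5, 0.3, 0.2)):
    """Fuse per-modality sub-scores into one composite performance score.

    video_features: array of CNN-derived per-frame scores, pooled by averaging
    motion_score, gaze_score: scalar sub-scores assumed to lie in [0, 1]
    weights: hypothetical relative weighting of the three modalities
    """
    # Pool the frame-level CNN features into a single scalar in [0, 1]
    video_score = float(np.clip(np.mean(video_features), 0.0, 1.0))
    w_video, w_motion, w_gaze = weights
    # Weighted linear combination as a simple fusion rule
    return w_video * video_score + w_motion * motion_score + w_gaze * gaze_score

# Example: strong video features, weaker motion economy and gaze behaviour
score = composite_score(np.array([0.9, 0.8, 0.85]),
                        motion_score=0.6, gaze_score=0.7)
```

A weighted linear fusion is only one of many possible designs; the study's actual network may learn the combination end to end.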

Results: A total of 45 individuals participated in the study (22 novices, 12 intermediates and 11 experts). The authors demonstrated validity evidence for scoring, generalisation, extrapolation and implications for both EBA and AIBA. The plausibility of assumptions related to scoring, evidence of reproducibility, and the relation to different training levels were examined. Issues relating to construct underrepresentation, lack of explainability, and threats to robustness were identified as potential weak links in the AIBA validity argument compared with the EBA validity argument.

Conclusion: There were weak links in the use of AIBA compared with EBA, mainly in their representation of the underlying construct but also regarding their explainability and ability to transfer to other datasets. However, combining AI and clinical expert-based assessments may offer complementary benefits, which is a promising subject for future research.


