2025 Jan 22;9:e53928. doi: 10.2196/53928.

Discrimination of Radiologists' Experience Level Using Eye-Tracking Technology and Machine Learning: Case Study


Stanford Martinez et al. JMIR Form Res.

Abstract

Background: Perception-related errors comprise most diagnostic mistakes in radiology. To mitigate this problem, radiologists use personalized and high-dimensional visual search strategies, otherwise known as search patterns. Qualitative descriptions of these search patterns, which involve the physician verbalizing or annotating the order in which they analyze the image, can be unreliable due to discrepancies between what is reported and the actual visual patterns. This discrepancy can interfere with quality improvement interventions and negatively impact patient care.

Objective: The objective of this study is to provide an alternative method for distinguishing between radiologists by means of captured eye-tracking data such that the raw gaze (or processed fixation data) can be used to discriminate users based on subconscious behavior in visual inspection.

Methods: We present a novel discretized feature encoding based on spatiotemporal binning of fixation data for efficient geometric alignment and temporal ordering of eye movement when reading chest x-rays. The encoded features of the eye-fixation data are used by machine learning classifiers to discriminate between faculty and trainee radiologists. A clinical trial case study was conducted using metrics such as the area under the curve, accuracy, F1-score, sensitivity, and specificity to evaluate the discriminability between the 2 groups regarding their level of experience. The classification performance was then compared with state-of-the-art methodologies. In addition, a repeatability experiment using a separate dataset, experimental protocol, and eye tracker was performed with 8 participants to evaluate the robustness of the proposed approach.
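The spatiotemporal binning described above can be illustrated with a minimal sketch. This is not the authors' implementation: the bin counts (3 temporal bins, a 4×4 spatial grid) and the function name `encode_fixations` are illustrative assumptions, and real fixation data would first require the filtering and alignment steps described in the paper.

```python
import numpy as np

def encode_fixations(fixations, img_w, img_h, n_temporal=3, grid=4):
    """Discretize (x, y, t) fixations into a flattened count vector:
    one grid x grid spatial histogram per temporal bin (illustrative
    parameters, not the paper's)."""
    fixations = np.asarray(fixations, dtype=float)
    t = fixations[:, 2]
    # Assign each fixation to a temporal bin by its normalized timestamp.
    span = max(t.max() - t.min(), 1e-9)
    t_norm = (t - t.min()) / span
    t_bin = np.minimum((t_norm * n_temporal).astype(int), n_temporal - 1)
    # Assign each fixation to a spatial grid cell on the image.
    col = np.minimum((fixations[:, 0] / img_w * grid).astype(int), grid - 1)
    row = np.minimum((fixations[:, 1] / img_h * grid).astype(int), grid - 1)
    counts = np.zeros((n_temporal, grid, grid))
    for b, r, c in zip(t_bin, row, col):
        counts[b, r, c] += 1
    # Flatten to a feature vector of length n_temporal * grid**2,
    # preserving both spatial layout and temporal ordering.
    return counts.ravel()

# Example: 4 fixations on a 1000x800 image
feats = encode_fixations([(100, 100, 0.0), (900, 700, 0.5),
                          (500, 400, 1.0), (120, 90, 1.5)], 1000, 800)
```

A vector of this form can then be fed to any standard classifier (e.g., logistic regression or gradient boosting) to separate the two experience groups.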

Results: The numerical results from both experiments demonstrate that classifiers using the proposed feature encoding methods outperform the current state-of-the-art in differentiating between radiologists in terms of experience level. An average performance gain of 6.9% is observed compared with traditional features while classifying experience levels of radiologists. This gain in accuracy is also substantial across different eye tracker-collected datasets, with improvements of 6.41% using the Tobii eye tracker and 7.29% using the EyeLink eye tracker. These results signify the potential impact of the proposed method for identifying radiologists' level of expertise and those who would benefit from additional training.

Conclusions: The effectiveness of the proposed spatiotemporal discretization approach, validated across diverse datasets and various classification metrics, underscores its potential for objective evaluation, informing targeted interventions and training strategies in radiology. This research advances reliable assessment tools, addressing challenges in perception-related errors to enhance patient care outcomes.

Keywords: classification; education; experience; experience level determination; eye movement; eye-tracking; fixation; gaze; image; machine learning; radiology; radiology education; search pattern; search pattern feature extraction; spatio-temporal; x-ray.


Conflict of interest statement

Conflicts of Interest: KC holds equity in Zauron Labs Inc and NC is a co-owner of Zauron Labs.

Figures

Figure 1
Overall algorithm: the steps required to generate proposed features from the raw dataset and build the proposed machine learning model.
Figure 2
Example of eye-tracking fixations for 1 trial processed by the EyeLink software. The fixations illustrated include participants 1 (blue) and 2 (red) superimposed on the image displayed during the trial. The “invalid” fixations that were not successfully filtered out are shown as “x” markers and were manually removed during data processing.
Figure 3
Proposed discretized vector encoding for fixation data. Bins 1, 2, and 3 capture fixations in a preserved spatial dimension across different temporal windows. Each row represents a temporal bin, and within each bin, the chest x-ray image is divided into spatial grids. The fixations are counted within each grid cell, providing a detailed representation of the radiologist’s visual search pattern over time.
Figure 4
Numerical study results on the area under the curve metric reported for each classifier when consuming the EyeLink dataset, organized by the aggregated average of classifier, data type, and selected feature extraction levels using the original dataset, principal component analysis (PCA), and kernel principal component analysis (KPCA). Alex: AlexNet-like neural network classifier; Average: average of all classifiers; GP: Gaussian process; KNN: k-nearest neighbors; LR: logistic regression; XGBoost: extreme gradient boosting.
Figure 5
Scan of chest x-ray by participant 1 (faculty).
Figure 6
Scan of chest x-ray by participant 2 (trainee).
Figure 7
Numerical study results on the area under the curve metric reported for each classifier when consuming the Tobii dataset, organized by the aggregated average of classifier, data type, and selected feature extraction levels using the original dataset, principal component analysis (PCA), and kernel principal component analysis (KPCA). Alex: AlexNet-like neural network classifier; Average: average of all classifiers; GP: Gaussian process; KNN: k-nearest neighbors; LR: logistic regression; XGBoost: extreme gradient boosting.