Tech Innov Patient Support Radiat Oncol. 2022 Dec 20:25:100194.
doi: 10.1016/j.tipsro.2022.12.001. eCollection 2023 Mar.

Automated data extraction tool (DET) for external applications in radiotherapy

Mruga Gurjar et al. Tech Innov Patient Support Radiat Oncol.

Abstract

Oncological Information Systems (OIS) manage information in radiotherapy (RT) departments. Due to database structure limitations, stored information can rarely be directly used except for vendor-specific purposes. Our aim is to enable the use of such data in various external applications by creating a tool for automatic data extraction, cleaning and formatting.

Methods and materials: We used OIS data from a nine-linac RT department in Sweden (70 weeks, 2015–16). Extracted data included patients' referrals and appointments with details for RT sub-tasks. The data extraction tool that prepares the data for external use was built in the C# programming language. It used Excel-automation queries to remove unassigned or duplicated values, substitute missing data and perform application-specific calculations. Descriptive statistics were used to verify the output against a manually prepared dataset from the corresponding time period.

Results: In the initial raw data, 2030 (51 %) patients had a known curative and 907 (23 %) a known palliative treatment intent, across 84 different cancer diagnoses. After removal of incomplete entries, 373 (10 %) patients had unknown treatment intent, which was substituted based on the known curative/palliative ratio. The automatically and manually prepared datasets differed by < 1 % for Mould, Treatment planning and Quality assurance, and by ± 5 % for Fractions and Magnetic resonance imaging, with the tool overestimating in 80/140 (57 %) entries.
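As a worked illustration of the ratio-based substitution, the sketch below splits the patients with unknown intent in proportion to the known curative/palliative counts. The abstract does not say whether the tool assigns intents deterministically or by random draw, so the deterministic proportional split here, and the function name `split_unknown_intents`, are assumptions.

```python
def split_unknown_intents(n_curative: int, n_palliative: int, n_unknown: int) -> tuple[int, int]:
    """Allocate unknown-intent patients in proportion to the known intents.

    Returns (assigned_curative, assigned_palliative). This is an assumed,
    deterministic reading of "substituted based on the known ratio".
    """
    known = n_curative + n_palliative
    if known == 0:
        raise ValueError("At least one patient with known intent is required")
    # Share of unknowns assigned curative, rounded to whole patients.
    to_curative = round(n_unknown * n_curative / known)
    # The remainder goes to palliative so that the total is preserved.
    return to_curative, n_unknown - to_curative


if __name__ == "__main__":
    # Counts taken from the Results paragraph above.
    print(split_unknown_intents(2030, 907, 373))  # (258, 115)
```

With the reported counts, the known curative share is 2030/2937 ≈ 69 %, so about 258 of the 373 unknown cases would be treated as curative and 115 as palliative under this assumption.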

Conclusion: We successfully implemented a software tool to prepare ready-to-use OIS datasets for external applications. Our evaluations showed overall results close to the manually prepared dataset. The automated strategy can reduce dataset preparation time from weeks of manual work to seconds.

Keywords: Automation; Data cleaning; Data extraction; Radiotherapy.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
Data Extraction Tool (DET) process. Stepwise automation of extraction, cleaning and formatting of data from the ARIA oncology information system (OIS) to fit the input data format of the example external application investigated here. Details of calculation statistics are given in the results.
Fig. 2
Extraction tool graphical user interface. Dates are selected using the date picker at the top of the window. Clicking Referrals extracts all referral data for the selected time period presented in the white boxes. Similarly, selecting a sub-task will result in an extraction of all appointment data related to the sub-task. Abbreviations: CT = computed tomography, CT-site1 = CT at main department, CT-site2 = CT at satellite department, DET = Data Extraction Tool, MR = magnetic resonance imaging, PET = positron emission tomography, FRAC = fractions, QA = quality assurance.
Fig. 3
Simulation results for radiotherapy preparation and treatment steps during an eight-week summer vacation period for the radiotherapy department at the Sahlgrenska University Hospital in Sweden: (a) manually cleaned reference dataset for 2015–16, (b) automatically cleaned tool dataset for 2015–16, and (c) automatically cleaned tool dataset for 2020–21. The data represent patients from the 20 largest diagnosis and intent groups, corresponding to 80% of all patient data.
Fig. 4
Yearly referral inflow pattern at the radiotherapy department at the Sahlgrenska University Hospital in Sweden for the investigated manually cleaned and automatically cleaned patient datasets. The data represent all patient data from 2015–16 and 2020–21 with 84 different diagnoses. Abbreviation: DET = Data Extraction Tool.

