Development of a Google-based search engine for data mining radiology reports
- PMID: 18392657
- PMCID: PMC3043709
- DOI: 10.1007/s10278-008-9110-7
Abstract
The aim of this study is to develop a secure, Google-based data-mining tool for radiology reports using free and open source technologies and to explore its use within an academic radiology department. A Health Insurance Portability and Accountability Act (HIPAA)-compliant data repository, search engine and user interface were created to facilitate treatment, operations, and reviews preparatory to research. The Institutional Review Board waived review of the project, and informed consent was not required. Comprising 7.9 GB of disk space, 2.9 million text reports were downloaded from our radiology information system to a fileserver. Extensible markup language (XML) representations of the reports were indexed using Google Desktop Enterprise search engine software. A hypertext markup language (HTML) form allowed users to submit queries to Google Desktop, and Google's XML response was interpreted by a practical extraction and report language (PERL) script, presenting ranked results in a web browser window. The query, reason for search, results, and documents visited were logged to maintain HIPAA compliance. Indexing averaged approximately 25,000 reports per hour. Keyword search of a common term like "pneumothorax" yielded the first ten most relevant results of 705,550 total results in 1.36 s. Keyword search of a rare term like "hemangioendothelioma" yielded the first ten most relevant results of 167 total results in 0.23 s; retrieval of all 167 results took 0.26 s. Data mining tools for radiology reports will improve the productivity of academic radiologists in clinical, educational, research, and administrative tasks. By leveraging existing knowledge of Google's interface, radiologists can quickly perform useful searches.
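The search-and-audit flow described in the abstract (query submitted, XML response parsed into ranked results, query and reason logged) can be sketched roughly as follows. This is a minimal illustration in Python rather than the Perl the authors used, and it is not their implementation: the XML layout in `SAMPLE_RESPONSE`, the function names, and the audit-record fields are all assumptions, since the abstract does not give Google Desktop's actual response schema or the log format.

```python
import json
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical search response. The element and attribute names are
# assumptions for illustration, not Google Desktop's real XML schema.
SAMPLE_RESPONSE = """<results count="167">
  <result><title>Report 1001</title><url>reports/1001.xml</url></result>
  <result><title>Report 1002</title><url>reports/1002.xml</url></result>
</results>"""


def parse_results(xml_text):
    """Return (total result count, ranked list of hits) from the XML text."""
    root = ET.fromstring(xml_text)
    total = int(root.get("count", "0"))
    hits = [
        {
            "rank": rank,
            "title": node.findtext("title", default=""),
            "url": node.findtext("url", default=""),
        }
        for rank, node in enumerate(root.findall("result"), start=1)
    ]
    return total, hits


def audit_record(user, query, reason, total, hits):
    """Build one JSON audit line: who searched, for what, why, and what came back."""
    return json.dumps(
        {
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "query": query,
            "reason": reason,
            "total_results": total,
            "results": [hit["url"] for hit in hits],
        }
    )


def indexing_hours(n_reports, reports_per_hour):
    """Estimate total indexing time, assuming a constant indexing rate."""
    return n_reports / reports_per_hour


if __name__ == "__main__":
    total, hits = parse_results(SAMPLE_RESPONSE)
    print(audit_record("jdoe", "hemangioendothelioma", "research review", total, hits))
    print(indexing_hours(2_900_000, 25_000), "hours")
```

Taking the abstract's figures at face value, 2.9 million reports at about 25,000 reports per hour works out to roughly 116 hours of indexing, assuming the rate stayed constant. Logging of documents visited, which the abstract also mentions, would be a separate event per click and is left out of this sketch.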
Similar articles
- Intelligent image retrieval based on radiology reports. Eur Radiol. 2012 Dec;22(12):2750-8. doi: 10.1007/s00330-012-2608-x. Epub 2012 Aug 4. PMID: 22865274
- Comparing image search behaviour in the ARRS GoldMiner search engine and a clinical PACS/RIS. J Biomed Inform. 2015 Aug;56:57-64. doi: 10.1016/j.jbi.2015.04.013. Epub 2015 May 19. PMID: 26002820
- A practical approach for inexpensive searches of radiology report databases. Acad Radiol. 2007 Jun;14(6):749-56. doi: 10.1016/j.acra.2007.02.008. PMID: 17502263
- Potential use of extensible markup language for radiology reporting: a tutorial. Radiographics. 2000 Jan-Feb;20(1):287-93. doi: 10.1148/radiographics.20.1.g00ja28287. PMID: 10682794 Review.
- Analyzing Medical Image Search Behavior: Semantics and Prediction of Query Results. J Digit Imaging. 2015 Oct;28(5):537-46. doi: 10.1007/s10278-015-9792-6. PMID: 25810317 Free PMC article. Review.
Cited by
- Intelligent image retrieval based on radiology reports. Eur Radiol. 2012 Dec;22(12):2750-8. doi: 10.1007/s00330-012-2608-x. Epub 2012 Aug 4. PMID: 22865274
- Google Medical Update: Why Is the Search Engine Decreasing Visibility of Health and Medical Information Websites? Int J Environ Res Public Health. 2020 Feb 12;17(4):1160. doi: 10.3390/ijerph17041160. PMID: 32059576 Free PMC article.
- Searching Data: A Review of Observational Data Retrieval Practices in Selected Disciplines. J Assoc Inf Sci Technol. 2019 May;70(5):419-432. doi: 10.1002/asi.24165. Epub 2019 Mar 12. PMID: 31763358 Free PMC article. Review.
- An information retrieval system for computerized patient records in the context of a daily hospital practice: the example of the Léon Bérard Cancer Center (France). Appl Clin Inform. 2014 Mar 5;5(1):191-205. doi: 10.4338/ACI-2013-08-CR-0065. eCollection 2014. PMID: 24734133 Free PMC article.
- Automated measurement of pediatric cranial bone thickness and density from clinical computed tomography. Annu Int Conf IEEE Eng Med Biol Soc. 2012;2012:4462-5. doi: 10.1109/EMBC.2012.6346957. PMID: 23366918 Free PMC article.