JCO Clin Cancer Inform. 2021 Aug;5:833-841.
doi: 10.1200/CCI.21.00017.

Customizable Natural Language Processing Biomarker Extraction Tool

Benjamin Holmes et al. JCO Clin Cancer Inform. 2021 Aug.

Abstract

Purpose: Natural language processing (NLP) in pathology reports to extract biomarker information is an ongoing area of research. MetaMap is a natural language processing tool developed and funded by the National Library of Medicine to map biomedical text to the Unified Medical Language System Metathesaurus by applying specific tags to clinically relevant terms. Although results are useful without additional postprocessing, these tags lack important contextual information.

Methods: Our novel method takes terminology-driven semantic tags and incorporates those into a semantic frame that is task-specific to add necessary context to MetaMap. We use important contextual information to capture biomarker results to support Community Health System's use of Precision Medicine treatments for patients with cancer. For each biomarker, the name, type, numeric quantifiers, non-numeric qualifiers, and the time frame are extracted. These fields then associate biomarkers with their context in the pathology report such as test type, probe intensity, copy-number changes, and even failed results. A selection of 6,713 relevant reports contained the following standard-of-care biomarkers for metastatic breast cancer: breast cancer gene 1 and 2, estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and programmed death-ligand 1.
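The frame described above (name, type, numeric quantifiers, non-numeric qualifiers, time frame) can be pictured as a small record that collects tagged terms around each biomarker mention. The following is only a minimal sketch under stated assumptions, not the authors' implementation and not MetaMap's API: the tag names (`BIOMARKER`, `TEST_TYPE`, `QUANTITY`, `QUALIFIER`, `TIME`), the `BiomarkerFrame` class, and the `build_frames` helper are all hypothetical, and the rule of attaching each term to the nearest preceding biomarker is a simplification chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BiomarkerFrame:
    """Hypothetical task-specific frame; fields mirror those listed in the abstract."""
    name: str
    test_type: Optional[str] = None
    numeric_quantifiers: List[str] = field(default_factory=list)
    qualifiers: List[str] = field(default_factory=list)
    time_frame: Optional[str] = None


def build_frames(tagged: List[Tuple[str, str]]) -> List[BiomarkerFrame]:
    """Group (text, tag) pairs into frames.

    Assumption: each non-biomarker term belongs to the nearest
    preceding biomarker mention. Terms seen before any biomarker
    are ignored.
    """
    frames: List[BiomarkerFrame] = []
    current: Optional[BiomarkerFrame] = None
    for text, tag in tagged:
        if tag == "BIOMARKER":
            current = BiomarkerFrame(name=text)
            frames.append(current)
        elif current is None:
            continue
        elif tag == "TEST_TYPE":
            current.test_type = text
        elif tag == "QUANTITY":
            current.numeric_quantifiers.append(text)
        elif tag == "QUALIFIER":
            current.qualifiers.append(text)
        elif tag == "TIME":
            current.time_frame = text
    return frames


# Made-up tagged terms standing in for the output of a semantic tagger.
example = [
    ("ER", "BIOMARKER"), ("IHC", "TEST_TYPE"),
    ("95%", "QUANTITY"), ("positive", "QUALIFIER"),
    ("PR", "BIOMARKER"), ("negative", "QUALIFIER"),
]
frames = build_frames(example)
for frame in frames:
    print(frame)
```

In this toy input, the quantifier and qualifier that follow "ER" end up in the ER frame rather than floating free as isolated tags, which is the kind of context the abstract says the bare tags lack.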

Results: The method was tested on pathology reports from the internal pathology laboratory at Henry Ford Health System. A certified tumor registrar reviewed 400 tests, which showed > 95% accuracy for all extracted biomarker types.

Conclusion: Using this new method, it is possible to extract high-quality, contextual biomarker information, and this represents a significant advance in biomarker extraction.

Conflict of interest statement

Benjamin Holmes
Employment: Syapse
Stock and Other Ownership Interests: Syapse

Joshua Loving
Employment: Syapse, Philips Research
Stock and Other Ownership Interests: Philips Healthcare, Syapse
Patents, Royalties, Other Intellectual Property: Patents pending from Philips Research

Mary Tran
Employment: Syapse
Stock and Other Ownership Interests: Syapse

Vinod Subramanian
Employment: Syapse
Leadership: Syapse
Stock and Other Ownership Interests: Syapse
Travel, Accommodations, Expenses: Syapse

Anna Berry
Employment: Syapse
Stock and Other Ownership Interests: Syapse
Research Funding: Tempus

Matthew Rioth
Employment: Syapse
Stock and Other Ownership Interests: Syapse

Thomas Brown
Employment: GenomiCare Biotechnology, Syapse
Leadership: Syapse
Stock and Other Ownership Interests: GenomiCare Biotechnology, Syapse, Sygnomics
Honoraria: Novartis
Consulting or Advisory Role: Jiahui Health, GenomiCare Biotechnology, Lug Healthcare Technology, Syapse
Speakers' Bureau: Syapse, Novartis, Precision Medicine
Travel, Accommodations, Expenses: Syapse, Jiahui Health, GenomiCare, Lug Healthcare Technology, Novartis

No other potential conflicts of interest were reported.