Structuring and centralizing breast cancer real-world biomarker data from pathology reports through C-LAB® artificial intelligence platform
- PMID: 40013074
- PMCID: PMC11863259
- DOI: 10.1177/20552076251323110
Abstract
Purpose: To evaluate the effectiveness of C-LAB®, an artificial intelligence (AI) platform, in extracting, structuring, and centralizing biomarker data from breast cancer pathology reports within the challenging, heterogeneous dataset of the Institut de Cancérologie de l'Ouest (ICO).
Methods: C-LAB® was deployed at the ICO to analyze HER2 and hormone receptor data from breast cancer pathology reports. During the development phase, 292 anatomic pathology reports were used to design and refine the rule-based extraction algorithm through an iterative process of monitoring and adjustment. Once finalized, the algorithm was applied to 2323 anatomic pathology reports belonging to 487 patients. Performance metrics could be calculated only for the subset of reports that were also available in the structured National Epidemiological Strategy and Medical Economics (ESME) database: 666 reports, corresponding to 97 patients. The ESME records for these reports served as the gold standard for performance assessment, because ESME provides structured data against which the outputs of the C-LAB® algorithm could be compared.
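To illustrate what a rule-based extraction step of this kind can look like, here is a minimal sketch in Python. It is not the C-LAB® algorithm, whose rules are not described in this abstract: the function name `extract_biomarkers`, both regular expressions, and the output field names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical patterns for illustration only; the actual C-LAB rules
# are not given in the abstract.
# HER2 immunohistochemistry score written as "0+" to "3+", optionally
# preceded by the word "score", within the same sentence as "HER2".
HER2_IHC = re.compile(r"HER2[^.\n]*?\b(?:score\s*)?([0-3])\s*\+", re.IGNORECASE)
# Estrogen receptor percentage of stained cells, e.g. "ER: 90%".
ER_PCT = re.compile(
    r"\b(?:ER|estrogen receptors?)\b[^.\n]*?(\d{1,3})\s*%", re.IGNORECASE
)


def extract_biomarkers(report_text):
    """Return a structured record; fields stay None when no rule matches."""
    record = {"her2_ihc_score": None, "er_percent": None}

    her2_match = HER2_IHC.search(report_text)
    if her2_match:
        record["her2_ihc_score"] = int(her2_match.group(1))

    er_match = ER_PCT.search(report_text)
    if er_match:
        record["er_percent"] = int(er_match.group(1))

    return record


sample = "HER2: score 2+. Estrogen receptors: 90 % of tumor cells."
print(extract_biomarkers(sample))
```

In practice, such rules would be refined iteratively against a development set, as the Methods describe for the 292 development reports, for instance to tolerate optical character recognition errors.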
Results: C-LAB® achieved precision, recall, and F1-score above 80% when compared with human extractions in structuring biomarker data from complex, unstructured pathology reports, despite dataset variability and optical character recognition errors. Although the ESME database served as the benchmark, it relies on single manual data entry without secondary review, which introduces potential inaccuracies. The observed performance therefore reflects close agreement between human and algorithmic extractions rather than absolute accuracy. C-LAB® shows significant potential to reduce manual workload, centralize data, and enable scalable, real-time reporting.
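The three reported metrics can be computed by comparing extracted values with the reference values field by field. The sketch below assumes one common counting convention, which the abstract does not specify: an extracted value equal to the reference is a true positive, an extracted value that differs from the reference (or has no reference) is a false positive, and a reference value that is missed or extracted wrongly is a false negative. The function name and the example data are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Compare two dicts keyed by (report_id, biomarker).

    Assumed convention (not taken from the study):
      TP: extracted value equals the reference value
      FP: a value was extracted but it is wrong or has no reference
      FN: a reference value exists but was missed or extracted wrongly
    """
    tp = fp = fn = 0
    for key in set(predicted) | set(gold):
        p = predicted.get(key)
        g = gold.get(key)
        if p is not None and p == g:
            tp += 1
        else:
            if p is not None:
                fp += 1
            if g is not None:
                fn += 1

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Made-up example data, not study results.
gold = {
    ("r1", "HER2"): "positive",
    ("r1", "ER"): "positive",
    ("r2", "HER2"): "negative",
    ("r2", "ER"): "negative",
}
predicted = {
    ("r1", "HER2"): "positive",
    ("r1", "ER"): "positive",
    ("r2", "HER2"): "positive",  # wrong value: counts as both FP and FN
    # ("r2", "ER") missing: counts as FN
}
print(precision_recall_f1(predicted, gold))
```

Because the reference here is a single manual entry, a disagreement counted as an error under this scheme may in fact be an error in the reference, which is why the metrics measure agreement rather than absolute accuracy.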
Conclusion: AI technologies like C-LAB® show significant potential for creating accessible and actionable digital factories from complex pathology data, supporting the precision management of diseases such as breast cancer, from diagnosis to treatment.
Keywords: Cancer disease; artificial intelligence general; biomarker; genetics medicine; machine learning general; oncology medicine.
© The Author(s) 2025.
Conflict of interest statement
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: FLB, CM, TP, OK, JSF, JR, MC and FB declare no conflict of interest. CG, YN, YE, JM, JC and MFP declare a potential conflict of interest, as they were involved in both the development of the C-LAB® platform and the authorship of this study, as consultants or employees of Connect by Circular-Lab. However, every effort was made to ensure the integrity and objectivity of the research, including adherence to rigorous evaluation protocols. None of these authors had access to the ESME data, and the evaluation of the algorithm's performance was carried out by FLB, who has no connection with Connect by Circular-Lab.
