Prioritizing cases from a multi-institutional cohort for a dataset of pathologist annotations
- PMID: 39720416
- PMCID: PMC11667696
- DOI: 10.1016/j.jpi.2024.100411
Abstract
Objective: With growing interest in developing artificial intelligence and machine learning (AI/ML) models, having different developers use the same external validation dataset allows their model performance to be compared directly. Through our High Throughput Truthing project, we are creating a validation dataset for AI/ML models trained to assess stromal tumor-infiltrating lymphocytes (sTILs) in triple-negative breast cancer (TNBC).
Materials and methods: We obtained clinical metadata for hematoxylin and eosin-stained glass slides and the corresponding scanned whole slide images (WSIs) of TNBC core biopsies from two US academic medical centers. We selected regions of interest (ROIs) from the WSIs to capture a range of tissue morphologies and sTILs densities. Using the selected ROIs, we implemented a hierarchical rank-sort method to prioritize cases.
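The abstract does not describe how the hierarchical rank-sort works. The code below is a minimal sketch of one plausible reading, a multi-key sort. Each case is ranked first by how rare its value is in the highest-priority metadata field, and lower-priority fields break ties. The `Case` fields (`stils_bin`, `race`, `age_group`), the priority order, and the `prioritize` function are hypothetical illustrations. They are not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    # Hypothetical metadata fields; the real study's elements may differ.
    case_id: str
    stils_bin: str
    race: str
    age_group: str


def prioritize(cases, keys):
    """Hypothetical hierarchical rank-sort: rarest metadata values first.

    keys lists field names in priority order. The first field dominates
    and each later field only breaks ties left by the earlier ones.
    """
    # How often each value of each field occurs across the whole cohort.
    counts = {k: Counter(getattr(c, k) for c in cases) for k in keys}

    def sort_key(c):
        # Lower frequency means a rarer value, which sorts earlier. Tuples
        # compare lexicographically, which gives the hierarchy. case_id is a
        # final tie-breaker so the order is deterministic.
        return (tuple(counts[k][getattr(c, k)] for k in keys), c.case_id)

    return sorted(cases, key=sort_key)


cohort = [
    Case("A", "low", "White", "60+"),
    Case("B", "low", "White", "<60"),
    Case("C", "high", "Black", "60+"),
    Case("D", "low", "Black", "60+"),
]
ranked = prioritize(cohort, ["stils_bin", "race", "age_group"])
print([c.case_id for c in ranked])  # ['C', 'B', 'A', 'D']
```

Case C ranks first because it is the only case in the rarer sTILs bin. Among the remaining cases, sTILs bin and race are tied, so B ranks next on the basis of its rarer age group.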
Results: We received 122 glass slides and clinical metadata for 105 unique patients with TNBC. All patients were female, and their mean age was 63.44 years. Sixty percent of patients were White and 38.1% were Black or African American. After case prioritization, the skewness of the sTILs density distribution decreased from 0.60 to 0.46, and the entropy of the sTILs density bins correspondingly increased from 1.20 to 1.24. We retained cases with less prevalent metadata elements.
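The code below sketches how the two summary statistics reported above could be computed. It assumes the biased Fisher-Pearson moment coefficient for skewness and Shannon entropy with the natural logarithm over bin proportions. The abstract does not state the estimator, the log base, or the bin edges, so `skewness`, `bin_entropy`, and the example values are hypothetical.

```python
import bisect
import math
from collections import Counter


def skewness(values):
    """Fisher-Pearson moment coefficient of skewness (g1, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def bin_entropy(values, edges):
    """Shannon entropy (natural log) of the share of values falling in each bin.

    edges are sorted boundaries and bin i covers [edges[i], edges[i + 1]).
    Values outside [edges[0], edges[-1]) are ignored.
    """
    counts = Counter()
    for x in values:
        i = bisect.bisect_right(edges, x) - 1
        if 0 <= i < len(edges) - 1:
            counts[i] += 1
    total = sum(counts.values())
    # An even spread across k bins gives the maximum value, ln(k).
    return -sum((c / total) * math.log(c / total) for c in counts.values())


densities = [5, 15, 25, 35]   # hypothetical sTILs densities (%)
edges = [0, 10, 20, 30, 40]   # hypothetical bin boundaries
print(round(bin_entropy(densities, edges), 3))  # 1.386, i.e. ln 4
print(skewness([0, 0, 0, 10]) > 0)              # True: right-skewed
```

Under these assumptions, a lower skewness and a higher entropy both indicate that the prioritized cases are spread more evenly across sTILs densities.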
Conclusion: This method allows us to prioritize underrepresented subgroups based on important clinical factors. In this manuscript, we discuss how we sourced the clinical metadata, selected ROIs, and developed our approach to prioritizing cases for inclusion in our pivotal study.
Keywords: Data; Prioritization; Sampling; Validation.
Conflict of interest statement
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Emma Gardecki reports financial support from the Oak Ridge Institute for Science and Education. Stephanie Jou reports financial support from the FDA's Office of Women's Health. Brandon Gallas reports financial support from the FDA Office of Women's Health. Balazs Acs reports financial support from The Swedish Society for Medical Research (Svenska Sällskapet för Medicinsk Forskning). Xiaoxian Li reports a relationship with AstraZeneca that includes:. Xiaoxian Li reports a relationship with Roche that includes:. Xiaoxian Li reports a relationship with Eli Lilly that includes:. Xiaoxian Li reports a relationship with Onviv that includes:. Xiaoxian Li reports a relationship with Champions Oncology that includes: funding grants. Joel Saltz reports a relationship with National Cancer Institute that includes: funding grants. Joel Saltz reports a relationship with Chilean Wool that includes:. Roberto Salgado reports a relationship with Merck that includes:. Roberto Salgado reports a relationship with Case 45 that includes:. Roberto Salgado reports a relationship with Bristol Myers Squibb that includes: consulting or advisory. Roberto Salgado reports a relationship with Puma Biotechnology that includes:. Roberto Salgado reports a relationship with Roche that includes: consulting or advisory. Roberto Salgado reports a relationship with AstraZeneca that includes: consulting or advisory. Roberto Salgado reports a relationship with Daiichi Sankyo that includes: consulting or advisory. Roberto Salgado reports a relationship with Exact Sciences that includes: consulting or advisory. Kenneth Shroyer reports a relationship with KDx Diagnostics that includes:. The following co-authors are members of the National Editorial Board of the Journal of Pathology Informatics: Joel Saltz and Jochen Lennerz.
If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.