J Pathol Inform. 2024 Nov 16;16:100411.
doi: 10.1016/j.jpi.2024.100411. eCollection 2025 Jan.

Prioritizing cases from a multi-institutional cohort for a dataset of pathologist annotations


Victor Garcia et al. J Pathol Inform.

Abstract

Objective: With growing interest in developing artificial intelligence and machine learning (AI/ML) models, having multiple developers use the same external validation dataset allows a direct comparison of model performance. Through our High Throughput Truthing project, we are creating a validation dataset for AI/ML models trained to assess stromal tumor-infiltrating lymphocytes (sTILs) in triple-negative breast cancer (TNBC).

Materials and methods: We obtained clinical metadata for hematoxylin and eosin-stained glass slides and the corresponding scanned whole slide images (WSIs) of TNBC core biopsies from two US academic medical centers. We selected regions of interest (ROIs) from the WSIs to capture a range of tissue morphologies and sTILs densities. Using the selected ROIs, we implemented a hierarchical rank-sort method to prioritize cases.
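The abstract does not spell out the ranking criteria or their order, so the following is only a minimal sketch of one way a hierarchical rank-sort for case prioritization could work. It assumes cases are ranked lexicographically by how rare their metadata values are in the cohort (rarest first), with ties broken at the next level. The `Case` record, the metadata levels (`race`, `stage`, `age_bin`), their order, and the batch size `k` are all hypothetical; per Fig. 3, the actual method also involves ROI sTILs density bins, which could be added as a further level.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    # Hypothetical case record; the real metadata schema is not given in the abstract.
    case_id: str
    race: str
    stage: str
    age_bin: str


def hierarchical_rank_sort(cases, levels=("race", "stage", "age_bin")):
    """Order cases so that those carrying rarer metadata values come first.

    Assumed scheme: rank on the first level's value frequency, break ties
    on the next level, and so on (a lexicographic sort), with case_id as a
    final deterministic tiebreaker.
    """
    # Cohort-wide frequency of every value at every metadata level.
    counts = {lvl: Counter(getattr(c, lvl) for c in cases) for lvl in levels}

    def sort_key(case):
        # Smaller counts (rarer values) sort earlier.
        return tuple(counts[lvl][getattr(case, lvl)] for lvl in levels) + (case.case_id,)

    return sorted(cases, key=sort_key)


def select_batch(cases, k, levels=("race", "stage", "age_bin")):
    """Take the k highest-priority cases (k is a hypothetical batch size)."""
    return hierarchical_rank_sort(cases, levels)[:k]


if __name__ == "__main__":
    cohort = [
        Case("c1", "White", "II", "60-69"),
        Case("c2", "Black or African American", "II", "60-69"),
        Case("c3", "White", "IV", "70-79"),
        Case("c4", "White", "II", "50-59"),
    ]
    print([c.case_id for c in hierarchical_rank_sort(cohort)])  # ['c2', 'c3', 'c4', 'c1']
```

In this toy cohort, the only Black or African American case sorts first on the race level, and the three White cases are then separated by stage rarity and finally by age-bin rarity. A lexicographic key keeps the hierarchy explicit: a lower level only matters when every higher level ties.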

Results: We received 122 glass slides and clinical metadata for 105 unique patients with TNBC. All patients were female, with a mean age of 63.44 years. Of all cases, 60% were from White patients and 38.1% from Black or African American patients. After case prioritization, the skewness of the sTILs density distribution improved from 0.60 to 0.46, with a corresponding increase in the entropy of the sTILs density bins from 1.20 to 1.24. Cases with less prevalent metadata elements were retained.
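The two summary statistics above can be sketched as follows, computed once on all ROIs and once on the prioritized subset. This is a minimal illustration under stated assumptions: skewness is taken as the Fisher-Pearson coefficient g1 from population moments, entropy is Shannon entropy with the natural logarithm (the abstract does not state the log base or the bin edges), and the cut points and density values below are hypothetical.

```python
import bisect
import math
from collections import Counter


def skewness(values):
    """Fisher-Pearson coefficient of skewness, g1 = m3 / m2**1.5 (population moments).

    Undefined (raises ZeroDivisionError) when all values are equal.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def to_bins(values, cut_points):
    """Map each density to a bin index; cut_points are hypothetical bin edges.

    A value equal to a cut point falls into the higher bin.
    """
    return [bisect.bisect_right(cut_points, v) for v in values]


def bin_entropy(bin_indices):
    """Shannon entropy, in nats, of the empirical distribution over occupied bins."""
    n = len(bin_indices)
    return -sum((c / n) * math.log(c / n) for c in Counter(bin_indices).values())


if __name__ == "__main__":
    # Hypothetical ROI-level sTILs densities (percent) and bin edges.
    densities = [1, 2, 5, 8, 12, 20, 35, 60, 90]
    bins = to_bins(densities, cut_points=[10, 40])
    print(round(skewness(densities), 3), round(bin_entropy(bins), 3))
```

A lower skewness means the density distribution is less piled up at low values, and a higher bin entropy means ROIs are spread more evenly across bins; both move in the direction the Results report.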

Conclusion: This method allows us to prioritize underrepresented subgroups based on important clinical factors. In this manuscript, we discuss how we sourced the clinical metadata, selected ROIs, and developed our approach to prioritizing cases for inclusion in our pivotal study.

Keywords: Data; Prioritization; Sampling; Validation.


Conflict of interest statement

The authors declare the following financial interests/personal relationships that may be considered potential competing interests. Emma Gardecki reports financial support from the Oak Ridge Institute for Science and Education. Stephanie Jou and Brandon Gallas report financial support from the FDA Office of Women's Health. Balazs Acs reports financial support from The Swedish Society for Medical Research (Svenska Sällskapet för Medicinsk Forskning). Xiaoxian Li reports relationships with AstraZeneca, Roche, Eli Lilly, and Onviv, and a relationship with Champions Oncology that includes funding grants. Joel Saltz reports a relationship with the National Cancer Institute that includes funding grants and a relationship with Chilean Wool. Roberto Salgado reports relationships with Merck, Case 45, and Puma Biotechnology, and consulting or advisory relationships with Bristol Myers Squibb, Roche, AstraZeneca, Daiichi Sankyo, and Exact Sciences. Kenneth Shroyer reports a relationship with KDx Diagnostics. The following co-authors are members of the National Editorial Board of the Journal of Pathology Informatics: Joel Saltz and Jochen Lennerz.
If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

Figures

Fig. 1
Institutional workflows for cohort identification. A. (Emory University Hospital) Patients identified through manual chart review, beginning with an institutional breast cancer database. The provided database included only HER2 0–1+ (negative) and HER2 2+ (equivocal) cases. B. (Stony Brook University Hospital) Patients identified from the Anatomic Pathology Database using natural language queries. The blue background denotes steps performed in each institution's pathology information system (CoPath at both institutions). The yellow background denotes steps involving the electronic health record (PowerChart and Epic at Emory). “n” refers to the number of unique patients. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 2
Clinical metadata distributions. A. Age distribution (binned by decade) by race. B. Age distribution (binned by decade) by breast cancer stage. C. Distribution of breast cancer stage by race.
Fig. 3
Results of the hierarchical rank-sort method. A. Number of ROIs within each sTILs density bin for all 55 WSIs. Blue indicates ROIs selected for the pivotal study batches by the hierarchical rank-sort method; hatched regions represent all ROIs within that sTILs density bin, including those not selected for the pivotal study batches. B. Distribution of sTILs density bins across pivotal study batches after sorting. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 4
Characteristics of selected cases (WSIs). Distribution of patient demographic features (age, race, and ethnicity) and disease stage among the cases selected for the pivotal study batches by the hierarchical rank-sort method. Our approach retained the patients in the lowest-frequency age bins and disease stages, all “Black or African American” patients (13/47), and all “Hispanic or Latino” patients (5/47).
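Fig. 4 reports that the selection kept every patient in several low-frequency subgroups. A check of that property might look like the minimal sketch below, which computes, for one metadata attribute, the fraction of each subgroup that survives selection and lists the subgroups kept in full. The record layout (dicts with an `"id"` key) and the function names are hypothetical, not taken from the study.

```python
from collections import Counter


def retention_by_group(all_cases, selected_ids, attr):
    """Fraction of each subgroup (each value of `attr`) kept in the selected batches."""
    total = Counter(case[attr] for case in all_cases)
    kept = Counter(case[attr] for case in all_cases if case["id"] in selected_ids)
    # Counter returns 0 for groups with no selected cases, so every group gets a fraction.
    return {group: kept[group] / total[group] for group in total}


def fully_retained(all_cases, selected_ids, attr):
    """Sorted list of subgroups for which every case in the full set was selected."""
    fractions = retention_by_group(all_cases, selected_ids, attr)
    return sorted(group for group, frac in fractions.items() if frac == 1.0)


if __name__ == "__main__":
    # Hypothetical toy records.
    cases = [
        {"id": "w1", "race": "White"},
        {"id": "w2", "race": "White"},
        {"id": "b1", "race": "Black or African American"},
    ]
    print(fully_retained(cases, {"w1", "b1"}, "race"))  # ['Black or African American']
```

Running the same check for age bin, ethnicity, and stage would confirm, attribute by attribute, which low-frequency subgroups the prioritization preserved.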
Fig. 5
Distribution of pitfalls in the full set of selected and annotated ROIs and in the pivotal study batches selected by the hierarchical rank-sort method.


