Toxicol Pathol. 2024 Jul;52(5):258-265.
doi: 10.1177/01926233241259998. Epub 2024 Jun 22.

Inter-Rater and Intra-Rater Agreement in Scoring Severity of Rodent Cardiomyopathy and Relation to Artificial Intelligence-Based Scoring

Thomas J Steinbach et al. Toxicol Pathol. 2024 Jul.

Abstract

We previously developed a computer-assisted image analysis algorithm to detect and quantify the microscopic features of rodent progressive cardiomyopathy (PCM) in rat heart histologic sections and validated the results with a panel of five veterinary toxicologic pathologists using a multinomial logistic model. In this study, we assessed both the inter-rater and intra-rater agreement of the pathologists and compared pathologists' ratings to the artificial intelligence (AI)-predicted scores. Pathologists and the AI algorithm were presented with 500 slides of rodent heart and quantified the amount of cardiomyopathy in each slide. A total of 200 of these slides were novel to this study, whereas 100 slides were intentionally selected for repetition from the previous study. After a washout period of more than six months, the repeated slides were examined to assess intra-rater agreement among pathologists. We found the intra-rater agreement to be substantial, with weighted Cohen's kappa values ranging from κ = 0.64 to 0.80; intra-rater variability is not a concern for the deterministic AI algorithm. The inter-rater agreement across pathologists was moderate (Cohen's kappa κ = 0.56). These results demonstrate the utility of AI algorithms as a tool for pathologists to increase sensitivity and specificity for the histopathologic assessment of the heart in toxicology studies.
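As an illustration of the agreement statistics reported above, the sketch below computes unweighted and linearly weighted Cohen's kappa for two raters using scikit-learn. The rater arrays and grade values are invented for demonstration only and are not data from this study.

# Minimal sketch (hypothetical data): unweighted vs. linearly weighted
# Cohen's kappa for two raters assigning ordinal severity grades (0-5).
from sklearn.metrics import cohen_kappa_score

# Invented example grades for the same 10 slides from two raters.
rater_1 = [0, 1, 1, 2, 3, 3, 4, 4, 5, 2]
rater_2 = [0, 1, 2, 2, 3, 4, 4, 5, 5, 1]

# Unweighted kappa treats every disagreement equally.
kappa_unweighted = cohen_kappa_score(rater_1, rater_2)

# Linearly weighted kappa penalizes disagreements in proportion to how far
# apart the grades are, which suits ordinal severity scores.
kappa_weighted = cohen_kappa_score(rater_1, rater_2, weights="linear")

print(f"unweighted kappa: {kappa_unweighted:.2f}")
print(f"weighted kappa:   {kappa_weighted:.2f}")

Weighted kappa is the natural choice for ordinal severity grades because a one-grade disagreement is penalized less than a disagreement several grades apart.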

Keywords: Sprague Dawley; artificial intelligence; cardiomyopathy; computer-assisted image analysis; deep learning; inter-rater agreement; intra-rater agreement; kappa; rat.


Conflict of interest statement

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Distribution of intra-rater reliability for the five veterinary pathologists using data from the 100 repeated slides. The points represent the level of agreement for each pathologist using accuracy (A) and Cohen's kappa (B). In each panel, values are shown for strict agreement (Exact Grade in panel A, Unweighted Cohen's kappa in panel B) and a tolerance for some margin of disagreement (within ±1 grade in panel A, weighted Cohen's kappa in panel B).
Figure 2.
Intra-rater agreement for the five veterinary pathologists by quintiles of AIA score using data from the 100 repeated slides. Values for each rater (A, B, C, D, E) are shown as percent agreement within each quintile of AIA-predicted score, where the upper bound of each quintile interval is shown on the horizontal axis. Mean values are shown as red plus signs.
Figure 3.
Distribution of pairwise inter-rater reliability measures. The points represent the level of agreement between each of the ten rater pairs using percent agreement (A) and Cohen’s kappa (B). In each panel, values are shown for strict agreement (Exact Grade in panel A, Unweighted Cohen’s kappa in panel B) and a tolerance for some margin of disagreement (within ±1 grade in panel A, weighted Cohen’s kappa in panel B). Boxplots are overlaid to show the distribution of the data.
Figure 4.
Distribution of percent agreement across all pairs of raters. Horizontal lines represent the percent agreement between each of the ten rater pairs by deciles of AIA score (A) and by median grade severity (B). Mean values are shown as red plus signs. In panel A, the upper bound of each decile interval is shown on the horizontal axis. In panel B, the median grade across all five raters is shown on the horizontal axis, with grades 4 and 5 combined.
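The captions above summarize exact-grade and within ±1 grade percent agreement over all ten rater pairs. The sketch below shows one way such pairwise agreement could be tabulated; the grade matrix is invented for illustration and is not data from this study.

# Minimal sketch (hypothetical data): exact and within-±1-grade percent
# agreement for every pair of raters, as summarized in the figure captions.
from itertools import combinations
import numpy as np

# Invented severity grades: rows = raters (A-E), columns = slides.
grades = np.array([
    [0, 1, 2, 3, 4, 5, 2, 1],   # rater A
    [0, 1, 2, 2, 4, 4, 3, 1],   # rater B
    [1, 1, 2, 3, 5, 5, 2, 0],   # rater C
    [0, 2, 2, 3, 4, 5, 2, 1],   # rater D
    [0, 1, 3, 3, 4, 5, 1, 1],   # rater E
])
raters = "ABCDE"

# Five raters yield C(5, 2) = 10 rater pairs.
for i, j in combinations(range(len(raters)), 2):
    diff = np.abs(grades[i] - grades[j])
    exact = np.mean(diff == 0) * 100       # exact-grade agreement (%)
    within_one = np.mean(diff <= 1) * 100  # within ±1 grade agreement (%)
    print(f"{raters[i]}-{raters[j]}: exact {exact:.0f}%, ±1 grade {within_one:.0f}%")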
