Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning

Chen Shen^#¹, Chunfeng Lian^#^{2,3}, Wanqing Zhang¹, Fan Wang⁴, Jianhua Zhang⁵, Shuanliang Fan¹, Xin Wei¹, Gongji Wang¹, Kehan Li⁴, Hongshu Mu⁶, Hao Wu¹, Xinggong Liang¹, Jianhua Ma^{7,8}, Zhenyuan Wang⁹

Affiliations

¹ Key Laboratory of National Ministry of Health for Forensic Sciences, School of Medicine & Forensics, Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China.
² School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, Shaanxi, China. chunfeng.lian@xjtu.edu.cn.
³ Pazhou Lab (Huangpu), Guangzhou, China. chunfeng.lian@xjtu.edu.cn.
⁴ Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, China.
⁵ Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Academy of Forensic Science, Shanghai, China.
⁶ Weicheng Branch, Xian'yang Public Security Bureau, Xian'yang, Shaanxi, China.
⁷ Pazhou Lab (Huangpu), Guangzhou, China. jhma@xjtu.edu.cn.
⁸ Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, China. jhma@xjtu.edu.cn.
⁹ Key Laboratory of National Ministry of Health for Forensic Sciences, School of Medicine & Forensics, Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China. wzy218@xjtu.edu.cn.

^# Contributed equally.

PMID: 40702007
PMCID: PMC12287307
DOI: 10.1038/s41467-025-62060-x

Chen Shen et al. Nat Commun. 2025 Jul 23;16(1):6773. doi: 10.1038/s41467-025-62060-x.

Abstract

Forensic pathology plays a vital role in determining the cause and manner of death through macroscopic and microscopic post-mortem examinations. However, the field faces challenges such as variability in outcomes, labor-intensive processes, and a shortage of skilled professionals. This paper introduces SongCi, a visual-language model tailored for forensic pathology. Leveraging advanced prototypical cross-modal self-supervised contrastive learning, SongCi improves the accuracy, efficiency, and generalizability of forensic analyses. Pre-trained and validated on a large multi-center dataset comprising over 16 million high-resolution image patches, 2,228 vision-language pairs from post-mortem whole slide images, gross key findings, and 471 unique diagnostic outcomes, SongCi demonstrates superior performance over existing multi-modal models and computational pathology foundation models in forensic tasks. It matches experienced forensic pathologists' capabilities, significantly outperforms less experienced practitioners, and offers robust multi-modal explainability.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

**Fig. 1. The framework of SongCi and studied large-vocabulary, multi-center datasets.**
a Overview of WSI data. The dataset spans a broad spectrum of samples from nine different human organs, each meticulously annotated. b Data structure and provenance. The dataset was compiled from three premier forensic cohorts. Diagnostic outcomes in forensic pathology were represented using a word cloud and a Venn diagram to illustrate the distribution and overlap of diagnoses. c Training process of patch-level prototype encoder. SongCi utilizes a self-supervised contrastive learning framework, augmented with prototype-based clustering strategies to enhance efficiency, forming the basis of the prototypical patch-level encoder. d Training process of cross-multimodality fusion and alignment layer. Within the SongCi framework, diverse data modalities, including gross key findings and WSIs, are integrated using an innovative gated-attention-boosted multimodal fusion block. Subsequently, the framework aligns the unified representation space with forensic pathological diagnoses through self-supervised contrastive learning, effectively establishing inter-modal correlations. e Inference process of SongCi. SongCi processes the gross anatomical key findings and WSIs to generate a potential diagnosis. Additionally, SongCi provides a diagnostic rationale by highlighting significant terms related to the gross key findings and identifying suspicious regions within the WSIs. For more detailed information, please refer to the “Methods” section.
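
The contrastive alignment step in panel d can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: it assumes a symmetric InfoNCE-style objective, and the function names, the temperature value, and the toy embeddings are all hypothetical. The idea is that the fused (gross findings + WSI) embedding of case i should be most similar to the diagnosis embedding of the same case i.

```python
import math

def _normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def symmetric_info_nce(fused, diagnosis, temperature=0.07):
    """Symmetric InfoNCE loss between paired embeddings (assumed objective).

    fused[i] and diagnosis[i] are treated as a positive pair; every other
    combination in the batch acts as a negative.
    """
    f = [_normalize(v) for v in fused]
    d = [_normalize(v) for v in diagnosis]
    logits = [[sum(a * b for a, b in zip(fi, dj)) / temperature for dj in d]
              for fi in f]
    logits_t = [list(col) for col in zip(*logits)]  # diagnosis-to-fused view
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))

# Toy batch of two cases (hypothetical 2-D embeddings).
fused = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_info_nce(fused, [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_info_nce(fused, [[0.0, 1.0], [1.0, 0.0]])
```

With correctly paired embeddings the loss is close to zero, while swapping the diagnoses between the two cases makes it large, which is the signal a model would be trained to minimize.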

**Fig. 2. Prototype representation space of SongCi & results of post-mortem image generation.**
a The prototype representation space is visualized using a 2D UMAP, where 933 dots represent the prototypes, with each dot colored according to the proportion of tissue types it represents. Source data are provided as a Source Data file. b, c Prototype-conditioned patch-level generation results. Sub-figure b shows the results for inter-tissue-specific prototypes, including autolysis, inflammation, fibrosis, and hemorrhage. Sub-figure c displays intra-tissue-specific prototypes, such as myocardial hypertrophy, cerebral edema, muscular tissue, pneumorrhagia, and hepatic steatosis. d, e The conditional diffusion models are exhibited. Sub-figure d illustrates the prototype-based conditional diffusion model, and sub-figure e shows the instance-based model. f The results of instance-based patch-level generation are presented, featuring representative instances like renal tubules with hemorrhage, normal renal tubules, liver fat particles (undissolved), and splenic trabeculae.

**Fig. 3. Instance & prototype segmentation results.**
a The original WSIs of four different tissues, including spleen, brain, myocardium, and liver tissues. b, c, d illustrate the segmentation outcomes utilizing a traditional clustering method (i.e., H2T), a GMM-based clustering method (i.e., PANTHER), and our prototype-based methods (i.e., SongCi), respectively. Specifically, the input image is partitioned into seven distinct masks, with each of them represented by a unique color based on the number of patches within: orange, yellow, pink, blue, white, green, and crimson. The top four prevalent mask types for each image are indicated at the bottom of the figure. Also, the key distinctions between the three segmentation approaches are highlighted with black borders.
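
As a rough illustration of how a prototype-based mask could be produced and ranked by patch count, the sketch below assigns each patch embedding to its most similar prototype and then lists the most prevalent masks. The nearest-prototype rule by cosine similarity is an assumption made for this example, not a description of SongCi's actual procedure, and all names and values are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def assign_to_prototypes(patch_embeddings, prototypes):
    """Return, for each patch, the index of its most similar prototype."""
    return [
        max(range(len(prototypes)), key=lambda k: cosine(p, prototypes[k]))
        for p in patch_embeddings
    ]

def top_masks(assignments, k=4):
    """Rank prototype masks by how many patches they contain."""
    return Counter(assignments).most_common(k)

# Toy example: two prototypes and three patches (hypothetical 2-D embeddings).
prototypes = [[1.0, 0.0], [0.0, 1.0]]
patches = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
assignments = assign_to_prototypes(patches, prototypes)
ranking = top_masks(assignments)
```

Each distinct prototype index then corresponds to one colored mask, and the ranking plays the role of the "top four prevalent mask types" listed under each image.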

**Fig. 4. Comparisons of SongCi with state-of-the-art, open-sourced multimodality fusion methods.**
The evaluation benchmarks SongCi against six established models utilizing three key performance metrics: recall, precision, and intersection over union (IoU). Radar charts illustrate the algorithm’s efficacy across nine different organs, and the associated table below consolidates the average scores for these organs, with the highest values emphasized in bold. Source data are provided as a Source Data file.
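
For readers unfamiliar with the three metrics, the sketch below computes recall, precision, and IoU for a single case, assuming each is evaluated on the set of predicted diagnostic terms against the set of reference terms. The caption does not state how the scores are computed, so this set-based formulation is an assumption; the function name and the example labels are hypothetical.

```python
def set_metrics(predicted, reference):
    """Precision, recall, and IoU between two collections of labels."""
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)      # labels that are both predicted and correct
    union = len(pred | ref)   # every label that appears in either set
    return {
        "precision": tp / len(pred) if pred else 0.0,
        "recall": tp / len(ref) if ref else 0.0,
        "iou": tp / union if union else 0.0,
    }

# Hypothetical example: 3 predicted terms, 4 reference terms, 2 in common.
scores = set_metrics(
    ["hemorrhage", "fibrosis", "edema"],
    ["fibrosis", "edema", "steatosis", "autolysis"],
)
```

Here two of the three predictions are correct (precision 2/3), two of the four reference terms are recovered (recall 1/2), and the overlap covers two of five distinct terms (IoU 2/5). Per-organ averages like those in the table would follow from averaging such per-case scores.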

**Fig. 5. Multi-modality attention visualization of SongCi.**
The multi-modality attention visualization of SongCi offers interpretable analyses for forensic pathology diagnosis across a range of tissues and organs. a, b display liver tissues; c, f gastrointestinal tissues; d brain tissues; e pancreatic tissues; and g, h spleen tissues, with i highlighting adrenal tissues. The WSI regions corresponding to the prototypes of the top five findings, along with the top five vital descriptors in the gross key findings, are delineated in distinct colors.
