Nat Commun. 2025 Jul 23;16(1):6773.
doi: 10.1038/s41467-025-62060-x.

Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning

Chen Shen et al. Nat Commun.

Abstract

Forensic pathology plays a vital role in determining the cause and manner of death through macroscopic and microscopic post-mortem examinations. However, the field faces challenges such as variability in outcomes, labor-intensive processes, and a shortage of skilled professionals. This paper introduces SongCi, a visual-language model tailored for forensic pathology. Leveraging advanced prototypical cross-modal self-supervised contrastive learning, SongCi improves the accuracy, efficiency, and generalizability of forensic analyses. Pre-trained and validated on a large multi-center dataset comprising over 16 million high-resolution image patches, 2,228 vision-language pairs from post-mortem whole slide images, gross key findings, and 471 unique diagnostic outcomes, SongCi demonstrates superior performance over existing multi-modal models and computational pathology foundation models in forensic tasks. It matches experienced forensic pathologists' capabilities, significantly outperforms less experienced practitioners, and offers robust multi-modal explainability.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. The framework of SongCi and studied large-vocabulary, multi-center datasets.
a Overview of WSI data. The dataset spans a broad spectrum of samples from nine different human organs, each meticulously annotated. b Data structure and provenance. The dataset was compiled from three premier forensic cohorts. Diagnostic outcomes in forensic pathology were represented using a word cloud and a Venn diagram to illustrate the distribution and overlap of diagnoses. c Training process of patch-level prototype encoder. SongCi utilizes a self-supervised contrastive learning framework, augmented with prototype-based clustering strategies to enhance efficiency, forming the basis of the prototypical patch-level encoder. d Training process of cross-multimodality fusion and alignment layer. Within the SongCi framework, diverse data modalities, including gross key findings and WSIs, are integrated using an innovative gated-attention-boosted multimodal fusion block. Subsequently, the framework aligns the unified representation space with forensic pathological diagnoses through self-supervised contrastive learning, effectively establishing inter-modal correlations. e Inference process of SongCi. SongCi processes the gross anatomical key findings and WSIs to generate a potential diagnosis. Additionally, SongCi provides a diagnostic rationale by highlighting significant terms related to the gross key findings and identifying suspicious regions within the WSIs. For more detailed information, please refer to the “Methods” section.
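The alignment step in panel d can be illustrated with a generic symmetric InfoNCE-style contrastive loss. This is a minimal sketch, not the authors' implementation: the function names, the temperature value, and the assumption that row i of the fused embeddings is paired with row i of the diagnosis embeddings are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce(fused, diagnoses, temperature=0.07):
    """Symmetric contrastive loss; row i of `fused` is assumed to pair with row i of `diagnoses`.

    `temperature` is an illustrative value, not one taken from the paper.
    """
    f = [l2_normalize(v) for v in fused]
    d = [l2_normalize(v) for v in diagnoses]
    n = len(f)
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(f[i], d[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy(rows):
        # Mean negative log-probability of the matching (diagonal) entry.
        total = 0.0
        for i, row in enumerate(rows):
            peak = max(row)
            log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]
    # Average the fused-to-diagnosis and diagnosis-to-fused directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))
```

Correctly paired embeddings should yield a loss near zero, while mismatched pairs are penalized heavily, which is the property that pulls the fused representation toward its diagnosis text.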
Fig. 2. Prototype representation space of SongCi & results of post-mortem image generation.
a The prototype representation space is visualized using a 2D UMAP, where 933 dots represent the prototypes, with each dot colored according to the proportion of tissue types it represents. Source data are provided as a Source Data file. b, c Prototype-conditioned patch-level generation results. Sub-figure (b) shows the results for inter-tissue-specific prototypes, including autolysis, inflammation, fibrosis, and hemorrhage. Sub-figure (c) displays intra-tissue-specific prototypes, such as myocardial hypertrophy, cerebral edema, muscular tissue, pneumorrhagia, and hepatic steatosis. d, e The conditional diffusion models. Sub-figure (d) illustrates the prototype-based conditional diffusion model, and sub-figure (e) shows the instance-based model. f The results of instance-based patch-level generation are presented, featuring representative instances like renal tubules with hemorrhage, normal renal tubules, liver fat particles (undissolved), and splenic trabeculae.
Fig. 3. Instance & prototype segmentation results.
a The original WSIs of four different tissues, including spleen, brain, myocardium, and liver tissues. b, c, d illustrate the segmentation outcomes utilizing a traditional clustering method (i.e., H2T), a GMM-based clustering method (i.e., PANTHER), and our prototype-based methods (i.e., SongCi), respectively. Specifically, the input image is partitioned into seven distinct masks, with each of them represented by a unique color based on the number of patches within: orange, yellow, pink, blue, white, green, and crimson. The top four prevalent mask types for each image are indicated at the bottom of the figure. Also, the key distinctions between the three segmentation approaches are highlighted with black borders.
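One simple way to picture prototype-based segmentation is to label each patch with its most similar prototype and treat patches sharing a label as one mask. The sketch below assumes cosine similarity and hypothetical helper names; the caption does not state how SongCi performs the assignment, so this should be read as an illustration of the general idea only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def assign_to_prototypes(patch_embeddings, prototypes):
    """Return, for each patch, the index of its most similar prototype."""
    return [
        max(range(len(prototypes)), key=lambda k: cosine(patch, prototypes[k]))
        for patch in patch_embeddings
    ]


# Toy example with two prototypes and three patch embeddings.
labels = assign_to_prototypes(
    [[0.9, 0.1], [0.2, 0.8], [1.0, 0.1]],
    [[1.0, 0.0], [0.0, 1.0]],
)
```

Mapping each label back to the patch's position in the slide would then give a coarse mask per prototype.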
Fig. 4. Comparisons of SongCi with state-of-the-art, open-sourced multimodality fusion methods.
The evaluation benchmarks SongCi against six established models utilizing three key performance metrics: recall, precision, and intersection over union (IoU). Radar charts illustrate the algorithm’s efficacy across nine different organs, and the associated table below consolidates the average scores for these organs, with the highest values emphasized in bold. Source data are provided as a Source Data file.
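For readers unfamiliar with the three metrics, they can be computed over sets of diagnosis labels as sketched below. This assumes a set-based formulation, where each case has a predicted set and a reference set of diagnoses; the caption does not specify the exact computation used in the paper, and the function name and example labels are illustrative.

```python
def set_metrics(predicted, reference):
    """Return (precision, recall, IoU) for two collections of diagnosis labels."""
    predicted, reference = set(predicted), set(reference)
    overlap = len(predicted & reference)
    union = len(predicted | reference)
    # Guard against empty sets to avoid division by zero.
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(reference) if reference else 0.0
    iou = overlap / union if union else 0.0
    return precision, recall, iou


precision, recall, iou = set_metrics(
    ["hepatic steatosis", "fibrosis", "hemorrhage"],
    ["hepatic steatosis", "fibrosis", "autolysis", "inflammation"],
)
```

In this example two of three predictions are correct (precision 2/3), two of four reference diagnoses are recovered (recall 1/2), and the overlap covers two of five distinct labels (IoU 2/5).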
Fig. 5. Multi-modality attention visualization of SongCi.
The multi-modality attention visualization of SongCi offers interpretable analyses for forensic pathology diagnosis across a range of tissues and organs. a, b display liver tissues; c, f gastrointestinal tissues; d brain tissues; e pancreatic tissues; and g, h spleen tissues, with i highlighting adrenal tissues. The WSI regions corresponding to the prototypes of the top five findings, along with the top five vital descriptors in the gross key findings, are delineated in distinct colors.
