2025 Jun 20;13:e75103. doi: 10.2196/75103.

Evaluating and Improving Syndrome Differentiation Thinking Ability in Large Language Models: Method Development Study

Chunliang Chen et al. JMIR Med Inform. .

Abstract

Background: Large language models (LLMs) provide new opportunities to advance the intelligent development of traditional Chinese medicine (TCM). Syndrome differentiation thinking is an essential part of TCM, and equipping LLMs with this capability represents a crucial step toward more effective clinical applications of TCM. However, given the complexity of TCM syndrome differentiation thinking, acquiring this ability is a considerable challenge for these models.

Objective: This study aims to evaluate the syndrome differentiation thinking ability of LLMs and to design a method that effectively enhances their performance in this area.

Methods: We decomposed the process of syndrome differentiation thinking in TCM into three core tasks: pathogenesis inference, syndrome inference, and diagnostic suggestion. To evaluate the performance of LLMs in these tasks, we constructed a high-quality evaluation dataset, forming a reliable foundation for quantitative assessment of their capabilities. Furthermore, we developed a methodology for generating instruction data based on the idea of an "open-book exam," customized three data templates, and dynamically retrieved task-relevant professional knowledge that was inserted into predefined positions within the templates. This approach effectively generates high-quality instruction data that aligns with the unique characteristics of TCM syndrome differentiation thinking. Leveraging this instruction data, we fine-tuned the base model, enhancing the syndrome differentiation thinking ability of the LLMs.
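The template-filling step described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the toy keyword-overlap retriever, the knowledge entries, and all function and template names are assumptions standing in for the paper's local knowledge base and "open-book exam" templates.

```python
# Hypothetical sketch of "open-book exam" instruction-data generation:
# retrieve task-relevant knowledge, insert it into a predefined template
# slot, and pair it with the medical case and the expert answer.
# The keyword-overlap retriever and all names here are illustrative.

KNOWLEDGE_BASE = [
    "Liver qi stagnation often presents with hypochondriac distension and irritability.",
    "Spleen qi deficiency is marked by fatigue, poor appetite, and loose stools.",
    "Kidney yin deficiency may show tidal fever, night sweats, and sore lower back.",
]

TEMPLATE = (
    "You are a TCM practitioner. Using the reference knowledge below, "
    "infer the pathogenesis for the case.\n"
    "[Reference knowledge]\n{knowledge}\n"
    "[Case]\n{case}\n"
    "[Answer]\n{answer}"
)

def retrieve(case: str, top_k: int = 2) -> list[str]:
    """Toy retriever: rank knowledge entries by word overlap with the case."""
    case_words = set(case.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda entry: len(case_words & set(entry.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_instruction(case: str, answer: str) -> str:
    """Fill the template's predefined slots with retrieved knowledge."""
    knowledge = "\n".join(f"- {entry}" for entry in retrieve(case))
    return TEMPLATE.format(knowledge=knowledge, case=case, answer=answer)

sample = build_instruction(
    case="Fatigue, poor appetite, and loose stools for two months.",
    answer="Spleen qi deficiency.",
)
print(sample)
```

Instruction pairs built this way embed the retrieved professional knowledge directly in the prompt, so the fine-tuned model learns to reason with reference material rather than recall it.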

Results: We collected 200 medical cases for the evaluation dataset and standardized them into three types of task questions. We tested general and TCM-specific LLMs, comparing their performance with our proposed solution. The findings demonstrated that our method significantly enhanced LLMs' syndrome differentiation thinking. Our model achieved 85.7% accuracy in Task 1 and 81.2% in Task 2, surpassing the best-performing TCM and general LLMs by 26.3% and 15.8%, respectively. In Task 3, our model achieved a similarity score of 84.3, indicating that its diagnostic suggestions closely matched those given by experts.

Conclusions: Existing general LLMs and TCM-specific LLMs continue to have significant limitations in the core task of syndrome differentiation thinking. Our research shows that fine-tuning LLMs by designing professional instruction templates and generating high-quality instruction data can significantly improve their performance on core tasks. The optimized LLMs produce reasoning results that closely match the opinions of domain experts, indicating that they can simulate syndrome differentiation thinking to a certain extent. These findings have important theoretical and practical significance for in-depth interpretation of the complexity of the clinical diagnosis and treatment process of TCM.

Keywords: RAG; TCM LLMs; instruction tuning; large language model; syndrome differentiation thinking; traditional Chinese medicine.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. A framework for generating instruction data is formed by integrating the local knowledge base into the syndrome differentiation thinking template.
Figure 2. Performance comparison of RAG, CoT, and base models across two tasks: Task 1, pathogenesis inference (left); Task 2, syndrome inference (right). CoT: chain of thought; RAG: retrieval-augmented generation.
Figure 3. Comparison of results on Task 3: CoT, RAG, and our method. CoT: chain of thought; RAG: retrieval-augmented generation.
