Menstrual Health Education Using a Specialized Large Language Model in India: Development and Evaluation Study of MenstLLaMA

Prottay Kumar Adhikary¹, Isha Motiyani¹, Gayatri Oke¹, Maithili Joshi¹, Kanupriya Pathak¹, Salam Michael Singh¹, Tanmoy Chakraborty¹,²

Affiliations

¹ Department of Electrical Engineering, Indian Institute of Technology Delhi, Room: 3B-7 (Block III 3rd Floor), Hauz Khas, New Delhi, 110016, India, 91 26591076 ext 011.
² Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi, New Delhi, India.

PMID: 40669074
PMCID: PMC12286563
DOI: 10.2196/71977

Prottay Kumar Adhikary et al. J Med Internet Res. 2025 Jul 16;27:e71977. doi: 10.2196/71977.

Abstract

Background: The quality and accessibility of menstrual health education (MHE) in low- and middle-income countries, including India, remain inadequate due to persistent challenges (eg, poverty, social stigma, and gender inequality). While community-driven initiatives have sought to raise awareness, artificial intelligence offers a scalable and efficient solution for disseminating accurate information. However, existing general-purpose large language models (LLMs) are often ill-suited for this task, tending to exhibit low accuracy, cultural insensitivity, and overly complex responses. To address these limitations, we developed MenstLLaMA, a specialized LLM tailored to the Indian context and designed to deliver MHE empathetically, supportively, and accessibly.

Objective: We aimed to develop and evaluate MenstLLaMA, a specialized LLM tailored to deliver accurate, culturally sensitive MHE, and to assess its effectiveness in comparison with existing general-purpose models.

Methods: We curated MENST, a novel, domain-specific dataset comprising 23,820 question-answer pairs aggregated from medical websites, government portals, and health education resources. This dataset was systematically annotated with metadata capturing age groups, regions, topics, and sociocultural contexts. MenstLLaMA was developed by fine-tuning Meta-LLaMA-3-8B-Instruct, using parameter-efficient fine-tuning with low-rank adaptation to achieve domain alignment while minimizing computational overhead. We benchmarked MenstLLaMA against 9 state-of-the-art general-purpose LLMs, including GPT-4o, Claude-3, Gemini 1.5 Pro, and Mistral. The evaluation followed a multilayered framework: (1) automatic evaluation using standard natural language processing metrics (BLEU [Bilingual Evaluation Understudy], METEOR [Metric for Evaluation of Translation with Explicit Ordering], ROUGE-L [Recall-Oriented Understudy for Gisting Evaluation-Longest Common Subsequence], and BERTScore [Bidirectional Encoder Representations from Transformers Score]); (2) evaluation by clinical experts (N=18), who rated 200 expert-curated queries for accuracy and appropriateness; (3) medical practitioner interaction through the ISHA (Intelligent System for Menstrual Health Assistance) interactive chatbot, assessing qualitative dimensions (eg, relevance, understandability, preciseness, correctness, and context sensitivity); and (4) a user study with volunteer participants (N=200), who evaluated MenstLLaMA in 15- to 20-minute randomized sessions, rating the system across 7 qualitative user satisfaction metrics.
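The Methods name low-rank adaptation as the fine-tuning technique, but the abstract does not report a rank or the adapted modules. As a rough illustration of why the approach keeps computational overhead low, the sketch below counts the parameters a single low-rank adapter adds next to one frozen weight matrix. The 4096 x 4096 shape and the rank of 16 are assumptions chosen for illustration, not values reported in the study.

```python
def lora_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one low-rank adapter.

    The update to a frozen weight W (d_out x d_in) is factored as B @ A,
    where A is (rank x d_in) and B is (d_out x rank).
    """
    return rank * (d_in + d_out)


def dense_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight matrix W (d_out x d_in)."""
    return d_in * d_out


# Illustrative values only (assumptions, not taken from the paper).
D_IN, D_OUT, RANK = 4096, 4096, 16

adapter = lora_adapter_params(D_IN, D_OUT, RANK)
dense = dense_params(D_IN, D_OUT)
fraction = adapter / dense

print(f"dense weights:   {dense:,}")
print(f"adapter weights: {adapter:,}")
print(f"trainable share: {fraction:.4%}")
```

Under these assumed values, the adapter trains well under 1% of the parameters of the matrix it modifies, which is the source of the efficiency the Methods refer to.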

Results: MenstLLaMA achieved the highest scores in BLEU (0.059) and BERTScore (0.911), outperforming GPT-4o (BLEU: 0.052, BERTScore: 0.896) and Claude-3 (BERTScore: 0.888). Clinical experts preferred MenstLLaMA's responses over gold-standard answers in several culturally sensitive cases. In medical practitioners' evaluations using ISHA, the chat interface powered by MenstLLaMA, the model scored 3.5/5 in relevance, 3.6/5 in understandability, 3.1/5 in preciseness, 3.5/5 in correctness, and 4.0/5 in context sensitivity. User evaluations indicated even stronger results, with ratings of 4.7/5 for understandability, 4.3/5 for relevance, 4.28/5 for preciseness, 4.1/5 for correctness, 4.6/5 for tone, 4.2/5 for flow, and 3.9/5 for context sensitivity.
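The practitioner and user ratings reported above share 5 dimensions, so the gap between the two groups can be tabulated directly. The sketch below does only that arithmetic on the figures quoted in the Results; it assumes all scores are on the same 5-point scale, and the variable names are illustrative.

```python
# Mean ratings as quoted in the Results (assumed to share a 5-point scale).
practitioner = {
    "relevance": 3.5,
    "understandability": 3.6,
    "preciseness": 3.1,
    "correctness": 3.5,
    "context sensitivity": 4.0,
}
user = {
    "understandability": 4.7,
    "relevance": 4.3,
    "preciseness": 4.28,
    "correctness": 4.1,
    "tone": 4.6,
    "flow": 4.2,
    "context sensitivity": 3.9,
}

# Only dimensions rated by both groups can be compared.
shared = [dim for dim in practitioner if dim in user]

# Positive gap: users rated the dimension higher than practitioners did.
gaps = {dim: round(user[dim] - practitioner[dim], 2) for dim in shared}

for dim, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{dim:<20} {gap:+.2f}")
```

On these figures, users rated 4 of the 5 shared dimensions higher than practitioners did; context sensitivity is the exception, where the user rating is slightly lower.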

Conclusions: MenstLLaMA demonstrates exceptional accuracy, empathy, and user satisfaction within the domain of MHE, bridging critical gaps left by general-purpose LLMs. Its potential for integration into broader health education platforms positions it as a transformative tool for menstrual well-being. Future research could explore its long-term impact on public perception and menstrual hygiene practices, while expanding demographic representation, enhancing context sensitivity, and integrating multimodal and voice-based interactions to improve accessibility across diverse user groups.

Keywords: artificial intelligence; cultural sensitivity; digital health; health equity; large language model; menstrual health education.

© Prottay Kumar Adhikary, Isha Motiyani, Gayatri Oke, Maithili Joshi, Kanupriya Pathak, Salam Michael Singh, Tanmoy Chakraborty. Originally published in the Journal of Medical Internet Research (https://www.jmir.org).

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. Taxonomy of menstrual health topics in the MENST dataset. The taxonomy consists of 7 primary categories (anatomy, normal menstruation, abnormal menstruation, pregnancy, lifestyle, support, and society) with corresponding subtopics as described in the *Metadata Creation* section.

Figure 2. Prompt template structure used for curating question-answer pairs from unstructured medical documents. The template includes 4 components: (1) a task description specifying the objective of generating relevant questions from the document content; (2) the unstructured document text from which the question-answer (QA) pair is to be generated; (3) instructions for formulating concise, well-structured questions focusing on key information; and (4) example QA pairs (n=3) from gold-standard data to guide the generation process.

Figure 3. Example of the prompt template used for dataset augmentation through question paraphrasing. The template illustrates the male-perspective paraphrasing strategy, comprising (1) an input instruction specifying the task of rephrasing a question from a male viewpoint; (2) a sample input question related to menstrual pain; and (3) 3 generated paraphrased outputs reflecting different female familial roles (wife, teenage daughter, and growing woman). This strategy enabled the generation of diverse yet contextually relevant question variants, enhancing MenstLLaMA’s ability to understand and respond to queries from varied social and cultural perspectives. LLaMA: Large Language Model Meta AI.

Figure 4. Instruction fine-tuning format used for MenstLLaMA, illustrating the standardized structure for question-answer (QA) pairs. The format shows the conversion of QA pairs into LLaMA instruction syntax ([INST] question [/INST] answer). This example features a dietary question related to menstruation, with a response providing clear, evidence-based nutritional guidance. This format helps the model effectively distinguish between questions and their corresponding answers. LLaMA: Large Language Model Meta AI.

Figure 5. Interface of the ISHA (Intelligent System for Menstrual Health Assistance) chatbot powered by MenstLLaMA. LLaMA: Large Language Model Meta AI.
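The captions for Figures 2 and 4 describe two text structures: a 4-part prompt for generating QA pairs from a document, and the `[INST] question [/INST] answer` layout used for fine-tuning. A minimal sketch of both follows. The function names, section labels, and instruction wording are hypothetical, since the figures themselves are not reproduced here; only the component order, the 3 example pairs, and the `[INST]` syntax come from the captions. All sample content is placeholder text.

```python
def build_qa_prompt(document: str, examples: list[tuple[str, str]]) -> str:
    """Assemble the 4 components named in the Figure 2 caption, in order."""
    if len(examples) != 3:
        raise ValueError("the Figure 2 caption describes 3 example QA pairs")
    # (1) Task description (wording is an assumption).
    task = "Task: Generate a relevant question-answer pair from the text below."
    # (2) The unstructured source text.
    source = f"Document:\n{document}"
    # (3) Instructions for question style (wording is an assumption).
    instructions = (
        "Instructions: Write a concise, well-structured question that "
        "focuses on key information, then answer it."
    )
    # (4) Gold-standard example pairs that guide generation.
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return "\n\n".join([task, source, instructions, f"Examples:\n{shots}"])


def to_instruction_format(question: str, answer: str) -> str:
    """Wrap one QA pair in the instruction syntax quoted in Figure 4."""
    return f"[INST] {question.strip()} [/INST] {answer.strip()}"


# Placeholder content only; no real dataset entries are shown.
examples = [(f"<gold question {i}>", f"<gold answer {i}>") for i in range(1, 4)]
prompt = build_qa_prompt("<unstructured document text>", examples)
sample = to_instruction_format("<question>", "<answer>")

print(prompt)
print(sample)
```

Keeping the two steps as separate functions mirrors the pipeline the captions imply: pairs are first generated from documents, then serialized one by one into the training format.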


