Inductive thematic analysis of healthcare qualitative interviews using open-source large language models: How does it compare to traditional methods?

Walter S Mathis¹, Sophia Zhao², Nicholas Pratt², Jeremy Weleff², Stefano De Paoli³

Affiliations

¹ Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA. Electronic address: Walter.Mathis@Yale.edu.
² Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA.
³ Division of Sociology, School of Business, Law and Social Sciences, Abertay University, Dundee, Scotland, United Kingdom.

PMID: 39067136
DOI: 10.1016/j.cmpb.2024.108356

Comparative Study

Walter S Mathis et al. Comput Methods Programs Biomed. 2024 Oct:255:108356. doi: 10.1016/j.cmpb.2024.108356. Epub 2024 Jul 24.

Abstract

Background: Large language models (LLMs) are generative artificial intelligence tools that have ignited much interest and discussion about their utility in clinical and research settings. Despite this interest, few analyses of their use in qualitative thematic analysis compare their current ability to that of human coding and analysis. In addition, there has been no published analysis of their use on real-world, protected health information.

Objective: Here we fill that gap in the literature by comparing an LLM to standard human thematic analysis in real-world, semi-structured interviews of both patients and clinicians within a psychiatric setting.

Methods: Using a 70-billion-parameter open-source LLM running on local hardware and advanced prompt engineering techniques, we produced themes that summarized a full corpus of interviews in minutes. We then used three different evaluation methods to quantify the similarity between themes produced by the LLM and those produced by humans.

Results: These evaluations revealed similarities ranging from moderate to substantial (Jaccard similarity coefficients 0.44-0.69), which are promising preliminary results.
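
For readers unfamiliar with the metric, the Jaccard similarity coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below is illustrative only: the abstract does not describe how LLM and human themes were matched, so treating themes as exactly matching labels is an assumption, and the theme names are hypothetical, not taken from the study.

```python
def jaccard(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B|, defined here as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical theme labels, invented for illustration (not the study's data).
human_themes = {"access to care", "stigma", "staff workload", "communication"}
llm_themes = {"access to care", "stigma", "communication", "technology use"}

# Three shared themes out of five distinct themes overall.
score = jaccard(human_themes, llm_themes)
print(round(score, 2))  # prints 0.6
```

In practice, deciding whether two themes "match" is itself a judgment call, since themes are free-text descriptions rather than identical labels.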

Conclusion: Our study demonstrates that open-source LLMs can effectively generate robust themes from qualitative data, achieving substantial similarity to human-generated themes. The validation of LLMs in thematic analysis, coupled with evaluation methodologies, highlights their potential to enhance and democratize qualitative research across diverse fields.

Keywords: Artificial intelligence; Large language models; Mental health; Qualitative methods; Thematic analysis.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.