PLoS One. 2023 Nov 8;18(11):e0292578.
doi: 10.1371/journal.pone.0292578. eCollection 2023.

Comparing text mining and manual coding methods: Analysing interview data on quality of care in long-term care for older adults


Coen Hacking et al. PLoS One. 2023.

Abstract

Objectives: In long-term care for older adults, large amounts of text are collected relating to the quality of care, such as transcribed interviews. Researchers currently analyse textual data manually to gain insights, which is a time-consuming process. Text mining could provide a solution, as this methodology can be used to analyse large amounts of text automatically. This study aims to compare text mining to manual coding with regard to sentiment analysis and thematic content analysis.

Methods: Data were collected from interviews with residents (n = 21), family members (n = 20), and care professionals (n = 20). Text mining models were developed and compared to the manual approach. The results of the manual and text mining approaches were evaluated based on three criteria: accuracy, consistency, and expert feedback. Accuracy assessed the similarity between the two approaches, while consistency determined whether each individual approach found the same themes in similar text segments. Expert feedback served as a representation of the perceived correctness of the text mining approach.

Results: An accuracy analysis revealed that more than 80% of the text segments were assigned the same themes and sentiment using both text mining and manual approaches. Interviews coded with text mining demonstrated higher consistency compared to those coded manually. Expert feedback identified certain limitations in both the text mining and manual approaches.

Conclusions and implications: While these analyses highlighted the current limitations of text mining, they also exposed certain inconsistencies in manual analysis. These findings suggest that text mining has the potential to be an effective and efficient tool for analysing large volumes of textual data in the context of long-term care for older adults.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Confusion matrix comparing sentiment analysis results of the manual and text mining approach.
The matrix compares manual coding (rows, y-axis) against text mining predictions (columns, x-axis) for sentiment values of the text. Each cell gives the percentage of occurrences of a particular combination of manually coded and predicted sentiment. The diagonal cells (from top left to bottom right) show the percentage of agreement between the two methods, whereas all off-diagonal cells indicate discrepancies. For instance, the cell at the intersection of the "Positive" row and the "Negative" column covers instances where text was manually coded as positive but was predicted as negative by text mining.
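The layout described in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the label set (including "Neutral"), the function names, and the five example segments are assumptions made purely for illustration.

```python
from collections import Counter

# Assumed label set; the caption only names "Positive" and "Negative".
LABELS = ["Negative", "Neutral", "Positive"]


def confusion_percentages(manual, predicted, labels=LABELS):
    """Return a matrix with manual codes as rows and predictions as columns.

    Each cell is the percentage of all segments that received that
    (manual, predicted) combination.
    """
    if len(manual) != len(predicted):
        raise ValueError("manual and predicted must have the same length")
    if not manual:
        raise ValueError("at least one segment is required")
    counts = Counter(zip(manual, predicted))
    total = len(manual)
    return [[100.0 * counts[(m, p)] / total for p in labels] for m in labels]


def agreement(matrix):
    """Sum of the diagonal cells: the percentage of segments coded identically."""
    return sum(matrix[i][i] for i in range(len(matrix)))


# Hypothetical segments, invented for this example only.
manual = ["Positive", "Positive", "Negative", "Neutral", "Negative"]
predicted = ["Positive", "Negative", "Negative", "Neutral", "Negative"]

matrix = confusion_percentages(manual, predicted)
for label, row in zip(LABELS, matrix):
    print(f"{label:>8}: {row}")
print("agreement (%):", agreement(matrix))
```

In this toy data the "Positive" row, "Negative" column holds the one segment that was manually coded as positive but predicted as negative, which is the kind of discrepancy the caption describes.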
Fig 2. Comparison of results from the thematic content analysis.
A confusion matrix is shown for each of the main INDEXQUAL themes (Experienced quality of care, Experiences, Expectations and Context). The y-axis of each matrix represents the presence or absence of a theme as determined through manual analysis, while the x-axis indicates the text mining predictions. Cells on the diagonals capture instances of agreement between manual coding and text mining for each theme. Off-diagonal cells detail discrepancies, indicating false positives or false negatives. Percentages within cells show the proportion of occurrences for each scenario in relation to the total dataset.
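The per-theme matrices described in this caption can be sketched in Python as follows. This is a hedged illustration rather than the study's implementation: it assumes each text segment carries a set of theme labels from each approach and that manual coding is treated as the reference when naming false positives and false negatives. The function name, cell names, and example segments are hypothetical.

```python
# Main INDEXQUAL themes as listed in the caption.
THEMES = ["Experienced quality of care", "Experiences", "Expectations", "Context"]


def theme_confusion(manual_sets, predicted_sets, theme):
    """Return the four cells of a 2x2 confusion matrix for one theme.

    Values are percentages of the total number of segments. Manual coding
    is treated as the reference (an assumption of this sketch).
    """
    if len(manual_sets) != len(predicted_sets):
        raise ValueError("both approaches must cover the same segments")
    if not manual_sets:
        raise ValueError("at least one segment is required")
    cells = {
        "both_present": 0,    # theme found manually and by text mining
        "false_negative": 0,  # found manually, missed by text mining
        "false_positive": 0,  # predicted by text mining, not coded manually
        "both_absent": 0,     # theme found by neither approach
    }
    for manual, predicted in zip(manual_sets, predicted_sets):
        in_manual, in_predicted = theme in manual, theme in predicted
        if in_manual and in_predicted:
            cells["both_present"] += 1
        elif in_manual:
            cells["false_negative"] += 1
        elif in_predicted:
            cells["false_positive"] += 1
        else:
            cells["both_absent"] += 1
    total = len(manual_sets)
    return {name: 100.0 * count / total for name, count in cells.items()}


# Hypothetical segments, invented for this example only.
manual_sets = [{"Experiences"}, {"Experiences", "Context"}, {"Expectations"}, set()]
predicted_sets = [{"Experiences"}, {"Context"}, {"Expectations", "Context"}, set()]

for theme in THEMES:
    print(theme, theme_confusion(manual_sets, predicted_sets, theme))
```

Because every cell is divided by the total number of segments, the four cells for a theme always sum to 100, matching the caption's statement that percentages are given in relation to the total dataset.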
