Review

Quality indices for topic model selection and evaluation: a literature review and case study

Christopher Meaney et al. BMC Med Inform Decis Mak. 2023 Jul 22;23(1):132. doi: 10.1186/s12911-023-02216-1.

Abstract

Background: Topic models are a class of unsupervised machine learning models, which facilitate summarization, browsing and retrieval from large unstructured document collections. This study reviews several methods for assessing the quality of unsupervised topic models estimated using non-negative matrix factorization. Techniques for topic model validation have been developed across disparate fields. We synthesize this literature, discuss the advantages and disadvantages of different techniques for topic model validation, and illustrate their usefulness for guiding model selection on a large clinical text corpus.

Design, setting and data: Using a retrospective cohort design, we curated a text corpus containing 382,666 clinical notes collected from 01/01/2017 through 12/31/2020 from primary care electronic medical records in Toronto, Canada.

Methods: Several topic model quality metrics have been proposed to assess different aspects of model fit. We explored the following metrics: reconstruction error, topic coherence, rank-biased overlap, Kendall's weighted tau, partition coefficient, partition entropy and the Xie-Beni statistic. Depending on context, cross-validation and/or bootstrap stability analysis were used to estimate these metrics on our corpus.
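The first of these metrics, cross-validated reconstruction error, can be sketched as follows. This is an illustrative Monte Carlo cross-validation loop on synthetic data; the random matrix, split fraction, and NMF settings are assumptions standing in for the paper's corpus and protocol.

```python
# Hedged sketch: Monte Carlo cross-validated reconstruction error for NMF.
# Fit on a training split, project held-out documents onto the learned
# topics, and measure the Frobenius-norm reconstruction error.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.model_selection import ShuffleSplit

rng = np.random.default_rng(0)
V = rng.random((60, 30))  # stand-in non-negative document-term matrix

def cv_reconstruction_error(V, K, n_splits=5, seed=0):
    errs = []
    splitter = ShuffleSplit(n_splits=n_splits, test_size=0.2, random_state=seed)
    for train_idx, test_idx in splitter.split(V):
        model = NMF(n_components=K, init="nndsvda", max_iter=500, random_state=seed)
        model.fit(V[train_idx])
        W_test = model.transform(V[test_idx])       # held-out document-topic weights
        recon = W_test @ model.components_          # reconstructed held-out rows
        errs.append(np.linalg.norm(V[test_idx] - recon))
    return float(np.mean(errs))

err_k5 = cv_reconstruction_error(V, K=5)
```

Repeating this for a grid of K values and plotting the averaged held-out error is what produces curves like the one in Fig. 1.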

Results: Cross-validated reconstruction error favored large topic models (K ≥ 100 topics) on our corpus. Stability analysis using topic coherence and the Xie-Beni statistic also favored large models (K = 100 topics). Rank-biased overlap and Kendall's weighted tau favored small models (K = 5 topics). Few model evaluation metrics suggested mid-sized topic models (25 ≤ K ≤ 75) as optimal. However, human judgement suggested that mid-sized topic models produced expressive low-dimensional summarizations of the corpus.

Conclusions: Topic model quality indices are transparent quantitative tools for guiding model selection and evaluation. Our empirical illustration demonstrated that different topic model quality indices favor models of different complexity, and may not select models that align with human judgment. This suggests that different metrics capture different aspects of model goodness of fit. A combination of topic model quality indices, coupled with human validation, may be useful in appraising unsupervised topic models.

Keywords: Clinical text data; Cross-validation; Electronic medical record; Internal validation; Non-negative matrix factorization; Stability analysis; Topic model.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1: Average training/testing reconstruction error of NMF models of complexity K = (5,10,25,50,100,150,200,250) estimated using five-fold Monte Carlo cross-validation
Fig. 2: Average topic coherence of NMF models of complexity K = (5,10,25,50,75,100) estimated using five-fold stability analysis. The left-hand panel plot uses the UCI topic coherence score, and the right-hand panel plot uses the UMASS topic coherence score
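The UMass coherence score used in the right-hand panel can be sketched directly from document co-occurrence counts. The toy corpus below is an assumption for illustration; the score sums, over ordered pairs of a topic's top words, the log of smoothed co-document frequency over the conditioning word's document frequency.

```python
# Hedged sketch: UMass topic coherence for one topic's top-word list.
# Documents are represented as sets of word types for co-occurrence counting.
import numpy as np

docs = [
    {"chest", "pain", "breath"},
    {"diabetes", "sugar", "insulin"},
    {"chest", "pain", "exam"},
    {"diabetes", "sugar", "diet"},
]

def umass_coherence(top_words, docs):
    def doc_count(*words):
        # number of documents containing every word in `words`
        return sum(all(w in d for w in words) for d in docs)
    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            wi, wj = top_words[i], top_words[j]
            # +1 smoothing avoids log(0) for words that never co-occur
            score += np.log((doc_count(wi, wj) + 1) / doc_count(wj))
    return score

c = umass_coherence(["chest", "pain", "breath"], docs)
```

Higher (less negative) scores indicate that a topic's top words tend to appear in the same documents, which is the intuition behind using coherence as a stability criterion.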
Fig. 3: Average rank-biased overlap of ϕ over NMF models of complexity K = (5,10,25,50,75,100) estimated using five-fold stability analysis
Fig. 4: Average rank-biased overlap of θ over NMF models of complexity K = (5,10,25,50,75,100) estimated using five-fold stability analysis
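Rank-biased overlap, used in Figs. 3 and 4 to compare rankings across stability-analysis replicates, can be sketched with a truncated version of its geometric-weighting formula. The persistence parameter p = 0.9 below is a common illustrative choice, not necessarily the paper's setting.

```python
# Hedged sketch: truncated rank-biased overlap (RBO) between two ranked lists.
# At each depth d, the fraction of shared items in the top-d prefixes is
# weighted by a geometrically decaying factor, so top ranks dominate.
def rbo(list_a, list_b, p=0.9):
    depth = min(len(list_a), len(list_b))
    score, weight = 0.0, 1 - p
    for d in range(1, depth + 1):
        overlap = len(set(list_a[:d]) & set(list_b[:d]))
        score += weight * (overlap / d)   # agreement at depth d
        weight *= p                       # geometric decay of depth weights
    return score

identical = rbo(["a", "b", "c"], ["a", "b", "c"])
disjoint = rbo(["a", "b", "c"], ["x", "y", "z"])
```

Because the weights decay geometrically, disagreement deep in the ranking matters less than disagreement among the top-ranked words or documents.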
Fig. 5: Average Kendall’s weighted tau of ϕ over NMF models of complexity K = (5,10,25,50,75,100) estimated using five-fold stability analysis
Fig. 6: Average Kendall’s weighted tau of θ over NMF models of complexity K = (5,10,25,50,75,100) estimated using five-fold stability analysis
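Kendall's weighted tau, used in Figs. 5 and 6, is available in SciPy as scipy.stats.weightedtau; it down-weights discordances among low-ranked elements. The two weight vectors below are made-up stand-ins for one topic's ϕ row estimated in two stability-analysis replicates.

```python
# Hedged sketch: comparing topic-term weight vectors from two replicate
# fits with Kendall's weighted tau (agreement at top ranks weighted most).
import numpy as np
from scipy.stats import weightedtau

phi_run1 = np.array([0.40, 0.30, 0.15, 0.10, 0.05])  # topic-term weights, run 1
phi_run2 = np.array([0.38, 0.31, 0.16, 0.09, 0.06])  # same topic, replicate run
tau, _ = weightedtau(phi_run1, phi_run2)
```

Here the two runs rank the vocabulary identically, so the statistic attains its maximum of 1; unstable topics whose top words shuffle between replicates would score lower.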
Fig. 7: Average partition coefficient and partition entropy scores of NMF models of complexity K = (5,10,25,50,75,100) estimated using five-fold stability analysis. The left-hand panel plot uses the partition coefficient score, and the right-hand panel plot uses the partition entropy score. Both the partition coefficient and the partition entropy suggest that smaller models result in more “crisp” clustering solutions; whereas, larger models result in more “fuzzy/admixed” clustering solutions
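The partition coefficient and partition entropy in Fig. 7 come from the fuzzy-clustering literature and can be sketched on a row-normalized membership matrix; here U is a toy stand-in for the row-normalized document-topic weights θ. Higher partition coefficient and lower partition entropy indicate "crisper" document-to-topic assignments.

```python
# Hedged sketch: Bezdek-style partition coefficient and partition entropy
# on a fuzzy membership matrix U (rows: documents, columns: topics;
# each row sums to 1).
import numpy as np

U = np.array([
    [0.9, 0.1],   # nearly crisp assignment
    [0.2, 0.8],
    [0.5, 0.5],   # maximally fuzzy assignment
])

def partition_coefficient(U):
    # mean of squared memberships; 1/K (fuzzy) up to 1 (crisp)
    return float((U ** 2).sum() / U.shape[0])

def partition_entropy(U, eps=1e-12):
    # mean membership entropy; 0 (crisp) up to log(K) (fuzzy)
    return float(-(U * np.log(U + eps)).sum() / U.shape[0])

pc, pe = partition_coefficient(U), partition_entropy(U)
```

A perfectly crisp membership matrix (e.g. an identity matrix) attains partition coefficient 1 and partition entropy 0, which is the "crisp" end of the scale referenced in the caption.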
Fig. 8: Average Xie-Beni scores of NMF models of complexity K = (5,10,25,50,75,100) estimated using five-fold stability analysis
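The Xie-Beni index in Fig. 8 relates the membership-weighted compactness of a fuzzy partition to the minimum separation between cluster centroids; lower values indicate compact, well-separated clusters. The sketch below uses a standard form of the index with an assumed fuzzifier m = 2 and toy 2-D data, not the authors' exact computation on topic vectors.

```python
# Hedged sketch: Xie-Beni index for a fuzzy partition.
# compactness = sum_i sum_k u_ik^m * ||x_i - v_k||^2
# separation  = min_{j != k} ||v_j - v_k||^2
# XB = compactness / (n * separation)
import numpy as np

def xie_beni(X, centroids, U, m=2):
    n = X.shape[0]
    # squared distances from every point to every centroid
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    compact = ((U ** m) * d2).sum()
    # minimum squared distance between distinct centroids
    c2 = ((centroids[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    sep = c2[~np.eye(len(centroids), dtype=bool)].min()
    return float(compact / (n * sep))

X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
C = np.array([[0.05, 0.0], [0.95, 1.0]])                  # two tight clusters
U = np.array([[0.95, 0.05], [0.95, 0.05], [0.05, 0.95], [0.05, 0.95]])
xb = xie_beni(X, C, U)
```

For this well-separated toy configuration the index is close to zero; overlapping or diffuse clusters inflate the numerator and shrink the denominator, driving the index up.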

