A Framework for Considering Comprehensibility in Modeling

Michael Gleicher¹

PMID: 27441712
PMCID: PMC4932655
DOI: 10.1089/big.2016.0007

Michael Gleicher. Big Data. 2016 Jun;4(2):75-88. doi: 10.1089/big.2016.0007. Epub 2016 Jun 7.

Affiliation

¹ Department of Computer Sciences, University of Wisconsin-Madison, Madison, Wisconsin.

Abstract

Comprehensibility in modeling is the ability of stakeholders to understand relevant aspects of the modeling process. In this article, we provide a framework to help guide exploration of the space of comprehensibility challenges. We consider facets organized around key questions: Who is comprehending? Why are they trying to comprehend? Where in the process are they trying to comprehend? How can we help them comprehend? How do we measure their comprehension? With each facet we consider the broad range of options. We discuss why taking a broad view of comprehensibility in modeling is useful in identifying challenges and opportunities for solutions.

Keywords: data analysis; human-computer interaction; machine learning; statistical modeling; visual analytics; visualization.

Figures

**FIG. 1.**
A summary of the framework proposed in this article. The specific lists for each question are initial organizations to show the broad range of aspects to consider.

**FIG. 2.**
Visualization of a validation experiment for a DNA-binding surface classifier that allows exploration of classification results. The corpus overview (left) is configured to display each molecule in the test set as a quilted glyph and orders these glyphs by classifier performance to show how performance varies over the molecules. Those proteins that appear more green have more true positive classifications, whereas those molecules that appear more red or blue have more misclassifications (false negatives and false positives, respectively). Selected molecules (left, yellow box) are visualized as heatmaps in a subset view (middle) and ordered by molecule size to help localize the positions of errors relative to correct answers. The detailed view (right) shows a selected molecule to confirm that most errors (blue, red) are close to the correctly found binding site (green).

**FIG. 3.**
Visualization of example *Explainers*, classifiers constructed with tradeoffs that emphasize comprehensibility concerns. In this example, Shakespeare's 36 plays are measured with a set of 115 “Docuscope” features. Classifiers are constructed to identify the 12 comedies (green). Each column represents a linear SVM classifier, with the plays sorted according to their score. The leftmost classifier uses only two features with unit coefficients. It makes several mistakes (e.g., misclassifying the tragedies *Othello* and *Romeo and Juliet* as comedies), but the simplicity of the classifier makes it useful for building theory about how Shakespeare used the linguistic constructs in the different genres. In contrast, other classifiers may use more features and more complex weights to achieve better accuracy (and larger SVM margins), at the expense of how easy the functions are to comprehend. SVM, support vector machine.

Cited by

Lewis A, Stoyanovich J. Teaching Responsible Data Science: Charting New Pedagogical Territory. Int J Artif Intell Educ. 2022;32(3):783-807. doi: 10.1007/s40593-021-00241-7. Epub 2021 Apr 15. PMID: 33880114.
Kolyshkina I, Simoff S. Interpretability of Machine Learning Solutions in Public Healthcare: The CRISP-ML Approach. Front Big Data. 2021 May 26;4:660206. doi: 10.3389/fdata.2021.660206. eCollection 2021. PMID: 34124652.

Grants and funding

R01 AI077376/AI/NIAID NIH HHS/United States
