Conference paper, Year: 2020

About Interpreting and Explaining Machine Learning and Statistical Models


Abstract

The use of black-box models for decisions affecting citizens is hotly debated. Some authors, such as Rudin [5], argue for abandoning such machine learning models and returning to models that are interpretable by design. In this communication we focus on the statistical aspects and leave ethics aside, despite its importance. The dilemma between explaining and predicting has been addressed by Breiman [1], Saporta [6] and Shmueli [7], among others. First of all, it seems necessary to us to distinguish between explicability (how does the model work, and is the algorithm auditable?) and interpretability (which variables and values are important and may change the decision?). A first approach to making black boxes interpretable is post-processing: an interpretable model is fitted to the outputs of the black box, e.g. a decision tree or a linear model that tries to reproduce its predictions closely. One example is given in Liberati et al. [3], where a non-linear SVM is approximately reconstructed by a linear classifier with a small loss of efficiency. A second approach is to derive variable importance measures through some kind of sensitivity analysis: following Breiman, this can be done by measuring the decrease in prediction accuracy when the values of a predictor are randomly permuted. Molnar [4] gives many examples of such "model-agnostic" interpretation methods. The concept of an interpretable model itself deserves discussion. In addition to logic-based or rule-based models such as decision trees, linear models are often considered easily interpretable. Grömping [2] and Wallard [8] have shown that this is not the case: there exist more than 10 metrics for measuring variable importance in linear regression.
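The permutation-based sensitivity analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the `black_box` function, the synthetic data and the helper names are hypothetical, and the accuracy drop is measured on the same data that defines the labels rather than on out-of-bag or held-out observations.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the given label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean decrease in accuracy when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only; all other columns stay untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


def black_box(row):
    """Hypothetical opaque classifier: uses features 0 and 1, ignores feature 2."""
    return int(row[0] * row[1] > 0)


# Synthetic data (an assumption for illustration): labels come from the model itself.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(500)]
y = [black_box(row) for row in X]

baseline, importances = permutation_importance(black_box, X, y)
for j, imp in enumerate(importances):
    print(f"feature {j}: mean accuracy drop = {imp:.3f}")
```

In this toy setting the ignored feature gets an importance of exactly zero, since shuffling it cannot change any prediction, while the two features the classifier actually uses show a clear accuracy drop. The procedure only calls the model as a function, which is what makes it model-agnostic.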
Files: SaportaSMTDA2020.pdf (2.19 MB), SMTDA2020-Saporta.pdf (62.89 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02779513, version 1 (04-06-2020)

Identifiers

  • HAL Id: hal-02779513, version 1

Cite

Gilbert Saporta. About Interpreting and Explaining Machine Learning and Statistical Models. SMTDA 2020: 6th Stochastic Modeling Techniques and Data Analysis International Conference, Jun 2020, Barcelona (virtual), Spain. ⟨hal-02779513⟩