
About Interpreting and Explaining Machine Learning and Statistical Models

Gilbert Saporta 1
1 CEDRIC - MSDMA - CEDRIC. Méthodes statistiques de data-mining et apprentissage
CEDRIC - Centre d'études et de recherche en informatique et communications
Abstract : The use of black-box models for decisions affecting citizens is a hot topic of debate. Some authors, such as Rudin [5], favour abandoning black-box machine learning models and returning to models that are interpretable by design. In this communication we focus on the statistical aspects, leaving aside ethics despite its importance. The dilemma between explaining and predicting has been addressed by Breiman [1], Saporta [6] and Shmueli [7], among others. First of all, it seems necessary to us to distinguish between explicability (how does the model work, and is the algorithm auditable?) and interpretability (which variables are important, and which values may change the decision?). A first approach to making black boxes interpretable is post-processing: an interpretable model is fitted to the black box's outputs, e.g. a decision tree or a linear model that tries to reproduce its predictions closely. One example is given in Liberati et al. [3], where a non-linear SVM is approximately reconstructed by a linear classifier with a small loss of efficiency. A second approach is to derive variable importance measures through some kind of sensitivity analysis: following Breiman, this can be done by measuring the decrease in prediction accuracy when the values of a predictor are randomly permuted. Molnar [4] gives many examples of such "model-agnostic" interpretation methods. The concept of an interpretable model itself deserves discussion. In addition to logic- or rule-based models such as decision trees, linear models are often considered easily interpretable. Grömping [2] and Wallard [8] have shown that this is not the case: there exist more than 10 metrics for measuring variable importance in linear regression.
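The two post-hoc approaches named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: `black_box`, `surrogate_fidelity`, `permutation_importance` and the synthetic data are all hypothetical names and choices made for illustration, and mean squared error stands in for "prediction accuracy" because the toy problem is a regression.

```python
import numpy as np

def black_box(X):
    """Hypothetical stand-in for an opaque model; any callable X -> predictions works."""
    return np.sin(X[:, 0]) + 0.1 * X[:, 1]  # the third column is deliberately ignored

def surrogate_fidelity(predict, X):
    """Fit a linear surrogate to the black box's outputs; return (coefficients, R^2)."""
    target = predict(X)
    design = np.column_stack([np.ones(len(X)), X])  # intercept plus predictors
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    r2 = 1.0 - (residual @ residual) / np.sum((target - target.mean()) ** 2)
    return coef, r2

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in mean squared error when one column of X is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((y - predict(X)) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between column j and y
            increases.append(np.mean((y - predict(X_perm)) ** 2) - baseline)
        importances[j] = np.mean(increases)
    return importances

# Synthetic data (an assumption for this sketch): three standard normal predictors.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = black_box(X) + rng.normal(scale=0.1, size=500)

coef, fidelity = surrogate_fidelity(black_box, X)       # first approach: interpretable surrogate
importances = permutation_importance(black_box, X, y)   # second approach: sensitivity analysis
print("surrogate R^2 against the black box:", round(float(fidelity), 3))
print("permutation importances:", np.round(importances, 4))
```

In this toy setting the surrogate's R² measures how faithfully the linear model reproduces the black box rather than how well it predicts `y`, and the unused third predictor receives an importance of zero because shuffling it leaves the predictions unchanged.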
Document type :
Conference papers

Cited literature : 8 references

https://hal-cnam.archives-ouvertes.fr/hal-02779513
Contributor : Gilbert Saporta
Submitted on : Thursday, June 4, 2020 - 5:04:04 PM
Last modification on : Friday, June 12, 2020 - 3:27:40 AM

Files

SMTDA2020-Saporta.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02779513, version 1

Citation

Gilbert Saporta. About Interpreting and Explaining Machine Learning and Statistical Models. SMTDA 2020; 6th Stochastic Modeling Techniques and Data Analysis International Conference, Jun 2020, Barcelona, Spain. ⟨hal-02779513⟩

Metrics

Record views : 31
File downloads : 33