About Interpreting and Explaining Machine Learning and Statistical Models - Cnam - Conservatoire national des arts et métiers
Conference Paper, Year: 2020

About Interpreting and Explaining Machine Learning and Statistical Models

Abstract

The use of black-box models for decisions affecting citizens is a hot topic of debate. Some authors, like Rudin [5], argue for abandoning machine-learning black boxes and returning to models that are interpretable by design. In this communication we focus on the statistical aspects, leaving aside ethics despite its importance. The dilemma between explaining and predicting has been addressed by Breiman [1], Saporta [6] and Shmueli [7], among others.

First of all, it seems necessary to distinguish between explainability (how does the model work? is the algorithm auditable?) and interpretability (which variables and values are important, and which may change the decision?).

A first approach to making black boxes interpretable is post-processing: plug an interpretable model into their outputs, e.g. approximate the predictions with a decision tree or a linear model. One example is given in Liberati et al. [3], where a non-linear SVM is approximately reconstructed by a linear classifier with a small loss of efficiency.

A second approach is to derive variable-importance measures by some kind of sensitivity analysis: following Breiman, this can be done by measuring the decrease in prediction accuracy when the values of a predictor are randomly permuted. Molnar [4] gives many examples of such "model-agnostic" interpretation methods.

The concept of an interpretable model itself deserves discussion. Besides logic- or rule-based models such as decision trees, linear models are often considered easily interpretable. Grömping [2] and Wallard [8] have shown that this is not the case: there exist more than 10 metrics for measuring variable importance in linear regression.
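The surrogate ("post-processing") approach mentioned above can be sketched in a few lines. This is a minimal illustration, not the method of Liberati et al.: the `black_box` scorer below is a hypothetical stand-in for a fitted non-linear model (e.g. an SVM), and the surrogate is an ordinary least-squares linear fit to the black box's outputs rather than to the true labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black box: a nonlinear scorer standing in for a fitted SVM.
def black_box(X):
    return np.tanh(2.0 * X[:, 0] - X[:, 1])

X = rng.normal(size=(500, 2))
y_bb = black_box(X)          # surrogate is trained on these outputs

# Interpretable surrogate: linear model (with intercept) fit by least squares.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y_bb, rcond=None)

# Fidelity: how closely the surrogate tracks the black box's predictions.
r2 = 1 - np.sum((A @ coef - y_bb) ** 2) / np.sum((y_bb - y_bb.mean()) ** 2)
print("surrogate coefficients:", coef[:2], "fidelity R^2:", r2)
```

The fitted coefficients then serve as the interpretation: their signs and magnitudes summarize the black box's behaviour, at the cost of the fidelity gap measured by R².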
SMTDA2020-Saporta.pdf (62.89 KB)
SaportaSMTDA2020.pdf (2.19 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02779513 , version 1 (04-06-2020)

Identifiers

  • HAL Id : hal-02779513 , version 1

Cite

Gilbert Saporta. About Interpreting and Explaining Machine Learning and Statistical Models. SMTDA 2020; 6th Stochastic Modeling Techniques and Data Analysis International Conference, Jun 2020, Barcelona (virtual), Spain. ⟨hal-02779513⟩
