About Interpreting and Explaining Machine Learning and Statistical Models - Cnam - Conservatoire national des arts et métiers
Conference Paper, Year: 2020

About Interpreting and Explaining Machine Learning and Statistical Models

Abstract

The use of black-box models for decisions affecting citizens is a hot topic of debate. Some authors, like Rudin [5], argue for abandoning machine-learning black boxes and returning to models that are interpretable by design. In this communication we focus on the statistical aspects, leaving aside ethics despite its importance. The dilemma between explaining and predicting has been addressed by Breiman [1], Saporta [6] and Shmueli [7], among others.

First of all, it seems necessary to distinguish between explainability (how does the model work? is the algorithm auditable?) and interpretability (which variables and values are important, and which may change the decision?).

A first approach to making black boxes interpretable is post-processing: plug an interpretable model into their outputs, e.g. approximate the predictions with a decision tree or a linear model. One example is given in Liberati et al. [3], where a non-linear SVM is approximately reconstructed by a linear classifier with a small loss of efficiency.

A second approach is to derive variable-importance measures by some kind of sensitivity analysis: following Breiman, this can be done by measuring the decrease in prediction accuracy when the values of a predictor are randomly permuted. Molnar [4] gives many examples of such "model-agnostic" interpretation methods.

The concept of an interpretable model itself deserves discussion. Besides logic- or rule-based models such as decision trees, linear models are often considered easily interpretable. Grömping [2] and Wallard [8] have shown that this is not the case: there exist more than 10 metrics for measuring variable importance in linear regression.
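The surrogate ("post-processing") approach mentioned above can be sketched in a few lines. This is a minimal illustration, not the method of Liberati et al.: the `black_box` scorer below is a hypothetical stand-in for a fitted non-linear model (e.g. an SVM), and the surrogate is an ordinary least-squares linear fit to the black box's outputs rather than to the true labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black box: a nonlinear scorer standing in for a fitted SVM.
def black_box(X):
    return np.tanh(2.0 * X[:, 0] - X[:, 1])

X = rng.normal(size=(500, 2))
y_bb = black_box(X)          # surrogate is trained on these outputs

# Interpretable surrogate: linear model (with intercept) fit by least squares.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y_bb, rcond=None)

# Fidelity: how closely the surrogate tracks the black box's predictions.
r2 = 1 - np.sum((A @ coef - y_bb) ** 2) / np.sum((y_bb - y_bb.mean()) ** 2)
print("surrogate coefficients:", coef[:2], "fidelity R^2:", r2)
```

The fitted coefficients then serve as the interpretation: their signs and magnitudes summarize the black box's behaviour, at the cost of the fidelity gap measured by R².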
SMTDA2020-Saporta.pdf (62.89 KB)
SaportaSMTDA2020.pdf (2.19 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02779513 , version 1 (04-06-2020)

Identifiers

  • HAL Id : hal-02779513 , version 1

Cite

Gilbert Saporta. About Interpreting and Explaining Machine Learning and Statistical Models. SMTDA 2020; 6th Stochastic Modeling Techniques and Data Analysis International Conference, Jun 2020, Barcelona (virtual), Spain. ⟨hal-02779513⟩
