Model assessment

Gilbert Saporta; Ndèye Niang

Conference paper, Year: 2006

Gilbert Saporta

Role: Author
PersonId : 180161
IdHAL : gilbert-saporta
ORCID : 0000-0002-3406-5887
IdRef : 027122565

CEDRIC. Méthodes statistiques de data-mining et apprentissage

Ndèye Niang

Role: Author
PersonId : 182344
IdHAL : ndeye-niang
ORCID : 0000-0002-6109-9935
IdRef : 179489879

Centre d'études et de recherche en informatique et communications

Abstract

In data mining and machine learning, models are derived from data and provide insight for understanding the data (unsupervised classification) or for making predictions (supervised learning) (Giudici, 2003; Hand, 2000). The scientific status of this kind of model therefore differs from the classical view, in which a model is a simplified representation of reality provided by an expert in the field. In most data mining applications, a good model is one that not only fits the data but also gives good predictions, even if it is not interpretable (Vapnik, 2006). In this context, model validation and model choice need specific indices and approaches. Penalized likelihood measures (AIC, BIC, etc.) may not be pertinent when there is no simple distributional assumption on the data and/or for models such as regularized regression, SVM, and many others in which parameters are constrained. Complexity measures such as the VC-dimension are better suited, but very difficult to estimate. In supervised classification, ROC curves and AUC are commonly used (Saporta & Niang, 2006). Models should be compared on validation (hold-out) sets, but resampling is necessary in order to obtain confidence intervals.
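The last two points of the abstract (AUC on a hold-out set, and resampling to get a confidence interval) can be illustrated with a minimal sketch. It is not taken from the paper: the function names `auc` and `bootstrap_auc_ci`, the toy labels and scores, and the choice of a percentile bootstrap are all assumptions made for illustration. The AUC is computed as the proportion of (positive, negative) pairs ranked correctly, with ties counted as one half.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive gets the higher score; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC of a hold-out set
    (hypothetical helper; labels are assumed to be coded 0/1)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        # Resample hold-out cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy hold-out set (made-up values): true classes and model scores.
y_holdout = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
s_holdout = [0.10, 0.20, 0.25, 0.30, 0.35, 0.40,
             0.55, 0.60, 0.65, 0.70, 0.80, 0.90]

auc_hat = auc(y_holdout, s_holdout)
ci_low, ci_high = bootstrap_auc_ci(y_holdout, s_holdout)
print(f"AUC = {auc_hat:.3f}, 95% bootstrap interval = [{ci_low:.3f}, {ci_high:.3f}]")
```

To compare two models, the same resampled indices would be applied to both sets of scores, so that the interval is built on the difference in AUC rather than on each AUC separately.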

Domains

Statistics [stat]

Main file

RC1059.pdf (262.89 KB)

Origin: Files produced by the author(s)

https://cnam.hal.science/hal-02507614

Submitted on: Friday, 13 March 2020, 13:28:55

Last modified on: Friday, 5 August 2022, 14:54:00

Long-term archiving on: Sunday, 14 June 2020, 13:57:10

Dates and versions

hal-02507614 , version 1 (13-03-2020)

Identifiers

HAL Id : hal-02507614 , version 1

Cite

Gilbert Saporta, Ndèye Niang. Model assessment. KNEMO'06 Knowledge Extraction and Modeling, Jan 2006, Capri, Italy. ⟨hal-02507614⟩

Collections

CNAM CEDRIC-CNAM

