
Model assessment

Abstract: In data mining and machine learning, models come from data and provide insights for understanding data (unsupervised classification) or making predictions (supervised learning) (Giudici, 2003; Hand, 2000). The scientific status of such models thus differs from the classical view, where a model is a simplified representation of reality provided by an expert of the field. In most data mining applications a good model is one that not only fits the data but also gives good predictions, even if it is not interpretable (Vapnik, 2006). In this context, model validation and model choice require specific indices and approaches. Penalized likelihood measures (AIC, BIC, etc.) may not be pertinent when there is no simple distributional assumption on the data and/or for models, such as regularized regression and SVMs, whose parameters are constrained. Complexity measures such as the VC dimension are better adapted but very difficult to estimate. In supervised classification, ROC curves and the AUC are commonly used (Saporta & Niang, 2006). Models should be compared on validation (hold-out) sets, and resampling is necessary in order to obtain confidence intervals.
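The combination the abstract recommends — evaluating AUC on a hold-out set with resampling for confidence intervals — can be sketched as follows. This is a minimal illustration, not the authors' code: the AUC is computed as the Mann-Whitney rank statistic, and the confidence interval uses a plain percentile bootstrap; all function names and the number of replicates are illustrative choices.

```python
import random

def auc(labels, scores):
    """AUC as the Mann-Whitney statistic: the probability that a random
    positive example is scored above a random negative one (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC on a hold-out set."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        # Resample (label, score) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [labels[i] for i in idx]
        sb = [scores[i] for i in idx]
        if len(set(yb)) < 2:  # a resample must contain both classes
            continue
        stats.append(auc(yb, sb))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return auc(labels, scores), (lo, hi)

# Toy hold-out set: true labels and a classifier's scores.
y = [0, 0, 0, 1, 1, 1]
s = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
point, (lo, hi) = bootstrap_auc_ci(y, s)
```

The percentile bootstrap makes no distributional assumption on the scores, which matches the abstract's point that classical penalized-likelihood criteria are unavailable in this setting; more refined intervals (e.g. BCa) exist but follow the same resampling idea.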
Document type: Conference papers

Cited literature: 3 references

https://hal-cnam.archives-ouvertes.fr/hal-02507614
Contributor: Gilbert Saporta
Submitted on: Friday, March 13, 2020 - 1:28:55 PM

File: RC1059.pdf (produced by the author(s))

Identifiers

  • HAL Id: hal-02507614, version 1

Citation

Gilbert Saporta, Ndèye Niang. Model assessment. KNEMO'06 Knowledge Extraction and Modeling, Jan 2006, Capri, Italy. ⟨hal-02507614⟩
