Cnam - Conservatoire national des arts et métiers
Conference paper, Year: 2015

Which analytic methods for Big Data?

Abstract

Classical inference is not suited to massive data: statistical tests reject any reasonable model, and confidence intervals shrink to almost nothing. Model validation should instead be done through cross-validation or split sampling. Explicit, parsimonious generative models are replaced by predictive algorithms. Model choice is driven by statistical learning theory rather than by penalized likelihood. The analyst's toolbox includes revisited classical data analysis techniques (PCA and MCA as particular cases of SVD, clustering), mainly for exploratory purposes, as well as machine learning methods (SVM, boosting, ensemble learning) for prediction. For high-dimensional data, where the number of variables exceeds the number of units, sparse methods based on L1 regularization provide elegant and simple solutions; we will present a sparse generalization of multiple correspondence analysis. Is the data deluge making the scientific method obsolete, as C. Anderson claimed some years ago? We will conclude with some comments on correlation and causality.
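The opening claim, that tests reject any reasonable model and confidence intervals vanish as data grow, can be made concrete with a short calculation. The sketch below is a hypothetical illustration, not the author's method: it assumes a one-sample z-test of H0: μ = 0 with known σ, and a fixed, practically negligible deviation of the sample mean from 0 (δ = 0.01σ, an arbitrary assumed value). The function names `z_test_p_value` and `ci_half_width` are illustrative only.

```python
from math import erfc, sqrt
from statistics import NormalDist


def z_test_p_value(delta: float, sigma: float, n: int) -> float:
    """Two-sided p-value of a z-test of H0: mu = 0 (known sigma), when the
    observed sample mean deviates from 0 by `delta` (hypothetical setup)."""
    z = delta * sqrt(n) / sigma
    # 2 * (1 - Phi(|z|)) written with erfc to stay accurate for large z.
    return erfc(abs(z) / sqrt(2.0))


def ci_half_width(sigma: float, n: int, level: float = 0.95) -> float:
    """Half-width of the normal-theory confidence interval for a mean."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z_crit * sigma / sqrt(n)


if __name__ == "__main__":
    # Assumed values: a deviation of 1% of a standard deviation, unit sigma.
    delta, sigma = 0.01, 1.0
    for n in (100, 10_000, 1_000_000, 100_000_000):
        print(f"n={n:>11,}  p={z_test_p_value(delta, sigma, n):.3g}  "
              f"CI half-width={ci_half_width(sigma, n):.2g}")
```

With n = 100 the deviation is far from significant (p ≈ 0.92), whereas with n = 10^6 the same deviation yields p below 10^-20, and the interval half-width keeps shrinking as 1/√n. The effect itself never changes, which is consistent with the abstract's recommendation to judge models by cross-validation or split sampling rather than by significance tests.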

ARS15_saporta.pdf (1003.42 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03953110, version 1 (03-03-2023)

Identifiers

  • HAL Id: hal-03953110, version 1

Cite

Gilbert Saporta. Which analytic methods for Big Data? ARS'15. Fifth International Workshop on Social Network Analysis, Università degli Studi di Salerno, Apr 2015, Capri, Italy. ⟨hal-03953110⟩