Which analytic methods for Big Data?

Gilbert Saporta

Conference paper. Year: 2015

Author: Gilbert Saporta
PersonId: 180161
IdHAL: gilbert-saporta
ORCID: 0000-0002-3406-5887
IdRef: 027122565

CEDRIC. Statistical methods for data mining and learning

Abstract

Classical inference is not suited to massive data: statistical tests reject any reasonable model, and confidence intervals shrink to nothing. Model validation should be done through cross-validation or split sampling. Explicit, parsimonious generative models are replaced by predictive algorithms. Model choice is driven by statistical learning theory, not by penalized likelihood. The analyst’s toolbox includes revisited classical data analysis techniques (PCA and MCA as particular cases of SVD, clustering), used mainly for exploration, as well as machine learning methods (SVM, boosting, ensemble learning) for prediction. For high-dimensional data, where the number of variables exceeds the number of units, sparse methods based on L1 regularization provide elegant and simple solutions; we will present a sparse generalization of multiple correspondence analysis. Is the data deluge making the scientific method obsolete, as C. Anderson claimed some years ago? We will conclude with some comments on correlation and causality.
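The abstract treats PCA as a particular case of the singular value decomposition. The sketch below is a minimal illustration of that point, assuming NumPy and a numeric data matrix with units in rows and variables in columns: the principal components of the centered matrix are read directly off its SVD. The function name `pca_via_svd` and the synthetic data are hypothetical and not taken from the talk. MCA, which applies the SVD to a suitably weighted indicator matrix, is not shown.

```python
import numpy as np

def pca_via_svd(X, k):
    """Hypothetical helper: PCA of X (n units x p variables) through the SVD.

    Returns the first k component scores, the variance carried by each
    component (eigenvalues of the covariance matrix with divisor n), and
    the loadings (eigenvectors) as columns.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                       # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                     # scores = Xc V = U S
    variances = s[:k] ** 2 / n                    # eigenvalues of (1/n) Xc^T Xc
    loadings = Vt[:k].T                           # right singular vectors
    return scores, variances, loadings

# Synthetic correlated data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
scores, variances, loadings = pca_via_svd(X, k=2)

# Cross-check: the same variances are the largest eigenvalues of the covariance matrix.
cov = np.cov(X, rowvar=False, bias=True)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
print(np.allclose(variances, eigvals[:2]))        # True
```

Working from the SVD of the centered matrix avoids forming the cross-product matrix explicitly, which is numerically more stable. For standardized PCA (on the correlation matrix), each column would also be divided by its standard deviation before the decomposition.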

Keywords

Big Data; Machine Learning; SVD

Domains

Statistics [stat]

ARS15_saporta.pdf (1003.42 KB)

Origin: Files produced by the author(s)


https://cnam.hal.science/hal-03953110

Submitted on: Friday, March 3, 2023, 12:07:55

Last modified on: Saturday, March 25, 2023, 04:07:52

Dates and versions

hal-03953110 , version 1 (03-03-2023)

Identifiers

HAL Id : hal-03953110 , version 1

Cite

Gilbert Saporta. Which analytic methods for Big Data? ARS'15. Fifth International Workshop on Social Network Analysis, Università degli Studi di Salerno, Apr 2015, Capri, Italy. ⟨hal-03953110⟩


Collections

CNAM CEDRIC-CNAM HESAM

