Sparse Correspondence Analysis for Contingency Tables

Gilbert Saporta

Conference paper, Year: 2021

Gilbert Saporta

Role: Author
PersonId: 180161
IdHAL: gilbert-saporta
ORCID: 0000-0002-3406-5887
IdRef: 027122565

CEDRIC. Méthodes statistiques de data-mining et apprentissage

Abstract

Since the introduction of the lasso in regression, various sparse methods have been developed for unsupervised settings, such as sparse principal component analysis (s-PCA) and sparse singular value decomposition (s-SVD). One advantage of s-PCA is that it simplifies the interpretation of the (pseudo) principal components, since each one is expressed as a linear combination of a small number of variables. The disadvantages are, on the one hand, the difficulty of choosing the number of non-zero coefficients in the absence of a well-established criterion and, on the other hand, the loss of orthogonality for the components and/or the loadings. We propose s-CA, a sparse variant of correspondence analysis (CA) for large contingency tables such as the document-term matrices used in text mining, together with pPMD, a projected deflation technique already used in s-PCA. Since CA is a doubly weighted PCA (for rows and columns) or a weighted SVD, we apply s-SVD in order to sparsify both row and column weights. The user may tune the level of sparsity for rows and columns and optimize it according to some criterion, or even decide that no sparsity is needed for rows (or columns) by relaxing one of the sparsity constraints.
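
To make the "CA as a weighted SVD" idea concrete, here is a minimal Python sketch, not the author's implementation. It assumes NumPy, builds the standardized residual matrix of a contingency table, and extracts one sparse axis by alternating soft-thresholding. The function names (`standardized_residuals`, `sparse_rank1`), the threshold parameters `row_frac` and `col_frac`, and the toy table are all hypothetical. The thresholding rule is a simple stand-in for the penalized decomposition mentioned in the abstract, and the pPMD deflation step for further axes is not shown.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink every entry of x towards zero by lam (lasso-type operator)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def standardized_residuals(N):
    """Return S = D_r^{-1/2} (P - r c^T) D_c^{-1/2} and the masses r, c.

    The ordinary SVD of S yields classical CA; all margins must be positive.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)  # row and column masses
    if np.any(r <= 0) or np.any(c <= 0):
        raise ValueError("every row and column must have a positive total")
    expected = np.outer(r, c)            # independence model
    return (P - expected) / np.sqrt(expected), r, c


def sparse_rank1(S, row_frac=0.0, col_frac=0.0, n_iter=200, tol=1e-10):
    """One sparse singular triplet (u, d, v) of S.

    row_frac / col_frac in [0, 1) are illustrative tuning knobs: the
    threshold is that fraction of the largest absolute entry. Setting one
    of them to 0 relaxes the sparsity constraint on that side.
    """
    if not (0.0 <= row_frac < 1.0 and 0.0 <= col_frac < 1.0):
        raise ValueError("row_frac and col_frac must lie in [0, 1)")
    U, _, Vt = np.linalg.svd(S, full_matrices=False)
    u, v = U[:, 0], Vt[0]                # start from the dense solution
    for _ in range(n_iter):
        u_old = u
        a = S @ v
        a = soft_threshold(a, row_frac * np.abs(a).max())
        if not a.any():
            break
        u = a / np.linalg.norm(a)
        b = S.T @ u
        b = soft_threshold(b, col_frac * np.abs(b).max())
        if not b.any():
            break
        v = b / np.linalg.norm(b)
        if np.linalg.norm(u - u_old) < tol:
            break
    return u, float(u @ S @ v), v


# Toy 4 x 5 table (made-up counts), sparsifying columns only.
table = np.array([[20, 3, 1, 4, 2],
                  [2, 18, 3, 1, 5],
                  [1, 4, 22, 2, 3],
                  [3, 2, 2, 15, 9]])
S, r, c = standardized_residuals(table)
u, d, v = sparse_rank1(S, row_frac=0.0, col_frac=0.3)
print("pseudo singular value:", d)
print("non-zero column weights:", np.count_nonzero(v), "of", v.size)
```

With both fractions set to 0 the routine reduces to the leading singular triplet of `S`, that is, the first axis of ordinary CA. Raising `col_frac` zeroes out more column weights, at the cost of a smaller pseudo singular value.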

Domains

Statistics [stat]

sparseCA_GSI-V3.pdf (1.75 MB)

Origin: Files produced by the author(s)

https://cnam.hal.science/hal-03183719

Submitted on: Friday, 27 May 2022, 09:02:11

Last modified on: Wednesday, 28 September 2022, 05:54:04

Dates and versions

hal-03183719, version 1 (27-05-2022)

Identifiers

HAL Id: hal-03183719, version 1

Cite

Gilbert Saporta. Sparse Correspondence Analysis for Contingency Tables. Celebrating 40 years of Greek Statistical Institute 1981-2021, Greek Statistical Institute, Mar 2021, Athens, Greece. ⟨hal-03183719⟩

Collections

CNAM CEDRIC-CNAM HESAM
