Conference papers

Sparse Methods for Unsupervised Data Analysis

Gilbert Saporta 1 Ruiping Liu 2 Ndeye Niang Keita 1 Huiwen Wang 2 
1 CEDRIC - MSDMA - CEDRIC. Méthodes statistiques de data-mining et apprentissage
CEDRIC - Centre d'études et de recherche en informatique et communications
Abstract : Principal Component Analysis (PCA), Correspondence Analysis (CA) and Multiple Correspondence Analysis (MCA) are among the most efficient techniques for visualizing and exploring numerical and categorical data in an unsupervised way. However, with high-dimensional data, linear combinations of hundreds or thousands of variables become very difficult to interpret. The objective of sparse methods is to obtain pseudo-components that are linear combinations of only a small number of variables, and thus to facilitate interpretation by highlighting only the most important features. This simplification comes at the cost of losing characteristic properties such as the orthogonality of the components and of the loadings, which explains why there are more than 20 variants of sparse PCA. In contrast, sparsifying correspondence analysis has received little or no attention in the literature, except for MCA. After a brief survey of sparse PCA, we focus on sparse variants of CA for large contingency tables such as document-term matrices. We use the fact that CA is both a PCA (or a weighted SVD) and a canonical analysis to develop a column-sparse (or row-sparse) CA and a doubly sparse CA for rows and columns.
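As an illustration of the "CA as a weighted SVD" view mentioned in the abstract, the following is a minimal sketch in Python with NumPy. It computes the matrix of standardized residuals of a contingency table, whose SVD gives ordinary CA, and then extracts one column-sparse pseudo-component by alternating soft-thresholding. The function names, the toy table and the penalty `lam` are hypothetical, and the thresholding scheme is a generic penalized rank-one SVD chosen for illustration; it should not be read as the algorithm developed in the paper.

```python
import numpy as np

def standardized_residuals(N):
    """Matrix S = D_r^{-1/2} (P - r c^T) D_c^{-1/2}; its SVD yields CA."""
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return S, r, c

def soft_threshold(x, lam):
    """Shrink entries toward zero; entries with |x| <= lam become exactly 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_rank_one(S, lam, n_iter=200, tol=1e-10):
    """Rank-one approximation u v^T of S with an L1-type penalty on v.

    Assumption: generic alternating scheme (unit-norm u, thresholded v),
    used here only to illustrate column sparsity.
    """
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    u, v = U[:, 0], d[0] * Vt[0]         # start from the dense solution
    for _ in range(n_iter):
        v_new = soft_threshold(S.T @ u, lam)
        if not v_new.any():              # penalty too large: everything zeroed
            return u, v_new
        u_new = S @ v_new
        u_new = u_new / np.linalg.norm(u_new)
        converged = np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if converged:
            break
    return u, v

# Hypothetical 4 x 4 table (e.g. documents by terms), for illustration only.
N = np.array([[20, 5, 1, 0],
              [18, 6, 2, 1],
              [2, 1, 15, 12],
              [1, 0, 14, 16]], dtype=float)

S, r, c = standardized_residuals(N)
u_dense, v_dense = sparse_rank_one(S, lam=0.0)    # no penalty: ordinary CA axis
u_sparse, v_sparse = sparse_rank_one(S, lam=0.2)  # penalized: some loadings may vanish
print("dense loadings :", v_dense)
print("sparse loadings:", v_sparse)
```

With `lam=0.0` the sketch reproduces the first dense CA axis of `S`; increasing `lam` trades explained inertia for fewer nonzero column loadings. Applying the same thresholding to `u` instead of, or in addition to, `v` would give a row-sparse or doubly sparse analogue under the same assumptions.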
Contributor : Gilbert Saporta
Submitted on : Wednesday, December 9, 2020 - 11:02:24 AM
Last modification on : Wednesday, September 28, 2022 - 5:53:31 AM


  • HAL Id : hal-02471316, version 1



Gilbert Saporta, Ruiping Liu, Ndeye Niang Keita, Huiwen Wang. Sparse Methods for Unsupervised Data Analysis. The 4th International Symposium on Interval Data Modelling (SIDM 2019), Jun 2019, Beijing, China. ⟨hal-02471316⟩


