Conference papers

Sparse Methods for Unsupervised Data Analysis

Abstract : Principal Components Analysis (PCA), Correspondence Analysis (CA) and Multiple Correspondence Analysis (MCA) are among the most efficient techniques for visualizing and exploring numerical and categorical data in an unsupervised way. However, with high-dimensional data, interpreting linear combinations of hundreds or thousands of variables becomes very difficult. The objective of sparse methods is to obtain pseudo-components that are linear combinations of only a small number of variables, and thus to facilitate interpretation by highlighting only the most important features. This simplification comes at the cost of losing characteristic properties such as the orthogonality of the components and of the loadings, which explains why there are more than 20 variants of sparse PCA. In contrast, sparsifying correspondence analysis has received little or no attention in the literature, except for MCA. After a brief survey of sparse PCA, we will focus on sparse variants of CA for large contingency tables such as document-term matrices. We use the fact that CA is both a PCA (or a weighted SVD) and a canonical analysis to develop a column-sparse (or row-sparse) CA and a doubly sparse CA for rows and columns.
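To make the "CA as a weighted SVD" remark concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify one. It assumes the standard formulation of CA as the SVD of the matrix of standardized residuals, and, as a swapped-in illustration of column sparsity, it soft-thresholds the column vector in an alternating rank-1 scheme of the kind used in regularized-SVD approaches to sparse PCA. The function names, the toy table `N`, and the penalty `lam` are all hypothetical.

```python
import numpy as np

def ca_standardized_residuals(N):
    """Return S = D_r^{-1/2} (P - r c^T) D_c^{-1/2} and the row/column masses.

    Assumes every row and column of N has a positive total.
    """
    P = N / N.sum()                  # correspondence matrix
    r = P.sum(axis=1)                # row masses
    c = P.sum(axis=0)                # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return S, r, c

def soft_threshold(x, lam):
    """Shrink each entry toward zero by lam; entries below lam become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_rank1(S, lam, n_iter=100):
    """Illustrative rank-1 fit of S with a soft-thresholded column vector v."""
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    u, v = U[:, 0], d[0] * Vt[0]     # start from the ordinary first axis
    for _ in range(n_iter):
        v = soft_threshold(S.T @ u, lam)
        if not v.any():              # penalty too large: everything zeroed
            break
        u = S @ v
        u /= np.linalg.norm(u)
    return u, v

# Hypothetical 4 x 4 contingency table (e.g. documents by terms).
N = np.array([[20, 5, 1, 0],
              [18, 6, 0, 1],
              [2, 1, 15, 9],
              [1, 0, 12, 11]], dtype=float)

S, r, c = ca_standardized_residuals(N)
singular_values = np.linalg.svd(S, compute_uv=False)
total_inertia = (singular_values ** 2).sum()   # equals chi-square / n
u, v = sparse_rank1(S, lam=0.05)
```

With `lam=0` the scheme stays at the ordinary first CA axis; increasing `lam` sets more column entries of `v` to zero, at the cost of the orthogonality properties mentioned in the abstract.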
Document type :
Conference papers

https://hal-cnam.archives-ouvertes.fr/hal-02471316
Contributor : Gilbert Saporta
Submitted on : Friday, February 7, 2020 - 11:15:08 PM
Last modification on : Wednesday, February 19, 2020 - 12:26:41 PM

Identifiers

  • HAL Id : hal-02471316, version 1


Citation

Gilbert Saporta, Ruiping Liu, Ndeye Niang Keita, Huiwen Wang. Sparse Methods for Unsupervised Data Analysis. The 4th International Symposium on Interval Data Modelling, Jun 2019, Beijing, China. ⟨hal-02471316⟩
