Sparse Methods for Unsupervised Data Analysis - Archive ouverte HAL
Conference Papers. Year: 2019

Sparse Methods for Unsupervised Data Analysis

Gilbert Saporta (1), Ruiping Liu (2), Ndeye Niang Keita (1), Huiwen Wang (2)

Abstract

Principal Components Analysis (PCA), Correspondence Analysis (CA) and Multiple Correspondence Analysis (MCA) are among the most efficient techniques for visualizing and exploring numerical and categorical data in an unsupervised way. However, with high-dimensional data, linear combinations of hundreds or thousands of variables become very difficult to interpret. Sparse methods aim to obtain pseudo-components that are linear combinations of only a small number of variables, which facilitates interpretation by highlighting only the most important features. This simplification comes at the cost of losing characteristic properties such as the orthogonality of the components and of the loadings. Different methods trade these properties off in different ways, which explains why there are more than 20 variants of sparse PCA. Sparsifying correspondence analysis, in contrast, has received little or no attention in the literature, except in the case of MCA. After briefly surveying sparse PCA, we focus on sparse variants of correspondence analysis (CA) for large contingency tables such as document-term matrices. We exploit the fact that CA is both a PCA (equivalently, a weighted SVD) and a canonical analysis to develop a column-sparse (or row-sparse) CA and a doubly sparse CA for rows and columns.
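The abstract relies on the fact that CA is a weighted SVD. A minimal NumPy sketch of that view, assuming a small dense contingency table: `ca_svd` computes CA as the SVD of the standardized residual matrix S = D_r^{-1/2}(P - r c^T)D_c^{-1/2}, and `sparse_rank1` applies a generic soft-thresholded power iteration (a common sparse-PCA heuristic) to S to obtain a column-sparse first axis. The function names and the penalty `lam` are hypothetical illustrations; this is not the column-sparse or doubly sparse CA algorithm developed in the paper.

```python
import numpy as np

# Hypothetical illustrative helpers; not the algorithm developed in the paper.


def ca_svd(N):
    """Correspondence analysis of a contingency table N as a weighted SVD."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                        # correspondence matrix
    r = P.sum(axis=1)                      # row masses
    c = P.sum(axis=0)                      # column masses
    E = np.outer(r, c)                     # proportions expected under independence
    # Standardized residuals S = D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - E) / np.sqrt(E)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * sv) / np.sqrt(r)[:, None]     # row principal coordinates
    G = (Vt.T * sv) / np.sqrt(c)[:, None]  # column principal coordinates
    return sv, F, G, S


def soft_threshold(x, lam):
    """Elementwise soft-thresholding: entries with |x| <= lam become exactly 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def sparse_rank1(S, lam, n_iter=500, tol=1e-12):
    """First axis u v^T of S with soft-thresholding on v (hypothetical heuristic).

    Columns with |(S^T u)_j| <= lam get exactly zero weight, giving a
    column-sparse first axis. Returns (u, v); v is all zeros if lam is too large.
    """
    U, _, Vt = np.linalg.svd(S, full_matrices=False)
    u, v = U[:, 0].copy(), Vt[0].copy()    # warm start from the ordinary CA axis
    for _ in range(n_iter):
        u = S @ v
        u_norm = np.linalg.norm(u)
        if u_norm == 0.0:
            break
        u /= u_norm                        # unit-norm vector over table rows
        v_new = soft_threshold(S.T @ u, lam)
        v_norm = np.linalg.norm(v_new)
        if v_norm == 0.0:                  # every column dropped
            return u, v_new
        v_new /= v_norm
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    return u, v
```

For example, on a table whose last column is proportional to the row totals (so its column of S is zero), `sparse_rank1(S, lam=1e-8)` gives that column an exactly zero loading, while `lam=0` reproduces the ordinary first CA axis up to sign. The sum of squared singular values returned by `ca_svd` is the total inertia, i.e. the chi-square statistic divided by the grand total.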
File: sparseSIDM.pdf (1.66 MB)
Origin : Files produced by the author(s)

Dates and versions

hal-02471316, version 1 (09-12-2020)

Identifiers

  • HAL Id: hal-02471316, version 1

Cite

Gilbert Saporta, Ruiping Liu, Ndeye Niang Keita, Huiwen Wang. Sparse Methods for Unsupervised Data Analysis. The 4th International Symposium on Interval Data Modelling (SIDM 2019), Jun 2019, Beijing, China. ⟨hal-02471316⟩
