Conference paper. Year: 2015

Generalizing Partial Least Squares and Correspondence Analysis to Predict Categorical (and Heterogeneous) Data

Abstract

We present a generalization of partial least squares regression (PLSR), called Partial Least Squares Regression Correspondence Analysis (PLSRCA), tailored to the analysis of categorical data (as well as heterogeneous categorical and “bipolar” data). Just like standard PLSR, PLSRCA first computes a pair of latent variables, which are linear combinations of the original variables, that have maximal covariance. The coefficients of these latent variables are obtained from the (generalized) singular value decomposition of the matrix Y’X, the product of the (properly centered and normalized) data matrices; this decomposition is equivalent to the correspondence analysis of Y’X, a matrix that has been called a “Band of Burt” by Lebart et al. (2006). The latent variables are obtained by projecting the original data matrices as supplementary rows and columns in the analysis of the “Band of Burt” data table. This part, called PLS-CA, generalizes Tucker inter-battery analysis to categorical and mixed data and instantiates the correspondence analysis component of PLSRCA. The latent variable from the first matrix X is then used (after an adequate normalization) to predict the second matrix Y. The effect of the first latent variable is then partialled out (i.e., “deflated”) from both matrices. This part instantiates the partial least squares regression component of PLSRCA. The process of 1) extracting latent variables, 2) predicting both matrices from the latent variable, and 3) deflation is repeated until a specified number of latent variables has been extracted or until the first matrix is completely decomposed. We illustrate PLSRCA with genetic data and show how single nucleotide polymorphisms (SNPs) can be used to predict a set of variables measuring cognitive impairment in Alzheimer’s Disease.
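The extract, predict, and deflate loop described in the abstract can be sketched for the simpler case of ordinary PLSR on continuous, column-centered data. This is only an illustration under stated assumptions, not the authors' PLSRCA: it uses a plain SVD of X’Y in place of the generalized SVD, and it omits the correspondence analysis weighting and the disjunctive coding of categorical variables. The function name `plsr_svd` and its parameters are hypothetical.

```python
import numpy as np

def plsr_svd(X, Y, n_components, tol=1e-10):
    """Plain PLSR loop (assumption: ordinary SVD, no CA weights)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Column-center both matrices; PLSRCA would use a CA-style
    # centering and normalization here instead.
    Xk = X - X.mean(axis=0)
    Yk = Y - Y.mean(axis=0)
    Y_hat = np.zeros_like(Yk)
    scores = []
    for _ in range(n_components):
        if np.linalg.norm(Xk) < tol:
            break  # first matrix completely decomposed
        # 1) Extract: the first singular vectors of X'Y give the
        #    coefficients of the maximal-covariance latent variables.
        U, s, Vt = np.linalg.svd(Xk.T @ Yk, full_matrices=False)
        if s[0] < tol:
            break  # no covariance left between X and Y
        t = Xk @ U[:, 0]          # latent variable of X
        t_len = np.linalg.norm(t)
        if t_len < tol:
            break
        t = t / t_len             # normalization before prediction
        # 2) Predict both matrices from the X latent variable.
        X_part = np.outer(t, t @ Xk)
        Y_part = np.outer(t, t @ Yk)
        # 3) Deflate: partial out the latent variable from both.
        Xk = Xk - X_part
        Yk = Yk - Y_part
        Y_hat += Y_part
        scores.append(t)
    T = np.column_stack(scores) if scores else np.empty((X.shape[0], 0))
    return T, Y_hat  # Y_hat predicts the centered Y
```

Calling `plsr_svd(X, Y, n_components=2)` returns the normalized X latent variables as columns of `T` together with the accumulated prediction of the centered `Y`. When as many components are extracted as `X` has independent columns, this prediction coincides with the ordinary least squares fit.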
File: art_3411.pdf (1.07 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02507459, version 1 (13-03-2020)

Identifiers

  • HAL Id: hal-02507459, version 1

Cite

Hervé Abdi, Derek Beaton, Gilbert Saporta. Generalizing Partial Least Squares and Correspondence Analysis to Predict Categorical (and Heterogeneous) Data. 7th conference on Correspondence Analysis and Related Methods CARME 2015, Sep 2015, Napoli, Italy. ⟨hal-02507459⟩