Generalizing Partial Least Squares and Correspondence Analysis to Predict Categorical (and Heterogeneous) Data

Abstract: We present a generalization of partial least squares regression (PLSR), called Partial Least Squares Regression Correspondence Analysis (PLSRCA), tailored to the analysis of categorical data (and of heterogeneous data that mix categorical and “bipolar” variables). Just like standard PLSR, PLSRCA first computes a pair of latent variables, which are linear combinations of the original variables, that have maximal covariance. The coefficients of these latent variables are obtained from the (generalized) singular value decomposition of the product of the (properly centered and normalized) data matrices; this decomposition is equivalent to a correspondence analysis of the matrix Y’X, a matrix that Lebart et al. (2006) have called a “Band of Burt”. The latent variables are obtained by projecting the original data matrices as supplementary rows and columns in the analysis of the “Band of Burt” data table. This part, called PLS-CA, generalizes Tucker inter-battery analysis to categorical and mixed data and instantiates the correspondence analysis component of PLSRCA. The latent variable from the first matrix, X, is then used (after an adequate normalization) to predict the second matrix, Y. The effect of the first latent variable is then partialled out (i.e., “deflated”) from both matrices. This part instantiates the partial least squares regression component of PLSRCA. The process of 1) extracting latent variables, 2) predicting both matrices from the latent variable, and 3) deflating both matrices is repeated until a specified number of latent variables has been extracted or until the first matrix is completely decomposed. We illustrate PLSRCA with genetic data and show how single nucleotide polymorphisms (SNPs) can be used to predict a set of variables measuring cognitive impairment in Alzheimer’s disease.
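The three steps of PLSRCA (extraction, prediction, deflation) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: X and Y are assumed to be in complete disjunctive (one-hot) coding with no empty category; the GSVD constraints are assumed to be the column masses, computed once and kept fixed while the matrices are deflated; and the "adequate normalization" of the X latent variable is assumed to be scaling to unit length. The functions `one_hot`, `preprocess`, and `plsrca` are hypothetical names, and the data are simulated rather than the SNP data described above. With this centering, `ZX.T @ ZY` equals the centered probability matrix of the Band of Burt `X.T @ Y`, so the first singular value coincides with the first singular value of a correspondence analysis of that table.

```python
import numpy as np


def one_hot(codes, n_levels):
    """Complete disjunctive (one-hot) coding of one integer-coded variable."""
    return np.eye(n_levels)[codes]


def preprocess(X):
    """Center and scale a disjunctive matrix so that ZX.T @ ZY is the centered
    probability matrix of the Band of Burt X.T @ Y.

    Assumption: every row of X sums to the same number of variables Q
    (complete disjunctive coding) and no column is empty.
    Returns Z and the column masses m used as GSVD constraints.
    """
    I = X.shape[0]
    Q = X.sum() / I                      # number of categorical variables
    m = X.sum(axis=0) / X.sum()          # column masses, as in MCA
    Z = (X - X.mean(axis=0)) / (np.sqrt(I) * Q)
    return Z, m


def plsrca(X, Y, n_components=2, tol=1e-10):
    """Hypothetical PLSRCA-style loop: extract, predict, deflate."""
    ZX, mX = preprocess(X)
    ZY, mY = preprocess(Y)
    ZY0 = ZY.copy()                      # centered Y, kept for prediction
    wX, wY = 1.0 / np.sqrt(mX), 1.0 / np.sqrt(mY)  # diagonals of D^(-1/2)
    T, sv, cov = [], [], []
    for _ in range(n_components):
        R = ZX.T @ ZY                    # (deflated) centered Band of Burt
        # GSVD under mass constraints, via a plain SVD of D_X^-1/2 R D_Y^-1/2
        U, s, Vt = np.linalg.svd(wX[:, None] * R * wY[None, :])
        if s[0] < tol:                   # nothing left to explain
            break
        lX = ZX @ (wX * U[:, 0])         # latent variable of X
        lY = ZY @ (wY * Vt[0])           # latent variable of Y
        t = lX / np.linalg.norm(lX)      # normalized X latent variable
        # Deflation (assumed form): project t out of both matrices
        ZX = ZX - np.outer(t, t @ ZX)
        ZY = ZY - np.outer(t, t @ ZY)
        T.append(t)
        sv.append(s[0])
        cov.append(lX @ lY)              # equals s[0] by construction
    T = np.column_stack(T) if T else np.zeros((X.shape[0], 0))
    Y_hat = T @ (T.T @ ZY0)              # prediction of the centered Y
    return T, np.array(sv), np.array(cov), Y_hat


# Simulated example (not the paper's data): 4 "SNPs" coded 0/1/2 predict
# 2 three-level outcomes, the first of which copies SNP 0.
rng = np.random.default_rng(0)
snp = rng.integers(0, 3, size=(60, 4))
out = np.column_stack([snp[:, 0], rng.integers(0, 3, size=60)])
X = np.hstack([one_hot(snp[:, j], 3) for j in range(4)])
Y = np.hstack([one_hot(out[:, k], 3) for k in range(2)])
T, sv, cov, Y_hat = plsrca(X, Y, n_components=3)
print(np.round(sv, 4))
```

Because each normalized latent variable t is orthogonal to those extracted before it, the sum of the per-step predictions of the centered Y collapses to `T @ (T.T @ ZY0)`. At every step, the covariance of the pair of latent variables equals that step's singular value, which is the maximal-covariance property mentioned in the abstract.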
Document type: Conference papers
Cited literature: 17 references

https://hal-cnam.archives-ouvertes.fr/hal-02507459
Contributor: Gilbert Saporta
Submitted on: Friday, March 13, 2020 - 11:02:06 AM
Last modification on: Wednesday, March 25, 2020 - 11:40:11 AM

Identifiers

  • HAL Id: hal-02507459, version 1

Citation

Hervé Abdi, Derek Beaton, Gilbert Saporta. Generalizing Partial Least Squares and Correspondence Analysis to Predict Categorical (and Heterogeneous) Data. 7th conference on Correspondence Analysis and Related Methods CARME 2015, Sep 2015, Napoli, Italy. ⟨hal-02507459⟩
