Conference papers

Variable selection for multiply imputed data with penalized generalized estimating equations

Abstract : Generalized estimating equations (GEE) are a useful tool for marginal regression analysis of repeated measurements and longitudinal data. Penalized regression methods such as the LASSO have been extended to GEE to allow shrinkage and dimension reduction, but they cannot handle missing data. Missing data, together with a large number of variables and a small sample size, are common issues in longitudinal studies. Multiple imputation is a popular tool for handling missing data; in particular, MI-GEE can be used for inference. Although methods for handling missing data such as MI-GEE have improved, variable selection for GEE has not been systematically adapted to missing data. The multiple imputation-least absolute shrinkage and selection operator (MI-LASSO) provides consistent selection across multiply imputed datasets but cannot handle correlation among an individual's observations. We present MI-PGEE, multiple imputation-penalized generalized estimating equations, an extension of MI-LASSO for longitudinal data. MI-PGEE applies the group LASSO penalty to the group of estimated regression coefficients of the same variable across multiply imputed datasets. Estimates are computed using a local quadratic approximation and an algorithm based on a modified Newton-Raphson method; a new BIC-like criterion is presented for selecting the tuning parameter. MI-PGEE yields a consistent variable selection across multiply imputed datasets, which makes it a selection method for longitudinal data that can accommodate missing data. We simulated different correlation structures among continuous as well as binary variables: our results demonstrate the advantages of MI-PGEE over PGEE combined with simple imputation such as mean imputation.
The usefulness of the new method is illustrated by an application to osteoarthritis of the knee, identifying important biomarkers and magnetic resonance imaging criteria that are associated with joint space width.
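The grouping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only assumes the standard group LASSO form, in which the coefficients of one variable across the D imputed datasets form a group penalized through its Euclidean norm. The names `group_norms`, `group_lasso_penalty`, `selected_variables`, `lam` and `tol` are hypothetical and chosen for illustration.

```python
import math

def group_norms(beta):
    """Euclidean norm of each variable's coefficients across imputed datasets.

    beta is a list of D rows (one per imputed dataset), each holding p
    coefficients (one per variable). Assumed layout, for illustration only.
    """
    n_vars = len(beta[0])
    return [math.sqrt(sum(row[j] ** 2 for row in beta)) for j in range(n_vars)]

def group_lasso_penalty(beta, lam):
    """Assumed penalty form: lam * sum over variables of the group norm."""
    return lam * sum(group_norms(beta))

def selected_variables(beta, tol=1e-8):
    """A variable is kept only if its whole group is non-zero, so the
    selection is identical in every imputed dataset."""
    return [norm > tol for norm in group_norms(beta)]

# Two imputed datasets (rows), three variables (columns); made-up numbers.
beta = [[3.0, 0.0, 1.0],
        [4.0, 0.0, 0.0]]

penalty = group_lasso_penalty(beta, lam=0.5)
kept = selected_variables(beta)
```

Because the penalty acts on the group norm rather than on each coefficient separately, a variable is either retained in all imputed datasets or dropped from all of them, which is the consistency property the abstract refers to. In this toy input, the third variable is kept even though its coefficient is zero in the second dataset.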
Document type :
Conference papers
Contributor : Gilbert Saporta
Submitted on : Friday, March 6, 2020 - 10:50:08 AM
Last modification on : Wednesday, March 18, 2020 - 3:54:51 PM


  • HAL Id : hal-02500597, version 1



Julia Geronimi, Gilbert Saporta. Variable selection for multiply imputed data with penalized generalized estimating equations. XXVIIIth International Biometric Conference, Jul 2016, Victoria, Canada. ⟨hal-02500597⟩


