3.2 Regularization Methods for Categorical Predictors (Gerhard Tutz)

53:33
 
By Universite Paris 1 Pantheon-Sorbonne

Most regularization methods in regression analysis have been designed for metric predictors and cannot be used for categorical predictors. A rare exception is the group lasso, which allows for categorical predictors or factors. We will consider alternative approaches based on penalized likelihood and boosting techniques. Typically the operating model will be a generalized linear model. We start with ordered categorical predictors, which unfortunately are often treated as metric variables because software for that approach is available. It is shown how difference penalties on adjacent dummy coefficients can be used to obtain smooth effect curves that can be estimated even in cases where simple maximum likelihood methods fail. The difference penalty turns out to be highly competitive when compared with methods often seen in practice, namely simple linear regression on the group labels and pure dummy coding. In a second step, L1-penalty based methods that enforce variable selection and clustering of categories are presented and investigated. A distinction is made between ordered predictors, where clustering refers to the fusion of adjacent categories, and nominal predictors, for which arbitrary categories can be fused. The methods make it possible to identify which categories actually differ with respect to the dependent variable. Finally, interaction effects are modeled within the framework of varying-coefficient models. Properties of the estimators are investigated for the proposed methods. The methods are illustrated and compared in simulation studies and applied to real-world data.
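
To make the difference penalty on adjacent dummy coefficients concrete, here is a minimal sketch in Python with NumPy. It is not taken from the talk. It assumes a Gaussian response (the simplest generalized linear model), a single ordinal predictor coded 0, 1, ..., k-1, category 0 as the reference with its coefficient fixed at zero, and a squared first-difference penalty with tuning parameter `lam`. The function name `difference_penalty_fit` and all argument names are hypothetical.

```python
import numpy as np

def difference_penalty_fit(x, y, n_levels, lam):
    """Penalized least squares for one ordinal predictor (illustrative sketch).

    Assumptions (not from the source): Gaussian response, categories coded
    0..n_levels-1, category 0 is the reference (beta_0 = 0), and the penalty
    is lam * sum_j (beta_j - beta_{j-1})^2 over adjacent categories.
    Returns theta = (intercept, beta_1, ..., beta_{n_levels-1}).
    """
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)

    # Design matrix: intercept column plus one dummy per non-reference category.
    X = np.zeros((len(y), n_levels))
    X[:, 0] = 1.0
    for j in range(1, n_levels):
        X[:, j] = (x == j)

    # First differences of (beta_0, ..., beta_{k-1}); dropping the column for
    # beta_0 = 0 turns the first row into the plain term beta_1.
    D = np.diff(np.eye(n_levels), axis=0)[:, 1:]

    # Penalty matrix on theta; the intercept (index 0) is left unpenalized.
    P = np.zeros((n_levels, n_levels))
    P[1:, 1:] = D.T @ D

    # Normal equations of the penalized criterion ||y - X theta||^2 + lam * theta' P theta.
    return np.linalg.solve(X.T @ X + lam * P, X.T @ y)

# Category 2 is never observed, so pure dummy coding has no unique estimate,
# while the penalized fit is still well defined for lam > 0.
x_obs = np.array([0, 0, 1, 1, 3, 3])
y_obs = np.array([1.0, 1.0, 2.0, 2.0, 4.0, 4.0])
theta = difference_penalty_fit(x_obs, y_obs, n_levels=4, lam=1e-6)
print(theta)
```

In this toy data the unobserved category borrows strength from its neighbours: under the stated penalty its coefficient is the average of the two adjacent coefficients. A large `lam` pulls all category effects toward the reference category, while a small `lam` approaches pure dummy coding wherever the data identify the coefficients.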
