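Variable selection with missing data in generalized linear models still faces many challenges. This paper uses generalized estimating equations to incorporate the correlation among multiply imputed datasets into a penalized variable selection model, and proposes a variable selection method based on multiply imputed data (PEE-MI). The paper proves that the method achieves consistency and efficiency in variable selection, and numerical simulations and an empirical analysis show that it performs well relative to existing methods.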
Paper title:
Penalized Estimating Equations for Generalized Linear Models with Multiple Imputation
Abstract:
Missing values among variables present a challenge in variable selection in the generalized linear model. Common strategies that delete observations with missing information may cause serious information loss. Multiple imputation has been widely used in recent years because it provides unbiased statistical results given a correctly specified imputation model and considers the uncertainty of the missing data. However, variable selection methods in the generalized linear model with multiply imputed data have not yet been studied widely. In this study, we introduce penalized estimating equations for generalized linear models with multiple imputation (PEE–MI), which incorporates the correlation of multiple imputed observations into the objective function. The theoretical performance of the proposed PEE–MI depends on the penalized function adopted. We use the adaptive least absolute shrinkage and selection operator (adaptive LASSO) as an illustrating example. Simulations show that PEE–MI outperforms the alternatives. The proposed method is shown to select variables with clinical relevance when applied to a database of laboratory-diagnosed A/H7N9 patients in Zhejiang province, China.
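About the authors:
Li Yang (李扬), Professor, School of Statistics, Renmin University of China.
Yang Haoyu (杨昊宇), PhD student in statistics, Renmin University of China. Research interests include experimental design methods, causal effect estimation, and network data analysis, with work published in journals including Statistica Sinica, CSDA, Statistical Research (统计研究), and Journal of Systems Science and Mathematical Sciences (系统科学与数学).
Yu Haochen (俞昊辰), master's graduate of the School of Statistics, Renmin University of China, currently working at the head office of the Agricultural Bank of China.
Huang Hanwen (黄瀚文), Associate Professor, Department of Epidemiology and Biostatistics, University of Georgia, USA.
Shen Ye (沈晔), Associate Professor, Department of Epidemiology and Biostatistics, University of Georgia, USA.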
Screenshot of the published paper: