# Lecture Information

20161231 Identification of Pairwise Informative Features for Clustering Data with Growing Dimensions

Abstract: Identifying important features for separating unlabeled observations into homogeneous groups plays a critical role in dimension reduction and in modeling data with complex structures. This problem is directly related to selecting informative variables in cluster analysis, where a small fraction of features is identified for separating observed feature vectors $\mathbf{X}_i\in \mathbb{R}^p$, $i=1,\ldots,n$, into $K$ possible classes. Utilizing the framework of model-based clustering, we introduce the **PA**irwise **R**eciprocal fu**SE** (PARSE) procedure, based on a new class of penalization functions that imposes infinite penalties on features with small differences across clusters. PARSE effectively avoids selecting an overly dense set of features for separating observations in cluster analysis. We establish the consistency of the proposed procedure for identifying informative features for cluster analysis. The PARSE procedure is shown to enjoy certain optimality properties as well. We develop a backward selection algorithm, in conjunction with the EM algorithm, to implement PARSE. Simulation studies show that PARSE has competitive performance compared to other popular model-based clustering methods. PARSE is shown to select a sparse set of features and to produce accurate clustering results. We apply PARSE to a microarray experiment on human asthma and discuss the biological implications of the results.

Wen Zhou
Assistant Professor of Statistics
Department of Statistics