Academic Conferences - School of Statistics, Renmin University of China

Academic Conferences

20161231 Identification of Pairwise Informative Features for Clustering Data with Growing Dimensions

Date: 2016-12-17

Abstract: Identifying important features for separating unlabeled observations into homogeneous groups plays a critical role in dimension reduction and in modeling data with complex structures. This problem is directly related to selecting informative variables in cluster analysis, where a small fraction of features is identified for separating observed feature vectors $\mathbf{X}_i\in \mathbb{R}^p$, $i=1,\ldots,n$, into $K$ possible classes. Utilizing the framework of model-based clustering, we introduce the PAirwise Reciprocal fuSE (PARSE) procedure, based on a new class of penalization functions that imposes infinite penalties on features with small differences across clusters. PARSE effectively avoids selecting an overly dense set of features for separating observations in cluster analysis. We establish the consistency of the proposed procedure for identifying informative features for cluster analysis. The PARSE procedure is also shown to enjoy certain optimality properties. We develop a backward selection algorithm, in conjunction with the EM algorithm, to implement PARSE. Simulation studies show that PARSE performs competitively with other popular model-based clustering methods: it selects a sparse set of features and produces accurate clustering results. We apply PARSE to a microarray experiment on human asthma and discuss the biological implications of the results.

Wen Zhou
Assistant Professor of Statistics
Department of Statistics

Colorado State University

December 23, 2016, 2:00-3:00 PM, Conference Room 1030

Short Bio: Wen Zhou is an Assistant Professor in the Department of Statistics at Colorado State University. He obtained Ph.D. degrees in Applied Mathematics and Statistics from Iowa State University in 2010 and 2014. Dr. Zhou’s research focuses mainly on developing computational methods, statistical models, and inference procedures for high-dimensional data from genomic and biomedical studies. Dr. Zhou has experience in building theoretically justified statistical models and procedures for analyzing different types of omics data to draw biologically critical insights. He has developed inference procedures for a range of statistical problems involving high-dimensional data, including testing the structure of high-dimensional covariance matrices, comparing large covariance matrices with complex unknown structures, and a novel gene clustering algorithm; testing high-dimensional mean vectors with unknown complex dependency; identification of pairwise informative features for clustering data with growing dimensions; and detection of spurious discoveries in genomic studies using a nonparametric procedure.