Time: December 17, 2018, 14:30-15:30
Venue: Conference Room 1016, Mingde Main Building
Title: CESME: Cluster Analysis with Latent Semiparametric Mixture Models
Abstract:
Model-based clustering is one of the most popular statistical approaches for cluster analysis and has been widely applied in exploratory analyses. However, the assumption that the data follow a Gaussian or Gaussian-like distribution, which is critical for model-based clustering, is unlikely to hold in general, and this prevents successful model-based clustering of data with complex distributions. In this paper, we propose a latent semiparametric mixture model that substantially improves model flexibility, so that data with skewed or heavy-tailed distributions can be clustered more efficiently. We assume that the observables are obtained from unknown monotone transformations of latent variables that follow a Gaussian mixture distribution, which models the cluster structures. The identifiability of the proposed model is carefully justified. An alternating maximization procedure is developed to estimate the proposed model, and its convergence property is investigated. An appealing feature of our method is that it exploits a novel one-step analysis, in conjunction with the finite-sample analysis, to reveal a new theoretical guideline for the alternating maximization algorithm, which makes the algorithm computationally efficient. Besides the theoretical exploration, the proposed method is also assessed numerically through extensive simulations, where it demonstrates superior performance compared to most contemporary competitors.
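To make the model assumption in the abstract concrete, the following is a minimal simulation sketch, not the speaker's implementation. It draws latent variables from a two-component Gaussian mixture and then applies a monotone transformation to each coordinate to produce skewed and heavy-tailed observables. The coordinatewise form of the transformations, the specific functions (`math.exp` and cubing), the mixture parameters, and all function names are assumptions chosen for illustration; in the proposed model the transformations are unknown and must be estimated.

```python
import math
import random


def sample_latent_mixture(n, weights, means, sds, rng):
    """Draw n points from a Gaussian mixture with diagonal covariance.

    Returns the component labels and the latent points.
    """
    labels, latent = [], []
    for _ in range(n):
        # Pick a mixture component according to the mixing weights.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Draw each coordinate independently from that component.
        z = [rng.gauss(m, s) for m, s in zip(means[k], sds[k])]
        labels.append(k)
        latent.append(z)
    return labels, latent


# Hypothetical strictly increasing transformations, one per coordinate.
# exp produces right-skewed data; cubing produces heavy tails.
TRANSFORMS = [math.exp, lambda z: z ** 3]


def observe(latent):
    """Apply the monotone transformations coordinatewise."""
    return [[f(z) for f, z in zip(TRANSFORMS, row)] for row in latent]


rng = random.Random(0)
labels, latent = sample_latent_mixture(
    n=500,
    weights=[0.5, 0.5],
    means=[[-2.0, -2.0], [2.0, 2.0]],
    sds=[[1.0, 1.0], [1.0, 1.0]],
    rng=rng,
)
observed = observe(latent)
```

Because the transformations are monotone, the ordering of each coordinate is the same in `observed` as in `latent`, while the marginal distributions of `observed` are far from Gaussian. This is the situation in which a standard Gaussian mixture fitted directly to the observables would be misspecified.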
About the speaker:
Wen Zhou is an Assistant Professor in the Department of Statistics at Colorado State University. He obtained his Ph.D. degrees in Applied Mathematics and Statistics at Iowa State University in 2010 and 2014, respectively. Dr. Zhou's research mainly focuses on developing computational methods, statistical models, and inference procedures for high-dimensional data from genomic and biomedical studies. Dr. Zhou has experience in building theoretically justified statistical models and procedures for analyzing different types of omics data to draw biologically critical insights. He has developed inference procedures for a range of statistical problems involving high-dimensional data, including testing the structure of high-dimensional covariance matrices; comparing large covariance matrices with complex unknown structures; a novel gene clustering algorithm; testing high-dimensional mean vectors with unknown complex dependency; identifying pairwise informative features for clustering data with growing dimensions; and detecting spurious discoveries in genomic studies using a nonparametric procedure.