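In multivariate multiple change point detection, most researchers currently focus on homogeneous data, i.e., they assume that change points occur in all data dimensions simultaneously. In this study, we propose the S-MCPD algorithm to address the more practically relevant problem of detecting change points in heterogeneous data. The algorithm accurately identifies both the change point locations and the specific dimensions in which the changes occur. It also remains stable when the variance changes, and it performs well on two real-world datasets.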
Paper title: Multivariate change point detection for heterogeneous series
Abstract: The multivariate change point detection problem arises in many fields. Most approaches to this problem assume the series is homogeneous, i.e., all the coordinates change concurrently. Hence, the specific subset of coordinates containing the change points cannot be determined. In this work, we propose S-MCPD, which detects the positions of multivariate change points in heterogeneous series and identifies the specific coordinates that change. Specifically, the problem is framed as variable selection and transformed into the form of a sparse group lasso. In simulation studies, we compared S-MCPD with four existing methods: inspect, sbs, dc, and cpm. The results showed that S-MCPD performed comparably to inspect and outperformed the other methods on the evaluation metrics. In addition, S-MCPD determines not only the positions of the change points but also the subset of coordinates containing them, which the other existing methods cannot do. Moreover, S-MCPD does not rely on the constant-variance assumption and works well even when the covariance changes, which makes our method more practical. These are the two main contributions of our work. Finally, we applied S-MCPD to two real-world datasets to demonstrate its effectiveness.
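The abstract says only that change point detection is recast as variable selection and solved in sparse group lasso form. It gives no formulas. The sketch below is one plausible reading of that idea and is not the authors' S-MCPD implementation. It makes the following assumptions:

- **Model.** A pure mean-shift model, with a step-function design in which row j of the coefficient matrix B is the jump between times j-1 and j.
- **Penalty.** Each candidate change time is one group. The group (L2) penalty selects change times, and the elementwise L1 penalty selects which coordinates change at each time.
- **Solver.** FISTA with the closed-form sparse-group-lasso proximal operator.
- **Names.** The function names and the weights `lam1` and `lamg` are hypothetical.
- **Scope.** The sketch does not model covariance changes.

```python
import numpy as np


def step_design(n):
    """Hypothetical centered step design: column j-1 is 1{t >= j}, j = 1..n-1."""
    t = np.arange(n)[:, None]
    j = np.arange(1, n)[None, :]
    X = (t >= j).astype(float)
    return X - X.mean(axis=0)  # centering removes the intercept (overall mean)


def prox_sparse_group(V, a1, ag):
    """Prox of a1*||b||_1 + ag*||b||_2 applied to each row (group) of V."""
    Z = np.sign(V) * np.maximum(np.abs(V) - a1, 0.0)  # elementwise soft-threshold
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    scale = np.maximum(1.0 - ag / np.maximum(norms, 1e-12), 0.0)  # group shrinkage
    return scale * Z


def sgl_change_points(Y, lam1, lamg, n_iter=5000):
    """FISTA for 0.5*||Yc - X B||_F^2 + lamg*sum_j ||B_j||_2 + lam1*||B||_1."""
    n, p = Y.shape
    X = step_design(n)
    Yc = Y - Y.mean(axis=0)
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the smooth part's gradient
    B = np.zeros((n - 1, p))
    W, s = B.copy(), 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Yc)
        B_new = prox_sparse_group(W - grad / L, lam1 / L, lamg / L)
        s_new = (1.0 + np.sqrt(1.0 + 4.0 * s * s)) / 2.0
        W = B_new + ((s - 1.0) / s_new) * (B_new - B)  # Nesterov momentum step
        B, s = B_new, s_new
    return B


# Noise-free toy series: n = 60, p = 4, jump of 5 at t = 30 in coordinates 0 and 1 only.
Y = np.zeros((60, 4))
Y[30:, :2] = 5.0
B = sgl_change_points(Y, lam1=5.0, lamg=10.0)
row_norms = np.linalg.norm(B, axis=1)
cp = int(np.argmax(row_norms)) + 1  # row j-1 corresponds to a change at time j
coords = np.flatnonzero(np.abs(B[cp - 1]) > 1e-8)
print(cp, coords.tolist())  # 30 [0, 1]
```

The toy example illustrates the heterogeneity point from the abstract. A homogeneous method would report only the change at t = 30. The group-plus-L1 penalty also reports that only coordinates 0 and 1 changed, because the unchanged coordinates stay exactly zero. Real data would additionally need tuning of `lam1` and `lamg`, for example by cross-validation or an information criterion.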
About the authors:
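Xiaoling Lü (吕晓玲) is a Professor at the School of Statistics, Renmin University of China, and a researcher at its Center for Applied Statistics. Research interests: statistical learning, consumer behavior analysis, and text analysis.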
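Yuxuan Guo (郭昱璇) is a PhD student at the School of Statistics, Renmin University of China, whose research focuses mainly on review text mining.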
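Ming Gao (高鸣) completed his undergraduate studies at the School of Statistics, Renmin University of China, and his doctoral studies at the University of Chicago. His research focuses mainly on graphical models, causal inference, and asset pricing.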