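Associate Professor Feifei Wang (王菲菲) of our school has published a paper in Information Sciences. The study proposes a new topic model that incorporates author information. By modeling the relationship between original documents and the user comments on them, it addresses the sparsity problem in modeling short documents while also uncovering the preferences expressed in user comments.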
Paper title
Author topic model for co-occurring normal documents and short texts to explore individual user preferences
Abstract
The investigation of user preferences through user comments has attracted significant attention. Although topic models have been verified as useful tools to facilitate the understanding of textual contents, they cannot be directly applied to accomplish this task because of two problems. The first problem is the severe data sparsity suffered by user comments because they are generally short. The second problem is the mixture of opinions from both user comments and the original documents the users commented on. To simultaneously solve the data sparsity problem and explore clean user preferences, we propose an author co-occurring topic model (AOTM) for normal documents and their short user comments. By considering authorship, AOTM allows each author of short texts to have a probability distribution over a set of topics represented only by short texts. Accordingly, the individual user preferences can be investigated based on these author-level distributions. We verify the performance of AOTM using two news article datasets and one e-commerce dataset. Extensive experiments demonstrate that the AOTM outperforms several state-of-the-art methods in topic learning and topic representation of documents. The potential usage of AOTM in exploring individual user preferences is further illustrated by drawing user portraits and predicting user posting behaviors.
About the author
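Feifei Wang is an associate professor at the School of Statistics, Renmin University of China, and holds a PhD in statistics from the Guanghua School of Management, Peking University. Research interests include text mining and its business applications, social network analysis, and big-data modeling. Research papers have appeared in leading journals in China and abroad, including the Journal of Econometrics, the Journal of Business & Economic Statistics, the Journal of Machine Learning Research, and Scientia Sinica Mathematica (中国科学:数学). Professor Wang has led or participated in a number of research projects, including projects funded by the National Natural Science Foundation of China, major social science projects of the Ministry of Education, and National Key R&D Program projects. Awards include second prize in the Renmin University of China Young Teachers' Basic Teaching Skills Competition and the university's Excellence in Online Teaching Award.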
Publication page