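Li Yang
Professor and doctoral supervisor at the School of Statistics, Renmin University of China, where he also serves as Associate Dean and Director of the Center for Statistical Consulting. He is an Elected Member of the International Statistical Institute, a Youth Council member of the International Biometric Society China Region, and Chief Supervisor of the Beijing Biomedical Statistics and Data Management Research Association. His research covers correlated data analysis, model selection and uncertainty assessment, latent variable modeling, and clinical trial design. He has led more than twenty research projects, including General Program grants from the National Natural Science Foundation of China and Major Projects of the National Statistical Science Research Program, and has published more than fifty research papers in domestic and international journals, including Biometrics, Biostatistics, Statistics in Medicine, Statistical Methods in Medical Research, Statistical Research (统计研究), and Journal of Applied Statistics and Management (数理统计与管理).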
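Wang Fan
Doctoral student in statistics at Renmin University of China, with a master's degree from the University of Michigan (USA). Wang has 8 years of experience in statistical modeling and data analysis for clinical medicine and epidemiology, and 4 years of experience in biostatistical analysis in the pharmaceutical industry. Research interests include high-dimensional data analysis, functional data analysis, and clinical trials and research. Wang's papers have appeared in journals including Biostatistics, Statistical Methods in Medical Research, and JAMA Internal Medicine.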
Paper Title
Integrative Functional Linear Model for Genome-Wide Association Studies with Multiple Traits
Paper Summary
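A study involving Professor Li Yang and doctoral student Wang Fan of the School of Statistics proposes a penalized joint model for analyzing SNP curves, that is, SNP data treated as functional objects. By borrowing information across traits, the model effectively addresses feature selection in GWAS data that are both high-dimensional and characterized by correlated phenotypes.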
Abstract
In recent biomedical research, genome-wide association studies (GWAS) have demonstrated great success in investigating the genetic architecture of human diseases. For many complex diseases, multiple correlated traits have been collected. However, most of the existing GWAS are still limited by analyzing each trait separately without considering their correlations and suffer from a lack of sufficient information. Moreover, the high dimensionality of single nucleotide polymorphism (SNP) data still poses tremendous challenges to statistical methods, in both theoretical and practical aspects. In this article, we innovatively propose an integrative functional linear model for GWAS with multiple traits. This study is the first to approximate SNPs as a functional object in a joint model of multiple traits with penalization techniques. It effectively accommodates the high dimensionality of SNPs and also correlations among multiple traits to facilitate information borrowing. Our extensive simulation studies show the satisfactory performance of the proposed method in identification and estimation of disease-associated genetic variants, compared to four alternatives. Analysis of type 2 diabetes data leads to biologically meaningful findings with good prediction accuracy and selection stability.
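As a rough illustration of the kind of model the abstract describes, a generic functional linear model for multiple traits can be sketched as below. This is a textbook-style form under stated assumptions, not the paper's exact formulation: the symbols $X_i(t)$ (the SNP profile of subject $i$ viewed as a function of genomic position $t$), $\beta_k(t)$ (the coefficient function for trait $k$), the basis functions $\phi_j$, and the group penalty are all introduced here for illustration only.

```latex
% Assumed generic model: n subjects, K traits, SNPs treated as a function X_i(t) on a region T
\[
  y_{ik} = \alpha_k + \int_{T} X_i(t)\,\beta_k(t)\,dt + \varepsilon_{ik},
  \qquad i = 1,\dots,n,\quad k = 1,\dots,K .
\]
% Assumed basis expansion of each coefficient function (e.g. B-splines), J basis functions
\[
  \beta_k(t) \approx \sum_{j=1}^{J} b_{kj}\,\phi_j(t),
  \qquad z_{ij} = \int_{T} X_i(t)\,\phi_j(t)\,dt .
\]
% One possible penalized joint objective (an assumption): a group penalty ties
% the coefficients b_{1j},...,b_{Kj} of the same basis function across traits
\[
  \min_{\alpha,\,b}\;
  \sum_{k=1}^{K}\sum_{i=1}^{n}
  \Bigl( y_{ik} - \alpha_k - \sum_{j=1}^{J} b_{kj}\, z_{ij} \Bigr)^{2}
  + \lambda \sum_{j=1}^{J}
  \Bigl( \sum_{k=1}^{K} b_{kj}^{2} \Bigr)^{1/2} .
\]
```

In this sketch the basis expansion reduces the many SNPs to $J$ coefficients per trait, which is one way to accommodate high dimensionality, and the group penalty selects or drops each basis coefficient jointly across the $K$ traits, which is one simple way correlated traits can borrow information from one another.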