Optimal subsample selection for massive logistic regression with distributed data | |
Zuo, Lulu²; Zhang, Haixiang²; Wang, HaiYing¹; Sun, Liuquan³ | |
Journal | COMPUTATIONAL STATISTICS |
Date | 2021-02-27 |
Pages | 28 |
Keywords | Allocation size; Big data; Distributed and massive data; Subsample estimator; Subsampling probabilities |
ISSN | 0943-4062 |
DOI | 10.1007/s00180-021-01089-0 |
Abstract | With the emergence of big data, it is increasingly common for data to be distributed, i.e., stored at many distributed sites (machines or nodes) owing to data collection or business operations. We propose a distributed subsampling procedure in this setting to efficiently approximate the maximum likelihood estimator for logistic regression. We establish the consistency and asymptotic normality of the subsample estimator given the full data. The optimal subsampling probabilities and optimal allocation sizes are obtained explicitly. We develop a two-step algorithm to approximate the optimal subsampling procedure. Numerical simulations and an application to airline data are presented to evaluate the performance of our subsampling method. |
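The two-step idea in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' algorithm and does not reproduce their optimal formulas. The per-point score h_i = |y_i - p_i(β)|·||x_i|| (an mVc-style criterion), the rule that gives site k a subsample size r_k proportional to its total score, and every function name (`two_step_distributed`, `allocate`, `mvc_scores`, `weighted_logistic_mle`, `simulate_sites`) are illustrative assumptions. In step 1, a small uniform pilot subsample is drawn from every site and fitted. In step 2, each site scores its own rows under the pilot estimate and reports only its total score, the allocation sizes are set from those totals, each site draws r_k rows with probability π_ki = h_i / H_k, and an inverse-probability-weighted MLE is fitted on the pooled subsample.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    return math.exp(t) / (1.0 + math.exp(t))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small d x d system
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - dot(M[r][r + 1:n], x[r + 1:])) / M[r][r]
    return x


def weighted_logistic_mle(X, y, w, iters=25):
    # Newton-Raphson on sum_i w_i * [y_i * x_i'b - log(1 + exp(x_i'b))]
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        info = [[0.0] * d for _ in range(d)]  # weighted Fisher information
        for xi, yi, wi in zip(X, y, w):
            p = sigmoid(dot(xi, beta))
            for j in range(d):
                grad[j] += wi * (yi - p) * xi[j]
                for k in range(d):
                    info[j][k] += wi * p * (1.0 - p) * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


def mvc_scores(X, y, beta):
    # Assumed mVc-style score: h_i = |y_i - p_i(beta)| * ||x_i||
    return [abs(yi - sigmoid(dot(xi, beta))) * math.sqrt(dot(xi, xi)) for xi, yi in zip(X, y)]


def allocate(site_totals, r):
    # Hypothetical rule: site k gets r_k proportional to its total score
    total = sum(site_totals)
    return [round(r * t / total) for t in site_totals]


def two_step_distributed(sites, r0, r, seed=0):
    # sites: list of (X_k, y_k) pairs, one per machine; rows of X_k include an intercept
    rng = random.Random(seed)
    N = sum(len(yk) for _, yk in sites)
    # Step 1: uniform pilot subsample, split across sites in proportion to site size
    Xp, yp = [], []
    for Xk, yk in sites:
        idx = rng.choices(range(len(yk)), k=max(1, round(r0 * len(yk) / N)))
        Xp += [Xk[i] for i in idx]
        yp += [yk[i] for i in idx]
    beta0 = weighted_logistic_mle(Xp, yp, [1.0] * len(yp))
    # Step 2: sites score their data under beta0 and report only the total score
    scores = [mvc_scores(Xk, yk, beta0) for Xk, yk in sites]
    sizes = allocate([sum(h) for h in scores], r)
    Xs, ys, ws = [], [], []
    for (Xk, yk), h, rk in zip(sites, scores, sizes):
        Hk = sum(h)
        # Draw r_k rows with replacement, pi_ki = h_i / H_k
        for i in rng.choices(range(len(yk)), weights=h, k=rk):
            Xs.append(Xk[i])
            ys.append(yk[i])
            ws.append(Hk / (rk * h[i]))  # inverse of r_k * pi_ki
    return weighted_logistic_mle(Xs, ys, ws)


def simulate_sites(n_sites, n_per_site, beta, seed=0):
    # Toy data: intercept plus two standard normal covariates per row
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        X = [[1.0, rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_per_site)]
        y = [1 if rng.random() < sigmoid(dot(xi, beta)) else 0 for xi in X]
        sites.append((X, y))
    return sites
```

For example, `two_step_distributed(simulate_sites(3, 2000, [0.5, -1.0, 1.0], seed=1), r0=200, r=600, seed=2)` fits the model from about 800 sampled rows instead of all 6000. Under this assumed allocation, r_k·π_ki = r·h_i / Σ_all h, so the pooled draw mimics sampling from the combined data with probabilities proportional to h_i; rounding the r_k means the realized total can differ from r by a few rows.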
Funding | National Science Foundation (NSF), USA, grant DMS-1812013; National Natural Science Foundation of China, grants 11771431, 11690015, and 11926341; Key Laboratory of RCSDS, CAS, grant 2008DP173182 |
WOS research area | Mathematics |
Language | English |
Publisher | SPRINGER HEIDELBERG |
WOS accession number | WOS:000622671900002 |
Content type | Journal article |
Source URL | [http://ir.amss.ac.cn/handle/2S8OKBNM/58236] |
Collection | Institute of Applied Mathematics |
Corresponding author | Zhang, Haixiang |
Author affiliations | 1. Univ Connecticut, Dept Stat, Mansfield, CT 06269, USA; 2. Tianjin Univ, Ctr Appl Math, Tianjin 300072, Peoples R China; 3. Chinese Acad Sci, Acad Math & Syst Sci, Beijing 100190, Peoples R China |
Recommended citation (GB/T 7714) | Zuo, Lulu, Zhang, Haixiang, Wang, HaiYing, et al. Optimal subsample selection for massive logistic regression with distributed data[J]. COMPUTATIONAL STATISTICS, 2021: 28. |
APA | Zuo, Lulu, Zhang, Haixiang, Wang, HaiYing, & Sun, Liuquan. (2021). Optimal subsample selection for massive logistic regression with distributed data. COMPUTATIONAL STATISTICS, 28. |
MLA | Zuo, Lulu, et al. "Optimal subsample selection for massive logistic regression with distributed data." COMPUTATIONAL STATISTICS (2021): 28. |