Exploring the diversity in cluster ensemble generation: Random sampling and random projection | |
Yang, Fan ; Li, Xuan ; Li, Qianmu ; Li, Tao ; Yang F(杨帆) | |
Journal | http://dx.doi.org/10.1016/j.eswa.2014.01.028
2014 | |
Keywords | EVIDENCE ACCUMULATION CONSENSUS PARTITIONS ALGORITHMS FRAMEWORK |
Abstract | Funding: Natural Science Foundation of China [61202144, 61203282]; Natural Science Foundation of Fujian Province [2012J05125]; Jiangsu 973 Scientific Project [BK2011023]; Key Laboratory of System Control and Information Processing, Ministry of Education of Shanghai Jiao Tong University [SCIP2012007]; National Natural Science Foundation of China [61272419]; US National Science Foundation [DBI-0850203, CNS-1126619, IIS-1213026]; U.S. Department of Homeland Security [2010-ST-062000039]. Cluster ensemble first generates a large library of different clustering solutions and then combines them into a more accurate consensus clustering. It is commonly accepted that, for cluster ensemble to work well, the member partitions should differ from each other while the quality of each partition remains at an acceptable level. Many different strategies have been used to generate different base partitions for cluster ensemble. As in ensemble classification, many studies have focused on generating different partitions of the original dataset, i.e., clustering on different subsets (e.g., obtained using random sampling) or clustering in different feature spaces (e.g., obtained using random projection). However, little attention has been paid to the diversity and quality of the partitions generated using these two approaches. In this paper, we propose a novel cluster generation method based on random sampling, which uses the nearest neighbor method to fill in the category information of the missing samples (abbreviated as RS-NN). We evaluate its performance in comparison with k-means ensemble, a typical random projection method (Random Feature Subset, abbreviated as FS), and another random sampling method (Random Sampling based on Nearest Centroid, abbreviated as RS-NC). Experimental results indicate that the FS method always generates more diverse partitions, while the RS-NC method generates high-quality partitions. Our proposed method, RS-NN, generates base partitions with a good balance between quality and diversity and achieves significant improvement over alternative methods. Furthermore, to introduce more diversity, we propose a dual random sampling method which combines the RS-NN and FS methods. The proposed method can achieve higher diversity with good quality on most datasets. (C) 2014 Elsevier Ltd. All rights reserved. |
Language | English |
Publisher | PERGAMON-ELSEVIER SCIENCE LTD |
Content type | Journal article |
Source URL | [http://dspace.xmu.edu.cn/handle/2288/92666] |
Collection | Information Technology - Published Papers |
Recommended citation (GB/T 7714) | Yang, Fan,Li, Xuan,Li, Qianmu,et al. Exploring the diversity in cluster ensemble generation: Random sampling and random projection[J]. http://dx.doi.org/10.1016/j.eswa.2014.01.028,2014. |
APA | Yang, Fan,Li, Xuan,Li, Qianmu,Li, Tao,&杨帆.(2014).Exploring the diversity in cluster ensemble generation: Random sampling and random projection.http://dx.doi.org/10.1016/j.eswa.2014.01.028. |
MLA | Yang, Fan,et al."Exploring the diversity in cluster ensemble generation: Random sampling and random projection".http://dx.doi.org/10.1016/j.eswa.2014.01.028 (2014). |
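
The abstract describes RS-NN only at a high level: cluster a random sample of the data, then fill in the labels of the unsampled points from their nearest neighbors. The following is a minimal sketch of that idea, not the authors' implementation. It assumes plain k-means as the base clusterer, Euclidean distance, sampling without replacement, and a single nearest neighbor; the function names (`kmeans`, `rs_nn_partition`, `generate_ensemble`) and the `sample_frac` parameter are hypothetical and chosen here for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its closest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rs_nn_partition(points, k, sample_frac, rng):
    """One base partition: cluster a random subset, then give every
    unsampled point the label of its nearest sampled neighbor."""
    n = len(points)
    m = min(n, max(k, int(sample_frac * n)))
    sampled = rng.sample(range(n), m)  # assumption: sampling without replacement
    sub_labels = kmeans([points[i] for i in sampled], k, rng)
    labels = [None] * n
    for i, lab in zip(sampled, sub_labels):
        labels[i] = lab
    for i in range(n):
        if labels[i] is None:  # a "missing" sample, filled in by 1-NN
            nearest = min(range(m),
                          key=lambda j: sq_dist(points[i], points[sampled[j]]))
            labels[i] = sub_labels[nearest]
    return labels


def generate_ensemble(points, k, n_partitions=10, sample_frac=0.7, seed=0):
    """Library of base partitions, each built from a fresh random sample."""
    rng = random.Random(seed)
    return [rs_nn_partition(points, k, sample_frac, rng)
            for _ in range(n_partitions)]


# Toy data: two Gaussian blobs in 2-D (illustrative only).
demo_rng = random.Random(1)
data = ([[demo_rng.gauss(0, 0.3), demo_rng.gauss(0, 0.3)] for _ in range(20)]
        + [[demo_rng.gauss(5, 0.3), demo_rng.gauss(5, 0.3)] for _ in range(20)])
ensemble = generate_ensemble(data, k=2, n_partitions=5)
```

Each entry of `ensemble` is a full-length label vector, so the partitions can be passed to any consensus function. Cluster label numbering is arbitrary across partitions, so labels from different members should be compared through co-membership of points rather than by label value.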