Exploring the diversity in cluster ensemble generation: Random sampling and random projection
Yang, Fan; Li, Xuan; Li, Qianmu; Li, Tao; Yang F (杨帆)
Journal: http://dx.doi.org/10.1016/j.eswa.2014.01.028
2014
Keywords: EVIDENCE ACCUMULATION CONSENSUS PARTITIONS ALGORITHMS FRAMEWORK
Funding: Natural Science Foundation of China [61202144, 61203282]; Natural Science Foundation of Fujian Province [2012J05125]; Jiangsu 973 Scientific Project [BK2011023]; Key Laboratory of System Control and Information Processing, Ministry of Education of Shanghai Jiao Tong University [SCIP2012007]; National Natural Science Foundation of China [61272419]; US National Science Foundation [DBI-0850203, CNS-1126619, IIS-1213026]; U.S. Department of Homeland Security [2010-ST-062000039].

Abstract (English): Cluster ensemble first generates a large library of different clustering solutions and then combines them into a more accurate consensus clustering. It is commonly accepted that, for a cluster ensemble to work well, the member partitions should differ from each other while the quality of each partition remains at an acceptable level. Many different strategies have been used to generate diverse base partitions for cluster ensembles. As in ensemble classification, many studies have focused on generating different partitions of the original dataset, i.e., clustering on different subsets (e.g., obtained using random sampling) or clustering in different feature spaces (e.g., obtained using random projection). However, little attention has been paid to the diversity and quality of the partitions generated by these two approaches. In this paper, we propose a novel cluster generation method based on random sampling (abbreviated as RS-NN), which uses the nearest neighbor method to fill in the category (cluster) labels of the samples left out of the random sample. We evaluate its performance in comparison with the k-means ensemble, a typical random projection method (Random Feature Subset, abbreviated as FS), and another random sampling method (Random Sampling based on Nearest Centroid, abbreviated as RS-NC). Experimental results indicate that the FS method always generates more diverse partitions, while the RS-NC method generates high-quality partitions.
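The three base-partition generators compared above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes plain Euclidean k-means as the base clusterer, a hypothetical sampling-rate parameter `rate`, and a 1-nearest-neighbour fill rule for RS-NN; all function names are hypothetical. The sketch makes one point concrete: RS-NN and RS-NC differ only in how objects outside the random sample receive a label, while FS keeps every object but drops features.

```python
import numpy as np

def sq_dists(A, B):
    """Squared Euclidean distances between the rows of A and the rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (assumed base clusterer); returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = sq_dists(X, centroids).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an emptied centroid where it is
                centroids[j] = X[labels == j].mean(axis=0)
    return sq_dists(X, centroids).argmin(axis=1), centroids

def random_sample(n, k, rate, rng):
    """Indices of a random object subset of size rate * n (at least k)."""
    return rng.choice(n, size=max(k, int(rate * n)), replace=False)

def rs_nn(X, k, rate, rng):
    """RS-NN-style sketch: cluster a random sample, then give every object
    the label of its nearest sampled object (assumed 1-NN fill)."""
    idx = random_sample(len(X), k, rate, rng)
    sample_labels, _ = kmeans(X[idx], k, rng)
    return sample_labels[sq_dists(X, X[idx]).argmin(axis=1)]

def rs_nc(X, k, rate, rng):
    """RS-NC-style sketch: cluster a random sample, then assign every object
    to the nearest centroid found on that sample."""
    idx = random_sample(len(X), k, rate, rng)
    _, centroids = kmeans(X[idx], k, rng)
    return sq_dists(X, centroids).argmin(axis=1)

def fs(X, k, rate, rng):
    """FS-style sketch: run k-means on all objects in a random feature subset."""
    d = X.shape[1]
    feats = rng.choice(d, size=max(1, int(rate * d)), replace=False)
    labels, _ = kmeans(X[:, feats], k, rng)
    return labels
```

Calling any generator repeatedly with different random states yields the library of base partitions that a consensus function then combines. Filling by nearest sampled object (RS-NN) lets unsampled objects follow the local shape of the sample, whereas nearest-centroid filling (RS-NC) imposes the spherical k-means boundaries on them.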
Our proposed method, RS-NN, generates base partitions with a good balance between quality and diversity and achieves significant improvements over the alternative methods. Furthermore, to introduce more diversity, we propose a dual random sampling method that combines the RS-NN and FS methods. The proposed method can achieve higher diversity with good quality on most datasets. (C) 2014 Elsevier Ltd. All rights reserved.
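The dual random sampling idea and one common way to combine a library of partitions can be sketched as below. This is a hedged illustration, not the paper's procedure: the order of the two steps (random feature subset first, then random object sampling with nearest-neighbour fill), the rates `feat_rate` and `sample_rate`, and all function names are hypothetical. The consensus input shown is the co-association matrix used in evidence accumulation, and the diversity measure (1 minus the Rand index) is an assumed stand-in, since this record does not state which measure the authors used.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (assumed base clusterer); returns final labels."""
    X = np.asarray(X, dtype=float)
    c = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an emptied centroid where it is
                c[j] = X[labels == j].mean(axis=0)
    return ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def dual_random_sampling(X, k, feat_rate, sample_rate, rng):
    """Hypothetical RS-NN + FS combination (step order and rates assumed):
    pick a random feature subset, cluster a random object sample in that
    subspace, then label the remaining objects by 1-nearest-neighbour."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    feats = rng.choice(d, size=max(1, int(feat_rate * d)), replace=False)
    Xf = X[:, feats]                                        # FS step
    idx = rng.choice(n, size=max(k, int(sample_rate * n)), replace=False)
    sample_labels = kmeans_labels(Xf[idx], k, rng)          # cluster the sample
    nearest = ((Xf[:, None, :] - Xf[idx][None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return sample_labels[nearest]                           # RS-NN fill step

def co_association(partitions):
    """Evidence-accumulation matrix: fraction of partitions that put
    objects i and j in the same cluster."""
    return np.mean([p[:, None] == p[None, :] for p in partitions], axis=0)

def pairwise_disagreement(a, b):
    """Assumed diversity measure, 1 - Rand index: share of object pairs
    that one partition groups together and the other separates."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[iu] != same_b[iu]))

# Usage on toy data: build a small library and measure its mean diversity.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 4)), rng.normal(3.0, 0.3, (20, 4))])
library = [dual_random_sampling(X, 2, 0.5, 0.5, rng) for _ in range(10)]
C = co_association(library)
diversity = np.mean([pairwise_disagreement(library[i], library[j])
                     for i in range(len(library))
                     for j in range(i + 1, len(library))])
```

Because the co-association matrix and the pair-based disagreement depend only on whether two objects share a cluster, both are invariant to how each base partition happens to number its clusters, which is why they suit comparing partitions produced by independent random runs.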
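Language: English
Publisher: PERGAMON-ELSEVIER SCIENCE LTD
Content type: Journal article
Source URL: http://dspace.xmu.edu.cn/handle/2288/92666
Collection: Information Technology - Published Papers
Recommended citation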
GB/T 7714
Yang, Fan, Li, Xuan, Li, Qianmu, et al. Exploring the diversity in cluster ensemble generation: Random sampling and random projection[J]. http://dx.doi.org/10.1016/j.eswa.2014.01.028, 2014.
APA Yang, Fan, Li, Xuan, Li, Qianmu, Li, Tao, & 杨帆. (2014). Exploring the diversity in cluster ensemble generation: Random sampling and random projection. http://dx.doi.org/10.1016/j.eswa.2014.01.028.
MLA Yang, Fan, et al. "Exploring the diversity in cluster ensemble generation: Random sampling and random projection." http://dx.doi.org/10.1016/j.eswa.2014.01.028 (2014).