Variable Length Concentration based Feature Construction Method for Spam Detection
Gao, Yang ; Mi, Guyue ; Tan, Ying
2015
Keywords: NETWORKS
Abstract: In the field of spam detection, concentration methods have been proposed in recent years for feature construction; they convert emails into fixed-length feature vectors. This paper presents a novel method that removes the limit on the feature vector's length. Specifically, the method uses a fixed-length sliding window to divide each email into several sections, so the number of sections depends on the length of the email. Consequently, the feature vectors vary in length, and this paper names them variable length concentrations (VLC). The method thus yields feature vectors that adapt to the length of each email. However, general classifiers are not suitable for this kind of feature vector, because they can only handle fixed-length inputs. This paper therefore applies recurrent neural networks (RNNs), whose inputs are not restricted in length, to perform spam detection. Recall, precision, accuracy and the F1 measure are used to evaluate the method's performance. Experimental results on the classic corpora PU1, PU2, PU3 and PUA show that VLC performs significantly better than previously proposed methods, which supports the effectiveness of the method.
EI; CPCI-S(ISTP); gaoyang0115@pku.edu.cn; miguyue@pku.edu.cn; ytan@pku.edu.cn; 2015-September
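The sectioning step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name, the non-overlapping window step, the `spam_terms` and `ham_terms` sets, and the definition of a concentration as the share of a section's distinct tokens found in a term set are all assumptions made for illustration; the record does not specify them.

```python
from typing import List, Set, Tuple


def variable_length_concentrations(
    tokens: List[str],
    spam_terms: Set[str],
    ham_terms: Set[str],
    window: int = 5,
) -> List[Tuple[float, float]]:
    """Return one (spam, ham) concentration pair per window-sized section.

    Assumptions (not taken from the paper): sections do not overlap, and a
    concentration is the fraction of a section's distinct tokens that occur
    in the corresponding term set.
    """
    if window <= 0:
        raise ValueError("window must be positive")

    features: List[Tuple[float, float]] = []
    # Step through the email one window at a time; the last section may be shorter.
    for start in range(0, len(tokens), window):
        distinct = set(tokens[start:start + window])
        spam_c = len(distinct & spam_terms) / len(distinct)
        ham_c = len(distinct & ham_terms) / len(distinct)
        features.append((spam_c, ham_c))
    return features


# Hypothetical toy data: a longer email yields a longer feature sequence.
spam_terms = {"free", "money"}
ham_terms = {"meeting", "hello"}
short_email = ["free", "money", "now", "hello", "meeting", "free"]
long_email = short_email + ["agenda"]

short_features = variable_length_concentrations(short_email, spam_terms, ham_terms, window=3)
long_features = variable_length_concentrations(long_email, spam_terms, ham_terms, window=3)
```

Because the number of pairs grows with the email, the output is a sequence rather than a fixed-size vector, which is why the abstract turns to a recurrent network that consumes one section at a time.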
Language: Chinese
Source: 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)
DOI: 10.1109/IJCNN.2015.7280346
Content type: Other
Source URL: http://ir.pku.edu.cn/handle/20.500.11897/436621
Collection: School of Information Science and Technology
Recommended citation
GB/T 7714
Gao, Yang,Mi, Guyue,Tan, Ying. Variable Length Concentration based Feature Construction Method for Spam Detection. 2015-01-01.