Let web spammers expose themselves

doi:10.1145/1935826.1935902


	Let web spammers expose themselves
	Cheng, Zhicong ; Gao, Bin ; Sun, Congkai ; Jiang, Yanbing ; Liu, Tie-Yan
	2011
Abstract	This paper is concerned with mining link spam (e.g., link farms and link exchanges) from search engine optimization (SEO) forums. To provide quality services, search engines must address web spam. Several techniques, such as TrustRank, BadRank, and SpamRank, have been proposed for this purpose. Most of these methods try to reduce the influence of spam websites by identifying their specific link patterns. However, spammers have refined their techniques, and the link structures of spam websites increasingly resemble those of normal or even good websites. As a result, it is very challenging to detect link spam automatically from the web graph. In this paper, we propose a different approach, which detects link spam by looking at how web spammers make link spam happen. We find that web spammers usually ally with each other, and that SEO forums are one of the major means by which they form such alliances. We therefore propose mining suspicious link spam directly from the posts in SEO forums. The task is non-trivial, however, because these posts contain other information and noise in addition to useful clues about link spam. To tackle these challenges, we first extract all the URLs contained in the posts of the SEO forums. Second, we extract features for the URLs from their relationships with forum users (potential spammers) and from their link structure in the web graph. Third, we build a semi-supervised learning framework to calculate spam scores for the URLs; the framework encodes several heuristics, such as that spam websites usually link to each other and that good websites seldom link to spam websites. We tested our approach on seven major SEO forums. Many spam websites were identified, a significant proportion of which cannot be detected by conventional anti-spam methods. This indicates that the proposed approach can be a good complement to existing anti-spam techniques. Copyright 2011 ACM.
Language	English
DOI	10.1145/1935826.1935902
Content type	Other
Source URL	[http://ir.pku.edu.cn/handle/20.500.11897/325822]
Collection	School of Software and Microelectronics
Recommended citation (GB/T 7714)	Cheng, Zhicong,Gao, Bin,Sun, Congkai,et al. Let web spammers expose themselves. 2011-01-01.
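
The abstract outlines a three-step pipeline: extract URLs from forum posts, relate them to forum users and to the web graph, and compute spam scores in a semi-supervised way. The sketch below illustrates the first and third steps only, and it is not the authors' method. It assumes posts arrive as (user, text) pairs, and it swaps in a plain clamped label propagation over an undirected link graph as a stand-in for the paper's learning framework, whose actual features and objective are not given in this record. The names `extract_hosts` and `propagate_scores` are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Loose URL pattern (an assumption); real forum markup would likely need an HTML parser.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_hosts(posts):
    """Map each linked host to the set of forum users who posted a URL on it.

    `posts` is an iterable of (user, text) pairs, a hypothetical input shape.
    """
    hosts = defaultdict(set)
    for user, text in posts:
        for url in URL_RE.findall(text):
            # Drop trailing sentence punctuation before parsing the host.
            host = urlparse(url.rstrip(".,;")).netloc.lower()
            if host:
                hosts[host].add(user)
    return dict(hosts)


def propagate_scores(links, seeds, iterations=20):
    """Spread spam scores (1.0 = spam, 0.0 = good) from labeled seeds.

    `links` is an iterable of (source_host, target_host) pairs and `seeds`
    maps a few hosts to known scores. Each unlabeled host repeatedly takes
    the mean score of its neighbors, while seeds stay fixed. This is a
    generic label propagation, not the framework described in the paper.
    """
    neighbors = defaultdict(set)
    for src, dst in links:
        if src != dst:
            neighbors[src].add(dst)
            neighbors[dst].add(src)

    # Unlabeled hosts start undecided at 0.5.
    scores = {node: 0.5 for node in neighbors}
    scores.update(seeds)

    for _ in range(iterations):
        updated = {}
        for node, score in scores.items():
            if node in seeds or not neighbors[node]:
                updated[node] = score  # clamp seeds, keep isolated hosts
            else:
                adjacent = neighbors[node]
                updated[node] = sum(scores[n] for n in adjacent) / len(adjacent)
        scores = updated
    return scores
```

In this sketch, a host linked with a known spam seed drifts toward 1.0 and a host linked with a known good seed drifts toward 0.0, which loosely mirrors the two heuristics named in the abstract. The user sets returned by `extract_hosts` could feed additional features, but how the paper combines them is not stated here.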
