Statistical substring reduction in linear time

CORC > 北京大学 > 信息科学技术学院

	Statistical substring reduction in linear time
	Lu, XQ ; Le, Z ; Hu, JF
	2005
英文摘要	We study the problem of efficiently removing equal frequency n-gram substrings from an n-gram set, formally called Statistical Substring Reduction (SSR). SSR is a useful operation in corpus based multi-word unit research and new word identification task of oriental language processing. We present a new SSR algorithm that has linear time (O(n)) complexity, and prove its equivalence with the traditional O(n(2)) algorithm. In particular, using experimental results from several corpora with different sizes, we show that it is possible to achieve performance close to that theoretically predicated for this task. Even in a small corpus the new algorithm is several orders of magnitude faster than the 0(n 2) one. These results show that our algorithm is reliable and efficient, and is therefore an appropriate choice for large scale corpus processing.; Computer Science, Artificial Intelligence; SCI(E); CPCI-S(ISTP); 0
语种	英语
内容类型	其他
源URL	[http://ir.pku.edu.cn/handle/20.500.11897/399744]
专题	信息科学技术学院
推荐引用方式 GB/T 7714	Lu, XQ,Le, Z,Hu, JF. Statistical substring reduction in linear time. 2005-01-01.

个性服务

查看访问统计

相关权益政策

暂无数据

收藏/分享

所有评论 (0)

暂无评论

评注功能仅针对注册用户开放，请您登录

您在知识库使用过程中有什么好的想法或者建议可以反馈给我们。
标题：	*
内容：
Email：	*
验证码：	刷新

相关链接