SF-Sketch: A Two-Stage Sketch for Data Streams
Liu, Lingtong1,2; Shen, Yulong1,2; Yan, Yibo3; Yang, Tong3; Shahzad, Muhammad4; Cui, Bin3; Xie, Gaogang5
刊名IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
2020-10-01
卷号31期号:10页码:2263-2276
关键词Distributed databases Monitoring Bars Frequency measurement Registers Fats Hash functions Network measurements sketch distributed monitoring multiset frequent items
ISSN号1045-9219
DOI10.1109/TPDS.2020.2987609
英文摘要Sketches are probabilistic data structures designed for recording frequencies of items in a multi-set. They are widely used in various fields, especially for gathering Internet statistics from distributed data streams in network measurements. In a distributed streaming application with high data rates, a sketch in each monitoring node "fills up" very quickly and then its content is transferred to a remote collector responsible for answering queries. Thus, the size of the contents transferred must be kept as small as possible while meeting the desired accuracy requirement. To obtain significantly higher accuracy while keeping the same update and query speed as the best prior sketches, in this article, we propose a new sketch - the Slim-Fat (SF) sketch. The key idea behind the SF-sketch is to maintain two separate sketches: a larger sketch, the Fat-subsketch, and a smaller sketch, the Slim-subsketch. The Fat-subsketch is used for updating and periodically producing the Slim-subsketch, which is then transferred to the remote collector for answering queries quickly and accurately. We also present the error bound as well as an accurate model of the correct rate of the SF-sketch, and verify their correctness through experiments. We implemented and extensively evaluated the SF-sketch along with several prior sketches. Our results show that when the size of our Slim-subsketch and of the widely used Count-Min (CM) sketch are kept the same, our SF-sketch outperforms the CM-sketch by up to 33.1 times in terms of accuracy (when the ratio of the sizes of the Fat-subsketch and the Slim-subsketch is 16:1). We have made all source codes publicly available at Github ["Source code of SF sketches," [Online]. Available: https://github.com/paper2017/SF-sketch].
资助项目National Key Research and Development Program of China[2018YFE0207600] ; National Key Research and Development Program of China[2018YFB2100403] ; NSFC[61672061] ; NSFC[U1736216] ; National Science Foundation of USA[CNS-1616317] ; National Science Foundation of USA[CNS-1616273]
WOS研究方向Computer Science ; Engineering
语种英语
出版者IEEE COMPUTER SOC
WOS记录号WOS:000536291500003
内容类型期刊论文
源URL[http://119.78.100.204/handle/2XEOYT63/15296]  
专题中国科学院计算技术研究所期刊论文_英文
通讯作者Shen, Yulong; Yang, Tong
作者单位1.Xidian Univ, Shaanxi Key Lab Network & Syst Secur, Xian 710071, Peoples R China
2.Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
3.Peking Univ, Dept Comp & Sci, Beijing 100871, Peoples R China
4.North Carolina State Univ, Dept Comp Sci, Raleigh, NC 27695 USA
5.Chinese Acad Sci, Inst Comp Technol, Beijing 100080, Peoples R China
推荐引用方式
GB/T 7714
Liu, Lingtong,Shen, Yulong,Yan, Yibo,et al. SF-Sketch: A Two-Stage Sketch for Data Streams[J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,2020,31(10):2263-2276.
APA Liu, Lingtong.,Shen, Yulong.,Yan, Yibo.,Yang, Tong.,Shahzad, Muhammad.,...&Xie, Gaogang.(2020).SF-Sketch: A Two-Stage Sketch for Data Streams.IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,31(10),2263-2276.
MLA Liu, Lingtong,et al."SF-Sketch: A Two-Stage Sketch for Data Streams".IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 31.10(2020):2263-2276.
个性服务
查看访问统计
相关权益政策
暂无数据
收藏/分享
所有评论 (0)
暂无评论
 

除非特别说明,本系统中所有内容都受版权保护,并保留所有权利。


©版权所有 ©2017 CSpace - Powered by CSpace