Parallelization and Performance Optimization on Face Detection Algorithm with OpenCL: A Case Study
Wang Weiyan ; Zhang Yunquan ; Yan Shengen ; Zhang Ying ; Jia Haipeng
Journal: Tsinghua Science and Technology
Year: 2012
Volume: 17; Issue: 3; Pages: 287-295
Keywords: Algorithms; Optimization
ISSN: 1007-0214
Abstract: Face detection applications inherently require real-time performance. Although the Viola-Jones algorithm handles the task elegantly, today's ever larger high-quality images and videos pose a new challenge to meeting real-time requirements. Parallelizing the Viola-Jones algorithm with OpenCL is a good way to achieve high performance across both AMD and NVIDIA GPU platforms without introducing new algorithms. This paper identifies the bottleneck of this application and discusses how to optimize face detection step by step, starting from a very naive implementation. Techniques such as hiding CPU execution time, subtle use of local memory as a high-speed scratchpad and manual cache, and variable granularity were used to improve performance. These techniques result in a 4-13 times speedup, varying with the image size. Furthermore, these ideas may shed some light on how to parallelize applications efficiently with OpenCL. Taking face detection as an example, this paper also summarizes some general advice on optimizing OpenCL programs, to help other applications perform better on GPUs. © 2012 Tsinghua University Press.
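Indexed by: EI
Language: English
Date available: 2013-09-17
Content type: Journal article
Source URL: http://ir.iscas.ac.cn/handle/311060/15016
Collection: Institute of Software > Institute of Software Library > Journal Articles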
Recommended citation
GB/T 7714: Wang Weiyan, Zhang Yunquan, Yan Shengen, et al. Parallelization and Performance Optimization on Face Detection Algorithm with OpenCL: A Case Study[J]. Tsinghua Science and Technology, 2012, 17(3): 287-295.
APA: Wang Weiyan, Zhang Yunquan, Yan Shengen, Zhang Ying, & Jia Haipeng. (2012). Parallelization and performance optimization on face detection algorithm with OpenCL: A case study. Tsinghua Science and Technology, 17(3), 287-295.
MLA: Wang Weiyan, et al. "Parallelization and Performance Optimization on Face Detection Algorithm with OpenCL: A Case Study." Tsinghua Science and Technology 17.3 (2012): 287-295.