Optimizing and scaling HPCG on Tianhe-2: Early experience

Authors	Zhang, Xianyi (1); Yang, Chao (1); Liu, Fangfang (1); Liu, Yiqun (1); Lu, Yutong (4)
Year	2014
Conference name	14th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2014
Conference dates	August 24, 2014 - August 27, 2014
Conference location	Dalian, China
Pages	28-41
Abstract	In this paper, we make a first attempt at optimizing and scaling HPCG on the world's largest supercomputer, Tianhe-2. This early work focuses on optimizing the CPU code without using the Intel Xeon Phi coprocessors. We reformulate the basic CG algorithm to minimize the cost of collective communication and employ several optimization techniques, such as SIMDization, loop unrolling, forward and backward sweep fusion, and OpenMP parallelization, to further enhance the performance of kernels such as the sparse matrix-vector multiplication, the symmetric Gauss-Seidel relaxation, and the geometric multigrid V-cycle. We successfully scale the HPCG code from 256 up to 6,144 nodes (147,456 CPU cores) on Tianhe-2, with nearly ideal weak scalability and an aggregate performance of 79.83 Tflops, which is 6.38X higher than the reference implementation. © 2014 Springer International Publishing Switzerland.
Indexed by	EI
Proceedings publisher	Springer Verlag
Language	English
ISSN	0302-9743
ISBN	9783319111964
Content type	Conference paper
Source URL	[http://ir.iscas.ac.cn/handle/311060/16618]
Collection	Institute of Software_Institute of Software Library_Conference Papers
Recommended citation (GB/T 7714)	Zhang, Xianyi, Yang, Chao, Liu, Fangfang, et al. Optimizing and scaling HPCG on Tianhe-2: Early experience[C]. In: 14th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2014. Dalian, China. August 24, 2014 - August 27, 2014.
