An automatic performance model-based scheduling tool for coupled climate system models
Ding, Nan (1,2,3); Xue, Wei (1,2,3,4,6); Song, Zhenya (2,5); Fu, Haohuan (2,3,4,6); Xu, Shiming (3); Zheng, Weimin (1)
Journal: Journal of Parallel and Distributed Computing
2018
Volume: 132; Pages: 1-13
Keywords: Automatic tools; Branch-and-bound algorithms; Coupled climate systems; Multiphysics simulations; Performance improvements; Performance Model; Simulation configuration; Time-to-solution
ISSN: 0743-7315
DOI: 10.1016/j.jpdc.2018.01.002
English abstract

The ability to predict the climate system depends heavily on the efficient integration of observations and simulations of the Earth, which is regarded as a canonical example of a cyber-physical system. The climate system model, the simulation engine in this cyber-physical system, is one of the most challenging applications in scientific computing. It uses multi-physics simulation that couples multiple components, conducts decadal to millennial simulations, and has long been an important application on supercomputers. However, current climate system models suffer from inefficient task scheduling methods, which result in intolerable simulation times. Take the Community Earth System Model (CESM), the most widely used climate system model, as an example: one major reason for its poor performance is the huge overhead of rationally distributing processes among the coupled heterogeneous components. According to an NCAR report, every percent improvement in CESM performance frees up to the equivalent of $250,000 in computing resources in their scientific experiments. To address this challenge, our paper first constructs a lightweight and accurate performance model that effectively captures and predicts the heterogeneous time-to-solution performance of end-to-end CESM components under a given simulation configuration. Then, based on the performance model, we propose an efficient scheduling strategy based on a rectangular packing method to determine the best process layout among the components and the number of processes assigned to each component. Our evaluations show that we achieve an average run-time reduction of 58% on CESM compared with the widely used sequential process layout at scales of 144-480 cores on typical CPU clusters. We can also save 4 million CPU hours when conducting one standard scientific experiment (a 2870-year simulation), which is equivalent to saving $40,089 at a charge of $0.01 per CPU hour. Meanwhile, our method gains an extra 26% performance improvement compared with a heuristic branch-and-bound algorithm guided by a known curve-fitting performance model. © 2018 Elsevier Inc.
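
To make the idea of model-guided process allocation concrete, here is a minimal Python sketch. It is not the paper's rectangular packing method: it assumes a hypothetical per-component model of the form time(p) = a / p + b, uses invented coefficients for three components, and brute-forces how to split a fixed number of processes among components that run concurrently. The names `MODELS`, `predicted_time`, `sequential_layout_time`, and `best_concurrent_layout` are illustrative assumptions, not taken from the paper or from CESM.

```python
from itertools import product

# Hypothetical performance models: time(p) = a / p + b, where a is the
# parallelizable work and b is non-scaling overhead. Values are invented.
MODELS = {
    "atm": (1200.0, 5.0),
    "ocn": (600.0, 4.0),
    "ice": (200.0, 2.0),
}


def predicted_time(component, procs):
    """Predicted run time of one component on `procs` processes."""
    a, b = MODELS[component]
    return a / procs + b


def sequential_layout_time(total_procs):
    """Every component uses all processes; components run one after another."""
    return sum(predicted_time(c, total_procs) for c in MODELS)


def best_concurrent_layout(total_procs, step=8):
    """Brute-force the split of processes among concurrently running components.

    Returns (predicted_time, allocation), where the predicted time is that of
    the slowest component, since all components run side by side.
    """
    names = list(MODELS)
    best = None
    choices = range(step, total_procs, step)
    for split in product(choices, repeat=len(names) - 1):
        last = total_procs - sum(split)  # remaining processes go to the last component
        if last < step:
            continue
        alloc = dict(zip(names, (*split, last)))
        t = max(predicted_time(c, p) for c, p in alloc.items())
        if best is None or t < best[0]:
            best = (t, alloc)
    return best


if __name__ == "__main__":
    total = 144
    t_conc, alloc = best_concurrent_layout(total)
    print("sequential layout:", round(sequential_layout_time(total), 2))
    print("concurrent layout:", round(t_conc, 2), alloc)
```

Exhaustive search is only workable here because the toy problem has three components and a coarse step size. Real layouts can also mix sequential and concurrent execution among components, which is the larger search space that a packing-based strategy like the one in the abstract is meant to handle.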

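Funding: Basic Scientific Fund for National Public Research Institute of China [2016S03]
WOS research area: Computer Science
Language: English
Publisher: Academic Press Inc.
WOS accession number: WOS:000476580400017
Content type: Journal article
Source URL: http://ir.fio.com.cn/handle/2SI8HI0U/6426
Collection: Operational Departments_Marine Environment and Numerical Modeling Research Laboratory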
Author affiliations:
1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
2. Laboratory for Regional Oceanography and Numerical Modeling, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
3. Ministry of Education Key Laboratory for Earth System Modeling, Department of Earth System Science, Tsinghua University, Beijing 100084, China
4. National Supercomputing Center in Wuxi, Wuxi 214072, China
5. First Institute of Oceanography, State Oceanic Administration, Qingdao 266061, China
6. Joint Center for Global Change Studies, Beijing 100875, China
Recommended citation
GB/T 7714: Ding, Nan, Xue, Wei, Song, Zhenya, et al. An automatic performance model-based scheduling tool for coupled climate system models[J]. Journal of Parallel and Distributed Computing, 2018, 132: 1-13.
APA: Ding, Nan, Xue, Wei, Song, Zhenya, Fu, Haohuan, Xu, Shiming, & Zheng, Weimin. (2018). An automatic performance model-based scheduling tool for coupled climate system models. Journal of Parallel and Distributed Computing, 132, 1-13.
MLA: Ding, Nan, et al. "An automatic performance model-based scheduling tool for coupled climate system models". Journal of Parallel and Distributed Computing 132 (2018): 1-13.