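Title | Design and Implementation of a Data-Intensive Workflow System for the Hadoop Platform |
Author | 李奇原 |
Degree Type | Master's |
Defense Date | 2012-05-30 |
Degree-Granting Institution | Graduate University of Chinese Academy of Sciences |
Place Conferred | Beijing |
Advisor | 许舒人 |
Keywords | data-intensive applications; Hadoop workflow; BPEL |
Degree Discipline | Computer Software and Theory |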
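Chinese Abstract |
Traditional workflow systems cannot meet enterprises' needs for building data-intensive applications and must draw on the Hadoop platform's ability to process big data. Existing Hadoop workflow systems build Hadoop workflows with custom description languages and cannot communicate with the workflow systems that enterprises already run, which makes it hard for enterprises to combine existing system services with the Hadoop platform in workflows for data-intensive applications. Building Hadoop workflows in BPEL draws on the strengths of BPEL as a traditional workflow language (rich expressiveness, the ability to be integrated as a single Web service, and support for long-running stateful interactions) while also using the Hadoop platform's capacity for processing data-intensive applications, and is therefore an effective way to address these problems.
|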
English Abstract | Because traditional workflow systems have limited data-processing capability, enterprises have to build data-intensive applications on the Hadoop platform. Existing Hadoop workflow systems build Hadoop workflows in custom description languages, which makes it difficult to communicate and integrate with existing workflow systems. Building Hadoop workflows in BPEL draws on the advantages of BPEL as a traditional description language while also using the Hadoop platform's capacity for processing data-intensive applications, so it is an effective way to solve these problems. This thesis analyzes the benefits of handling data-intensive applications with Hadoop workflows and several drawbacks of existing Hadoop workflow systems, including their inability to interact with enterprise workflow systems, weak descriptive power, and lack of workflow-level scheduling and monitoring. To solve those problems, this thesis proposes a Hadoop-oriented data-intensive workflow system. The thesis focuses on rule-based model transformation, a fair scheduling method, and run-time monitoring for Hadoop workflows. For rule-based model transformation, it designs mapping rules and an efficient model transformation framework. For fair scheduling of Hadoop workflows, it proposes a fair scheduling method, FlowS. This method isolates Hadoop workflows from one another through workflow pools and ensures fair resource allocation through a dynamic construction algorithm. For run-time monitoring, the thesis persists the Hadoop workflow model and updates the view asynchronously to reduce the workload of the presentation layer. In addition, it proposes an approach that creates a monitor instance for every active Hadoop workflow and uses a thread pool, which identifies failures and reduces the overhead of monitoring. Finally, the thesis discusses the design and implementation of the Hadoop-oriented data-intensive workflow system. The research results above have been applied. |
Language | English |
Subject | Software Engineering |
Release Date | 2012-06-01 |
Content Type | Thesis |
Source URL | [http://ir.iscas.ac.cn/handle/311060/14496] |
Collection | Institute of Software_Technology Center of Software Engineering_Theses |
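Recommended Citation (GB/T 7714) | 李奇原. Design and Implementation of a Data-Intensive Workflow System for the Hadoop Platform[D]. Beijing. Graduate University of Chinese Academy of Sciences. 2012. |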