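Title: 制造系统生产调度和机器人学习智能研究
Author: Wei Yingzi (魏英姿)
Degree type: Doctoral
Defense date: 2005-01-28
Degree-granting institution: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of conferral: Shenyang Institute of Automation, Chinese Academy of Sciences
Advisor: Zhao Mingyang (赵明扬)
Keywords: dynamic scheduling; reinforcement learning; robot; genetic algorithm; manufacturing system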
Alternative title: Research on Learning Intelligence of Production Scheduling and Robot for Manufacturing System
Degree discipline: Mechatronic Engineering
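Abstract (translated from Chinese): Whether production scheduling is reasonable and effective, and how robot technology is applied, both play an important role in the technological progress of manufacturing. Production scheduling is an NP-hard combinatorial optimization problem. To study solution methods for such problems, this dissertation proposes a greedy genetic algorithm. Its idea is to introduce a greedy strategy into each genetic operation of the GA, based on an understanding of local knowledge. Computations on the TSP, a typical combinatorial optimization problem, show that the greedy genetic algorithm obtains satisfactory solutions with relatively little computational effort, and it lays a foundation for applying genetic algorithms to dynamic scheduling. Considering the constraints of scheduling problems, both soft and hard, the dissertation builds a mathematical model of the general resource-constrained scheduling problem and studies the resource-constrained single-machine dynamic scheduling problem in which multiple processing modes coexist. It proposes a constraint-satisfying chromosome encoding to form the GA's initial population, adopts feature-preserving genetic operations such as one-point, two-point, and uniform order crossover, and proves their feature-preserving property. Parallel evolution mechanisms are used to construct the dynamics of the algorithm. The computational results show that this feature-preserving parallel GA achieves good solution quality and efficiency and can fully meet the needs of dynamic scheduling. Job-shop dynamic scheduling is the most general type of scheduling, and pattern-driven scheduling (PDS) is an effective way to realize dynamic scheduling. Within the PDS framework, the dissertation studies the following: (1) A job-shop dynamic scheduling system is built on agent technology with a new distributed control architecture; scheduling tasks are allocated through interactive bidding and two-way selection among agents, and three composite rules are proposed as negotiation strategies for the contract net. (2) A composite-rule Q-learning method is proposed; an intermediate state variable describing the scheduling process, urgency, is defined, and a form of reward function that accurately evaluates the quality of actions is constructed. Simulation experiments verify the effectiveness of the algorithm. Raising the intelligence level of robots in manufacturing cells plays an important role in extending the production capability of manufacturing systems. The dissertation gives the concept of robot skill learning, summarizes modeling methods for robot learning, surveys research on learning by demonstration and reinforcement learning, and identifies feasible directions for current research on robot skill learning. Reinforcement learning of complex robot skills is a rather difficult class of learning problems, so the dissertation studies a range of measures to address it. Given the key role of the reward function in a reinforcement learning system, a heuristic reward function form is designed, and the invariance of its optimal policy and the convergence of Q-value iteration are proved. The input state space is discretized at multiple scales, a CMAC neural network is used for function approximation, and multiple action-selection strategies and a hierarchical learning strategy are applied. A simulation experiment on learning to control a bicycle verifies the effectiveness of this skill-learning scheme. The research on learning intelligence for advanced manufacturing systems promotes their technological progress; it not only provides new methods and means for these problems but also extends the application areas of intelligent learning theory.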
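The feature-preserving crossover mentioned in the abstract can be illustrated with a minimal sketch. It assumes a permutation (activity-list) encoding and shows a common form of one-point order crossover; the function names, the `preds` dictionary, and the five-activity instance are hypothetical and are not taken from the dissertation, whose exact operators may differ.

```python
def is_precedence_feasible(seq, preds):
    """Return True if every activity appears after all of its predecessors."""
    pos = {a: i for i, a in enumerate(seq)}
    return all(pos[p] < pos[a] for a, ps in preds.items() for p in ps)


def one_point_order_crossover(mother, father, cut):
    """Child = first `cut` genes of mother, then the remaining genes in father's order."""
    head = mother[:cut]
    taken = set(head)
    tail = [a for a in father if a not in taken]
    return head + tail


# Hypothetical instance: preds[a] lists the activities that must precede a.
preds = {1: [], 2: [1], 3: [1], 4: [2, 3], 5: [3]}
mother = [1, 2, 3, 4, 5]   # precedence-feasible parent
father = [1, 3, 5, 2, 4]   # precedence-feasible parent
child = one_point_order_crossover(mother, father, cut=2)
```

The child stays feasible because the head is a prefix of a feasible parent, and every tail activity keeps its relative order from the other feasible parent, so each predecessor lies either in the head or earlier in the tail.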
Call number: TP18/W59/2005
English abstract: Both effective production scheduling and the application of robot technology are significant for the development of manufacturing systems. Chapter 2 presents a novel greedy genetic algorithm (GGA) for a typical combinatorial optimization problem, the Traveling Salesman Problem (TSP). The main idea of GGA is to introduce greedy selection into the genetic operations. This work shows how a greedy policy and a genetic algorithm can be usefully combined. Initial experiments demonstrate the basic promise of the approach. For the resource-constrained dynamic scheduling problem, a mathematical model is built and a constraint-satisfying parallel genetic algorithm (PGA) is proposed. PGA adopts a permutation-based coding of the activity sequence that satisfies the priority requirements. Crossover operators are customized in this work. It is proved that the crossover operators produce a precedence-feasible offspring genotype when applied to precedence-feasible parent individuals. Single-machine preemptive scheduling improves the performance of the scheduling system. Pattern-driven scheduling (PDS) is an effective way to realize dynamic scheduling. Under the framework of PDS, Chapter 4 discusses the following problems: (1) A heterarchical scheduling architecture is adopted to solve the multi-agent cooperation problem by using an interactive bidding mechanism. Negotiations among different agents form a complete schedule. The negotiation strategy of the contract-net protocol is based on three composite rules. The interactive selection of agents is achieved by implementing these composite rules. (2) Composite-rule selection using reinforcement learning (RL) is proposed to realize job-shop dynamic scheduling. An intermediate state variable, pressure, is defined to describe the system features and determine the state sequence of the search space. The concept of the jobs' estimated mean lateness (EMLT) is used to determine the amount of reward or penalty. It is important for robots in a manufacturing cell to enhance their intelligence. Chapter 5 introduces the concept of skill learning, summarizes the methods for modeling skills, and gives an overview of research on learning by demonstration and reinforcement learning in this area. Directions for robot skill learning are also identified. Complex skill learning using RL is a difficult problem, so effective strategies are introduced for solving it. The reinforcement function is a critical component because it evaluates actions and guides the learning process. Chapter 6 presents a form of heuristic reward function. Under a more general MDP model, the policy invariance and the convergence of Q-value iteration are proved. The automatic robot shaping policy decomposes a complex skill into a hierarchical learning process. Variable-resolution discretization of the input space is introduced to improve the generalization capability of CMAC-based RL. Boltzmann distribution selection is also introduced into the ε-greedy search procedure to reduce unnecessary randomization. An example illustrates the utility of the method for learning skilled robot control online. The research on learning intelligence for manufacturing systems accelerates the advance of manufacturing technology. The dissertation not only puts forward new methods for manufacturing systems but also extends learning-intelligence theory to new fields.
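As a rough illustration of combining Boltzmann selection with ε-greedy search, the sketch below acts greedily with probability 1 − ε and otherwise samples the exploratory action from a Boltzmann (softmax) distribution over Q-values rather than uniformly. The abstract does not specify how the two are combined, so this particular scheme, the function names, and the `temperature` parameter are assumptions.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values; subtracting the max keeps exp() numerically stable."""
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, epsilon, temperature, rng=random):
    """Greedy with probability 1 - epsilon; otherwise sample from the Boltzmann distribution."""
    if rng.random() >= epsilon:
        # Exploit: index of the largest Q-value.
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Explore: higher-valued actions are favored over a uniform random pick.
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Under this assumed scheme, exploratory steps rarely pick clearly poor actions, which is one way to read the abstract's aim of reducing unnecessary randomization; a lower `temperature` concentrates exploration on near-greedy actions.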
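Language: Chinese
Ownership order: 1
Release date: 2012-08-29
Classification number: TP18
Content type: Dissertation
Source URL: [http://ir.sia.ac.cn/handle/173321/9436]
Collection: Shenyang Institute of Automation, Equipment Manufacturing Technology Research Laboratory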
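Recommended citation
GB/T 7714
Wei Yingzi (魏英姿). 制造系统生产调度和机器人学习智能研究 [Research on Learning Intelligence of Production Scheduling and Robot for Manufacturing System][D]. Shenyang Institute of Automation, Chinese Academy of Sciences. 2005.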