1. School of Computer Science and Technology, Soochow University, Suzhou 215006, China
2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
ZHU Fei (1978-), male, from Suzhou, Jiangsu, Ph.D., is an associate professor at Soochow University. His research interests include machine learning, artificial intelligence, and bioinformatics.
XU Zhi-peng (1991-), male, from Jingzhou, Hubei, is a master's student at Soochow University. His research interests include reinforcement learning and artificial intelligence.
LIU Quan (1969-), male, from Yakeshi, Inner Mongolia, post-doctoral, is a professor and doctoral supervisor at Soochow University. His research interests include reinforcement learning, artificial intelligence, and automated reasoning.
FU Yu-chen (1968-), male, from Xuzhou, Jiangsu, Ph.D., is a professor and master's supervisor at Soochow University. His research interests include reinforcement learning and artificial intelligence.
WANG Hui (1968-), male, from Xi'an, Shaanxi, is a lecturer at Soochow University. His research interests include reinforcement learning and artificial intelligence.
Online publication date: 2016-06
Print publication date: 2016-06-25
ZHU Fei, XU Zhi-peng, LIU Quan, et al. Online hierarchical reinforcement learning based on interrupting Option[J]. Journal on Communications, 2016, 37(6): 65-74. DOI: 10.11959/j.issn.1000-436x.2016117.
To address the sheer volume of big data, an online-updating algorithm named Macro-Q with in-place updating (MQIU), built on the Macro-Q algorithm, was proposed. MQIU updates both the value function of abstract actions and the value function of primitive actions, which improves the utilization of data samples and speeds up convergence. Because the conventional Markov decision process model and abstract actions both cope poorly with variability, an interruption mechanism was introduced, yielding a model-free interrupting Macro-Q Option learning algorithm (IMQ) based on hierarchical reinforcement learning, which can learn and improve control strategies in a dynamic environment. Simulations verify that MQIU accelerates convergence and thus scales to larger problems, and that IMQ solves tasks faster while maintaining stable learning performance.
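The abstract's two ingredients can be illustrated in isolation: the SMDP-style Macro-Q value update applied when an abstract action (Option) terminates, and the standard interruption test that cuts an Option short as soon as some other Option looks strictly better. This is a minimal sketch of those two generic rules from the Options framework, not the paper's exact MQIU/IMQ procedures; the state names, option names, and step constants below are illustrative assumptions.

```python
from collections import defaultdict

GAMMA, ALPHA = 0.9, 0.1   # discount factor and learning rate (assumed values)
Q = defaultdict(float)    # Q[(state, option)] -> estimated value, default 0

def macro_q_update(s, o, r_cum, k, s_next, options):
    # SMDP Q-learning: one Macro-Q update after option o terminates,
    # having run k primitive steps from s and accumulated the
    # discounted reward r_cum, ending in state s_next.
    best_next = max(Q[(s_next, o2)] for o2 in options)
    Q[(s, o)] += ALPHA * (r_cum + GAMMA ** k * best_next - Q[(s, o)])

def should_interrupt(s, o, options):
    # Interruption rule: stop option o in state s when the value of
    # switching to the best available option strictly exceeds the
    # value of continuing with o.
    return max(Q[(s, o2)] for o2 in options) > Q[(s, o)]

options = ["go_left", "go_right"]
macro_q_update("s0", "go_right", r_cum=1.0, k=3, s_next="s1", options=options)
# first update moves Q[("s0", "go_right")] toward the 3-step return:
# 0.1 * (1.0 + 0.9**3 * 0 - 0) = 0.1
```

In MQIU the same experience would additionally update primitive-action values along the option's trajectory (intra-option style), which is what raises sample efficiency; that inner loop is omitted here for brevity.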