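1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006
2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012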
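HUANG Wei (1970-), female, born in Haimen, Jiangsu; lecturer at Soochow University. Her main research interest is machine learning.
LIU Quan (1969-), male, born in Yakeshi, Inner Mongolia; professor and doctoral supervisor at Soochow University. His main research interests are reinforcement learning, intelligent information processing, and automated reasoning.
SUN Hong-kun (1988-), male, born in Huai'an, Jiangsu; master's student at Soochow University. His main research interest is reinforcement learning.
FU Qi-ming (1985-), male, born in Huai'an, Jiangsu; doctoral student at Soochow University. His main research interests are reinforcement learning, Bayesian inference, and genetic algorithms.
ZHOU Xiao-ke (1976-), male, born in Shangrao, Jiangxi; lecturer at Soochow University. His main research interest is machine learning.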
Published online: 2014-08
Published in print: 2014-08-25
HUANG Wei, LIU Quan, SUN Hong-kun, et al. Optimized algorithm for value iteration based on topological sequence backups[J]. Journal on Communications, 2014, 35(8): 56-62. DOI: 10.3969/j.issn.1000-436x.2014.08.008.
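A value iteration algorithm based on topological sequence backups is proposed. Using the transition relations between states, it decomposes the directed graph of the task model into a series of smaller strongly connected components and updates these components in topological order. Results on the classical planning problems Mountain Car and a maze task show that the algorithm converges faster, reaches higher precision, and is robust to growth of the state space.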
In order to improve convergence performance, an optimized value iteration algorithm based on topological sequence backups (VI-TS) is proposed. The key idea of VI-TS is to avoid unnecessary backups: after detecting the structure of the MDP, it divides the MDP into strongly connected components and solves these components in topological order. The experimental results show that VI-TS achieves better convergence performance and is more robust to state-space growth when applied to classical planning scenarios.
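The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' VI-TS implementation: it assumes a small tabular MDP stored in a hypothetical dictionary format (`mdp[state][action]` is a list of `(probability, next_state, reward)` triples, and every next state is itself a key of `mdp`), it uses Tarjan's algorithm as one possible way to find strongly connected components, and the names `strongly_connected_components` and `topological_value_iteration` are chosen here for illustration only. Tarjan's algorithm happens to emit components sink-first, so each component is solved only after all components it can reach have converged.

```python
from typing import Dict, List, Tuple

# Assumed (hypothetical) encoding: mdp[state][action] = [(prob, next_state, reward), ...]
MDP = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def strongly_connected_components(mdp: MDP) -> List[List[int]]:
    """Tarjan's algorithm. Components are returned sink-first, i.e. in
    reverse topological order of the condensed graph."""
    index: Dict[int, int] = {}
    low: Dict[int, int] = {}
    stack: List[int] = []
    on_stack = set()
    components: List[List[int]] = []

    def visit(s: int) -> None:
        index[s] = low[s] = len(index)
        stack.append(s)
        on_stack.add(s)
        for outcomes in mdp[s].values():
            for _, t, _ in outcomes:
                if t not in index:
                    visit(t)
                    low[s] = min(low[s], low[t])
                elif t in on_stack:
                    low[s] = min(low[s], index[t])
        if low[s] == index[s]:  # s is the root of a component
            comp = []
            while True:
                t = stack.pop()
                on_stack.discard(t)
                comp.append(t)
                if t == s:
                    break
            components.append(comp)

    for s in mdp:
        if s not in index:
            visit(s)
    return components


def topological_value_iteration(mdp: MDP, gamma: float = 0.9,
                                tol: float = 1e-8) -> Dict[int, float]:
    """Run value iteration one component at a time, successors first."""
    values = {s: 0.0 for s in mdp}
    for comp in strongly_connected_components(mdp):
        while True:
            delta = 0.0
            for s in comp:
                if not mdp[s]:
                    continue  # terminal state: value stays 0
                best = max(
                    sum(p * (r + gamma * values[t]) for p, t, r in outcomes)
                    for outcomes in mdp[s].values()
                )
                delta = max(delta, abs(best - values[s]))
                values[s] = best
            if delta < tol:
                break
    return values


# Toy example: states 0 and 1 form a cycle, 1 can leave to 2, and 2 reaches terminal 3.
example_mdp: MDP = {
    0: {"a": [(1.0, 1, 0.0)]},
    1: {"back": [(1.0, 0, 0.0)], "go": [(1.0, 2, 0.0)]},
    2: {"finish": [(1.0, 3, 1.0)]},
    3: {},
}
example_values = topological_value_iteration(example_mdp)
```

Because states outside the current component are never revisited, backups are spent only where values can still change. The recursive `visit` is fine for small models; for large state spaces an iterative variant would be needed to stay within Python's recursion limit.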