Wei HUANG, Quan LIU, Hong-kun SUN, et al. Optimized algorithm for value iteration based on topological sequence backups[J]. Journal on Communications, 2014, 35(8): 56-62.
DOI:
Wei HUANG, Quan LIU, Hong-kun SUN, et al. Optimized algorithm for value iteration based on topological sequence backups[J]. Journal on Communications, 2014, 35(8): 56-62. DOI: 10.3969/j.issn.1000-436x.2014.08.008.
Optimized algorithm for value iteration based on topological sequence backups
an optimized value iteration based on topological sequence backups
VI-TS
is proposed. The key idea of VI-TS is to circumvent the problem of unnecessary backups by dividing an MDP into strongly-connected components and solving these components in topological sequences after detecting the structure of MDP. The experiment results show that VI-TS has a better convergence performance and robustness for state space growth when applied to classical planning experiment scenarios.
LIU Q , FU Q M , GONG S R , et al . Reinforcement learning algorithm based on minimum state method and average reward [J ] . Journal on Communications , 2011 , 32 ( 1 ): 66 - 71 .
SZEPESVARI C . Algorithms for Reinforcement Learning [M ] . San Rafael: Morgan Claypool , 2010 .
SUTTON R S , BARTO A G . Reinforcement Learning: An Introduc-tion [M ] . Cambridge : MIT Press , 1998 .
HOWARD R . Dynamic Programming and Markov Processes [M ] . Cambridge, MA : MIT Press , 1960 .
BERTSEKAS D P . Dynamic Programming and Optimal Control [M ] . Belmont, MA: Athena Scientific , 2000 .
POWELL W B . Approximate Dynamic Programming: Solving the Curses of Dimensionality [M ] . New York: John Wiley&Sons , 2007 .
HANSEN E , ZILBERSTEIN S . Lao*: a heuristic search algorithm that finds solutions with loops[ [J ] . Artificial Intelligence , 2001 , 129 ( 1/2 ): 35 - 62 .
BONET B , GEFFNER H . Labeled RTDP: Improving the convergence of real-time dynamic programming [A ] . Proc of 13th ICAPS [C ] . Trento, Italy 2003 . 12 - 21 .
BONET B , GEFFNER H . Faster heuristic search algorithms for plan-ning with uncertainty and full feedback [A ] . International Joint Con-ference on Artificial Intelligence [C ] . 2003 . 1233 - 1238 .
MOORE A W , ATKESON C G . Prioritized sweeping: reinforcement learning with less data and less time [J ] . Machine Learning , 1993 , 13 ( 1 ): 103 - 130 .
ANDRE D , FRIEDMAN N , PARR R . Generalized prioritized sweep-ing [A ] . Proc of the 10th Conference on Advances in Neural Informa-tion Processing Systems [C ] . Cambridge , 1997 . 1001 - 1007 .
CORMEN T H , LEISERSON C E , RIVEST R L , et al . Introduction to Algorithms [M ] . Cambridge, MA : MIT Press , 2001 .