Optimized algorithm for value iteration based on topological sequence backups

Wei HUANG; Quan LIU; Hong-kun SUN; Qi-ming FU; Xiao-ke ZHOU

doi:10.3969/j.issn.1000-436x.2014.08.008


Academic paper | Updated: 2024-10-11

- Journal on Communications Vol. 35, Issue 8, Pages: 56-62(2014)
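- Author affiliations:
  
  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012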
- DOI: 10.3969/j.issn.1000-436x.2014.08.008
  CLC: TP181
- Online First: 2014-08
  
  Published: 25 August 2014

Wei HUANG, Quan LIU, Hong-kun SUN, et al. Optimized algorithm for value iteration based on topological sequence backups[J]. Journal on Communications, 2014, 35(8): 56-62. DOI: 10.3969/j.issn.1000-436x.2014.08.008.

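Abstract (translated from the Chinese)

A value iteration algorithm based on topological sequence backups is proposed. Using the transition relationships between states, it decomposes the directed graph of the task model into a series of smaller strongly connected components and backs up those components in topological order. Results on the classic planning problem Mountain Car and on maze experiments show that the algorithm converges faster, reaches higher accuracy, and is robust to growth of the state space.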

Abstract

To improve convergence performance, an optimized value iteration algorithm based on topological sequence backups (VI-TS) is proposed. The key idea of VI-TS is to avoid unnecessary backups: after detecting the structure of the MDP, it divides the MDP into strongly connected components and solves these components in topological order. Experimental results show that VI-TS converges better and is more robust to state-space growth when applied to classical planning scenarios.
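The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the MDP encoding (`mdp[state][action]` as a list of `(probability, next_state, reward)` tuples), the function names, the discount factor, and the use of Tarjan's algorithm to find the components are all assumptions made for illustration. Tarjan's algorithm happens to emit components with successors first, so each component is solved only after every component it depends on has converged.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: mdp[state][action] = [(probability, next_state, reward), ...]
# Every next_state must itself be a key of mdp; a state with no actions is terminal.
MDP = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def strongly_connected_components(mdp: MDP) -> List[List[int]]:
    """Tarjan's algorithm (recursive, so suited to small examples only).

    Components are returned with successor components first.
    """
    index: Dict[int, int] = {}
    low: Dict[int, int] = {}
    stack: List[int] = []
    on_stack = set()
    sccs: List[List[int]] = []

    def visit(s: int) -> None:
        index[s] = low[s] = len(index)
        stack.append(s)
        on_stack.add(s)
        for outcomes in mdp[s].values():
            for _, t, _ in outcomes:
                if t not in index:
                    visit(t)
                    low[s] = min(low[s], low[t])
                elif t in on_stack:
                    low[s] = min(low[s], index[t])
        if low[s] == index[s]:  # s is the root of a component
            comp = []
            while True:
                t = stack.pop()
                on_stack.discard(t)
                comp.append(t)
                if t == s:
                    break
            sccs.append(comp)

    for s in mdp:
        if s not in index:
            visit(s)
    return sccs


def topological_value_iteration(mdp: MDP, gamma: float = 0.5, tol: float = 1e-10):
    """Run in-place value iteration on one component at a time."""
    values = {s: 0.0 for s in mdp}
    backups = 0
    for comp in strongly_connected_components(mdp):
        while True:
            delta = 0.0
            for s in comp:
                if not mdp[s]:
                    continue  # terminal state keeps value 0
                best = max(
                    sum(p * (r + gamma * values[t]) for p, t, r in outcomes)
                    for outcomes in mdp[s].values()
                )
                delta = max(delta, abs(best - values[s]))
                values[s] = best
                backups += 1
            if delta < tol:
                break
    return values, backups


# Toy MDP: states 0 and 1 form a cycle, state 2 leads to terminal state 3.
example_mdp: MDP = {
    0: {"go": [(1.0, 1, 0.0)]},
    1: {"go": [(1.0, 2, 0.0)], "back": [(1.0, 0, 0.0)]},
    2: {"go": [(1.0, 3, 1.0)]},
    3: {},
}
components = strongly_connected_components(example_mdp)
values, backups = topological_value_iteration(example_mdp)
```

In this toy problem the cycle between states 0 and 1 is only iterated after the value of state 2 is final, so no backup is spent on states 0 and 1 while their successor is still changing.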


