Optimized algorithm for value iteration based on topological sequence backups

Wei HUANG; Quan LIU; Hong-kun SUN; Qi-ming FU; Xiao-ke ZHOU

doi:10.3969/j.issn.1000-436x.2014.08.008


Academic paper | Updated: 2024-10-11

- Optimized algorithm for value iteration based on topological sequence backups
- Journal on Communications, 2014, vol. 35, no. 8, pp. 56-62
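- Author affiliations:
  
  1. School of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu, China
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, Jilin, China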
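- About the authors:
  
  Wei HUANG (1970-), female, born in Haimen, Jiangsu. Lecturer at Soochow University. Research interest: machine learning.
  Quan LIU (1969-), male, born in Yakeshi, Inner Mongolia. Professor and doctoral supervisor at Soochow University. Research interests: reinforcement learning, intelligent information processing, and automated reasoning.
  Hong-kun SUN (1988-), male, born in Huai'an, Jiangsu. Master's student at Soochow University. Research interest: reinforcement learning.
  Qi-ming FU (1985-), male, born in Huai'an, Jiangsu. Doctoral student at Soochow University. Research interests: reinforcement learning, Bayesian inference, and genetic algorithms.
  Xiao-ke ZHOU (1976-), male, born in Shangrao, Jiangxi. Lecturer at Soochow University. Research interest: machine learning.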
- Funding:
  
  The National Natural Science Foundation of China (61070223, 61103045, 61272005, 61170020); The Natural Science Foundation of Jiangsu Province (BK2012616); The High School Natural Foundation of Jiangsu Province (09KJA520002, 09KJB520012); The Foundation of Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172012K04)
- DOI: 10.3969/j.issn.1000-436x.2014.08.008
  CLC number: TP181
- Online publication date: 2014-08
  
  Print publication date: 2014-08-25

Wei HUANG, Quan LIU, Hong-kun SUN, et al. Optimized algorithm for value iteration based on topological sequence backups[J]. Journal on Communications, 2014, 35(8): 56-62. DOI: 10.3969/j.issn.1000-436x.2014.08.008.

Abstract

To improve convergence performance, an optimized value iteration algorithm based on topological sequence backups (VI-TS) is proposed. VI-TS uses the transition relationships between states to decompose the directed graph of the task model (an MDP) into a series of smaller strongly connected components, and then backs up these components in topological order, which avoids unnecessary backups. Experiments on the classical planning problems Mountain Car and a maze show that the algorithm converges faster, reaches higher accuracy, and is robust to growth of the state space.
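The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' VI-TS implementation: the MDP encoding, the function names, and the use of Tarjan's algorithm to find strongly connected components are all assumptions made for illustration. The sketch assumes every reachable state appears as a key of the MDP dictionary and that a state with no actions is terminal.

```python
from typing import Dict, Hashable, List, Tuple

State = Hashable
# Assumed encoding: mdp[s][a] is a list of (probability, next_state, reward).
MDP = Dict[State, Dict[str, List[Tuple[float, State, float]]]]


def strongly_connected_components(mdp: MDP) -> List[List[State]]:
    """Tarjan's algorithm (recursive). A component is emitted only after
    every component reachable from it, i.e. successors come first."""
    index: Dict[State, int] = {}
    low: Dict[State, int] = {}
    stack: List[State] = []
    on_stack = set()
    components: List[List[State]] = []

    def successors(s):
        return {s2 for outcomes in mdp[s].values()
                for p, s2, _ in outcomes if p > 0}

    def visit(s):
        index[s] = low[s] = len(index)
        stack.append(s)
        on_stack.add(s)
        for s2 in successors(s):
            if s2 not in index:
                visit(s2)
                low[s] = min(low[s], low[s2])
            elif s2 in on_stack:
                low[s] = min(low[s], index[s2])
        if low[s] == index[s]:  # s is the root of a component
            component = []
            while True:
                t = stack.pop()
                on_stack.discard(t)
                component.append(t)
                if t == s:
                    break
            components.append(component)

    for s in mdp:
        if s not in index:
            visit(s)
    return components


def topological_value_iteration(mdp: MDP, gamma: float = 0.95,
                                tol: float = 1e-8) -> Dict[State, float]:
    """Run value iteration one component at a time, successors first, so
    each component is solved against already converged downstream values."""
    values = {s: 0.0 for s in mdp}
    for component in strongly_connected_components(mdp):
        while True:
            delta = 0.0
            for s in component:
                if not mdp[s]:  # terminal state: value stays 0
                    continue
                best = max(
                    sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                    for outcomes in mdp[s].values()
                )
                delta = max(delta, abs(best - values[s]))
                values[s] = best
            if delta < tol:
                break
    return values


# Hypothetical four-state example: "a" and "b" form a cycle.
example_mdp: MDP = {
    "start": {"go": [(1.0, "a", -1.0)]},
    "a": {"go": [(1.0, "b", -1.0)]},
    "b": {"back": [(1.0, "a", -1.0)], "exit": [(1.0, "goal", -1.0)]},
    "goal": {},
}
components = strongly_connected_components(example_mdp)
values = topological_value_iteration(example_mdp, gamma=0.9)
```

In this example the components come out as `goal`, then the cycle `{a, b}`, then `start`, so `start` is backed up only after the values it depends on have converged. Plain value iteration would instead sweep all four states on every pass.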


