Reinforcement learning algorithm based on minimum state method and average reward

LIU Quan; FU Qi-ming; GONG Sheng-rong; FU Yu-chen; CUI Zhi-ming

Academic paper | Updated: 2024-10-14

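- Reinforcement learning algorithm based on minimum state method and average reward
- Journal on Communications, 2011, Vol. 32, No. 1, pp. 66-71
- Author affiliations:
  
  1. School of Computer Science and Technology, Soochow University
  2. State Key Laboratory for Novel Software Technology, Nanjing University
- Funding:
  
  National Natural Science Foundation of China (60873116, 61070223, 61070122); Natural Science Foundation of Jiangsu Province (BK2008161, BK2009116); Natural Science Research Foundation of Jiangsu Higher Education Institutions (09KJA520002)
- CLC number: TP181
- Print publication date: 2011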
LIU Quan, FU Qi-ming, GONG Sheng-rong, et al. Reinforcement learning algorithm based on minimum state method and average reward[J]. Journal on Communications, 2011, 32(1): 66-71.

Abstract

Q-Learning that uses discounted reward as its evaluation criterion cannot reflect the effect of an action on subsequent situations. To address this, the AR-Q-Learning algorithm, which combines average reward with Q-Learning, is proposed and its convergence is proved. To address the curse of dimensionality, in which the number of learning parameters grows exponentially with the dimension of the state variables, the minimum state method is proposed. AR-Q-Learning and the minimum state method are applied to reinforcement learning in the Blocks World. The experimental results show that the method better accounts for the aftereffects of actions, converges faster than Q-Learning, and, to a certain extent, alleviates the curse of dimensionality in the Blocks World.
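The abstract does not give the AR-Q-Learning update rule, so the following is only a minimal sketch of the general idea of replacing the discounted return with an average-reward criterion. It assumes the textbook R-learning style update (temporal-difference error `r - rho + max Q(s', .) - Q(s, a)`, with the average-reward estimate `rho` adjusted on greedy steps), which may differ from the paper's algorithm. The two-state task in `step`, the function name `average_reward_q_learning`, and all parameter values are hypothetical and chosen for illustration; they are not taken from the paper or from the Blocks World experiments.

```python
import random


def step(state, action):
    """Hypothetical deterministic 2-state task (an assumption, not from the paper).

    State 0: action 0 stays (reward 1), action 1 moves to state 1 (reward 0).
    State 1: action 0 moves to state 0 (reward 0), action 1 stays (reward 2).
    """
    if state == 0:
        return (0, 1.0) if action == 0 else (1, 0.0)
    return (0, 0.0) if action == 0 else (1, 2.0)


def average_reward_q_learning(n_steps=100_000, alpha=0.1, beta=0.01,
                              epsilon=0.1, seed=0):
    """R-learning style sketch: tabular Q-values plus an average-reward estimate."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[state][action]
    rho = 0.0                     # running estimate of the average reward per step
    state = 0
    for _ in range(n_steps):
        greedy = max(range(2), key=lambda a: q[state][a])
        # Epsilon-greedy exploration.
        action = rng.randrange(2) if rng.random() < epsilon else greedy
        next_state, reward = step(state, action)
        # Average-reward TD error: no discount factor, rho is subtracted instead.
        delta = reward - rho + max(q[next_state]) - q[state][action]
        q[state][action] += alpha * delta
        if action == greedy:
            # Update the average-reward estimate only on greedy steps.
            rho += beta * delta
        state = next_state
    policy = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
    return q, rho, policy


if __name__ == "__main__":
    q, rho, policy = average_reward_q_learning()
    print("greedy policy:", policy, "average reward estimate:", round(rho, 3))
```

In this toy task the best long-run behavior is to give up the immediate reward of 1 in state 0, move to state 1, and stay there for an average reward of 2 per step, so the learned `rho` should approach 2. This is the kind of long-term effect of an action that the abstract says a discounted criterion fails to reflect.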
