1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, China
2. Jiangsu Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, China
3. Suzhou Key Laboratory of Mobile Network Technology and Application, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, China
4. School of Computer Science and Technology, Soochow University, Suzhou 215000, Jiangsu, China
5. School of Information Engineering, Zhejiang Fashion Institute of Technology, Ningbo 315000, Zhejiang, China
CHEN Jianping (1963- ), born in Nanjing, Jiangsu; Ph.D., professor at Suzhou University of Science and Technology. Research interests: big data analysis and application, building energy efficiency, intelligent information processing.
YANG Zhengxia (1992- ), born in Yangzhou, Jiangsu; master's student at Suzhou University of Science and Technology. Research interests: reinforcement learning, transfer learning, building energy efficiency.
LIU Quan (1969- ), born in Yakeshi, Inner Mongolia; Ph.D., professor and doctoral supervisor at Soochow University. Research interests: intelligent information processing, automated reasoning, machine learning.
WU Hongjie (1977- ), born in Suzhou, Jiangsu; Ph.D., associate professor at Suzhou University of Science and Technology. Research interests: deep learning, pattern recognition, bioinformatics.
XU Yang (1980- ), born in Shenzhou, Hebei; lecturer at Zhejiang Fashion Institute of Technology. Research interests: data analysis and application, intelligent and personalized teaching.
FU Qiming (1985- ), born in Huai'an, Jiangsu; Ph.D., lecturer at Suzhou University of Science and Technology. Research interests: reinforcement learning, deep learning, building energy efficiency.
Online publication date: 2018-08
Print publication date: 2018-08-25
Jianping CHEN, Zhengxia YANG, Quan LIU, et al. Heuristic Sarsa algorithm based on value function transfer[J]. Journal on communications, 2018, 39(8): 37-47. DOI: 10.11959/j.issn.1000-436x.2018133.
To address the slow convergence of the Sarsa algorithm, an improved heuristic Sarsa algorithm based on value function transfer (VFT-HSA) is proposed. The algorithm combines Sarsa with a value function transfer method: a bisimulation metric is introduced to measure the similarity between states of the new task and states of historical tasks that share the same state space and action space, and the value functions of historical states that satisfy the similarity condition are transferred, accelerating convergence. In addition, the algorithm adopts a heuristic exploration method: Bayesian inference is introduced, variational inference is used to measure information gain, and the obtained information gain is used to build an intrinsic reward function that serves as an exploration factor, further speeding up convergence. The proposed algorithm is applied to the classic Grid World problem and compared with the Sarsa algorithm, the Q-Learning algorithm, and the well-converging VFT-Sarsa and IGP-Sarsa algorithms. Experiments show that the proposed algorithm achieves faster convergence and better stability.
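The two ideas the abstract combines can be illustrated with a minimal tabular Sarsa loop on a small grid world. This is only a sketch, not the paper's method: the bisimulation-based transfer is replaced by simply seeding the Q-table with a previously learned one, and the variational information-gain term is replaced by a crude count-based bonus. All names, grid dimensions, and constants here are illustrative assumptions.

```python
import random

SIZE = 4                                  # 4x4 grid: start (0,0), goal (3,3)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(state, action):
    """Deterministic grid-world transition with a small per-step cost."""
    x, y = state
    dx, dy = action
    ns = (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))
    done = ns == (SIZE - 1, SIZE - 1)
    return ns, (1.0 if done else -0.01), done

def sarsa(episodes=300, alpha=0.5, gamma=0.95, eps=0.1,
          q_init=None, bonus_scale=0.0, seed=0):
    rng = random.Random(seed)
    q = dict(q_init) if q_init else {}    # seeded with transferred values, if any
    counts = {}                           # state-action visit counts

    def choose(s):
        if rng.random() < eps:
            return rng.randrange(len(ACTIONS))
        vals = [q.get((s, a), 0.0) for a in range(len(ACTIONS))]
        return vals.index(max(vals))

    for _ in range(episodes):
        s = (0, 0)
        a = choose(s)
        for _ in range(200):              # cap episode length
            ns, r, done = step(s, ACTIONS[a])
            counts[(s, a)] = counts.get((s, a), 0) + 1
            # count-based intrinsic bonus: a crude stand-in for the
            # variational information-gain exploration factor
            r += bonus_scale / counts[(s, a)]
            na = choose(ns)
            target = r + (0.0 if done else gamma * q.get((ns, na), 0.0))
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (target - q.get((s, a), 0.0))
            s, a = ns, na
            if done:
                break
    return q

if __name__ == "__main__":
    # "Historical task": the same grid learned from scratch; its Q-table
    # then seeds a short run on the "new task", mimicking value transfer.
    q_old = sarsa(episodes=300)
    q_new = sarsa(episodes=50, q_init=q_old, bonus_scale=0.05, seed=1)
```

In the paper's setting, the `q_init` step would be gated by the bisimulation metric so that only sufficiently similar historical states contribute their values, and the bonus would come from the measured information gain rather than visit counts.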