1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, China
2. Jiangsu Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, China
3. Suzhou Key Laboratory of Mobile Network Technology and Application, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, China
4. School of Computer Science and Technology, Soochow University, Suzhou 215000, Jiangsu, China
5. School of Information Engineering, Zhejiang Fashion Institute of Technology, Ningbo 315000, Zhejiang, China
CHEN Jianping (1963- ), born in Nanjing, Jiangsu; Ph.D., professor at Suzhou University of Science and Technology. Research interests: big data analysis and application, building energy efficiency, intelligent information processing.
YANG Zhengxia (1992- ), born in Yangzhou, Jiangsu; master's student at Suzhou University of Science and Technology. Research interests: reinforcement learning, transfer learning, building energy efficiency.
LIU Quan (1969- ), born in Yakeshi, Inner Mongolia; Ph.D., professor and doctoral supervisor at Soochow University. Research interests: intelligent information processing, automated reasoning, machine learning.
WU Hongjie (1977- ), born in Suzhou, Jiangsu; Ph.D., associate professor at Suzhou University of Science and Technology. Research interests: deep learning, pattern recognition, bioinformatics.
XU Yang (1980- ), born in Shenzhou, Hebei; lecturer at Zhejiang Fashion Institute of Technology. Research interests: data analysis and application, intelligent and personalized teaching.
FU Qiming (1985- ), born in Huai'an, Jiangsu; Ph.D., lecturer at Suzhou University of Science and Technology. Research interests: reinforcement learning, deep learning, building energy efficiency.
Online publication date: 2018-08
Print publication date: 2018-08-25
Jianping CHEN, Zhengxia YANG, Quan LIU, et al. Heuristic Sarsa algorithm based on value function transfer[J]. Journal on communications, 2018, 39(8): 37-47. DOI: 10.11959/j.issn.1000-436x.2018133.
To address the slow convergence of the Sarsa algorithm, an improved heuristic Sarsa algorithm based on value function transfer (VFT-HSA) is proposed. The algorithm combines Sarsa with a value function transfer method: a bisimulation metric is introduced to measure the similarity between states of the new task and states of historical tasks that share the same state space and action space, and the value functions of historical states that satisfy the similarity condition are transferred, accelerating convergence. In addition, the algorithm adopts a heuristic exploration method: Bayesian inference is introduced, variational inference is used to measure information gain, and the obtained information gain is used to build an intrinsic reward function that serves as an exploration factor, further speeding up convergence. The proposed algorithm is applied to the classic Grid World problem and compared with the Sarsa algorithm, the Q-Learning algorithm, and the well-converging VFT-Sarsa and IGP-Sarsa algorithms. Experiments show that the proposed algorithm achieves faster convergence and better stability.
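The two ideas the abstract combines can be illustrated with a minimal tabular Sarsa loop on a small grid world. This is only a sketch, not the paper's method: the bisimulation-based transfer is replaced by simply seeding the Q-table with a previously learned one, and the variational information-gain term is replaced by a crude count-based bonus. All names, grid dimensions, and constants here are illustrative assumptions.

```python
import random

SIZE = 4                                  # 4x4 grid: start (0,0), goal (3,3)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(state, action):
    """Deterministic grid-world transition with a small per-step cost."""
    x, y = state
    dx, dy = action
    ns = (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))
    done = ns == (SIZE - 1, SIZE - 1)
    return ns, (1.0 if done else -0.01), done

def sarsa(episodes=300, alpha=0.5, gamma=0.95, eps=0.1,
          q_init=None, bonus_scale=0.0, seed=0):
    rng = random.Random(seed)
    q = dict(q_init) if q_init else {}    # seeded with transferred values, if any
    counts = {}                           # state-action visit counts

    def choose(s):
        if rng.random() < eps:
            return rng.randrange(len(ACTIONS))
        vals = [q.get((s, a), 0.0) for a in range(len(ACTIONS))]
        return vals.index(max(vals))

    for _ in range(episodes):
        s = (0, 0)
        a = choose(s)
        for _ in range(200):              # cap episode length
            ns, r, done = step(s, ACTIONS[a])
            counts[(s, a)] = counts.get((s, a), 0) + 1
            # count-based intrinsic bonus: a crude stand-in for the
            # variational information-gain exploration factor
            r += bonus_scale / counts[(s, a)]
            na = choose(ns)
            target = r + (0.0 if done else gamma * q.get((ns, na), 0.0))
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (target - q.get((s, a), 0.0))
            s, a = ns, na
            if done:
                break
    return q

if __name__ == "__main__":
    # "Historical task": the same grid learned from scratch; its Q-table
    # then seeds a short run on the "new task", mimicking value transfer.
    q_old = sarsa(episodes=300)
    q_new = sarsa(episodes=50, q_init=q_old, bonus_scale=0.05, seed=1)
```

In the paper's setting, the `q_init` step would be gated by the bisimulation metric so that only sufficiently similar historical states contribute their values, and the bonus would come from the measured information gain rather than visit counts.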