1. State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System, Luoyang 471032, China
2. School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China
3. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
ZHANG Jingke (1988- ), born in Luoyang, Henan, China. Ph.D., assistant research fellow at the State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System. His main research interest is radar countermeasures.
YANG Kai (1999- ), born in Shaoxing, Zhejiang, China. Master's candidate at Zhejiang Sci-Tech University. His main research interest is electronic countermeasures.
LI Chao (1986- ), born in Luoyang, Henan, China. Ph.D., assistant research fellow at the State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System. His main research interest is radar countermeasures.
WANG Hongyan (1979- ), born in Nanyang, Henan, China. Ph.D., distinguished professor at Zhejiang Sci-Tech University. His main research interests include radar countermeasures, MIMO radar signal processing, and machine vision.
Received: 2024-07-19
Revised: 2024-12-02
Published in print: 2024-12-25
ZHANG J K, YANG K, LI C, et al. Intelligent interference decision algorithm with prior knowledge embedded LSTM-PPO model[J]. Journal on Communications, 2024, 45(12): 227-239. DOI: 10.11959/j.issn.1000-436x.2024270.
Focusing on the low decision-making efficiency and effectiveness, as well as the policy instability, of multi-function radar (MFR) jamming decision algorithms based on traditional reinforcement learning models, an intelligent jamming decision algorithm based on a prior-knowledge-embedded long short-term memory (LSTM) network and proximal policy optimization (PPO) model was proposed. First, the MFR jamming decision problem was formulated as a Markov decision process (MDP). Second, prior knowledge from the jamming domain was embedded into the reward function of the PPO model via reward shaping theory, and the reshaped reward function guided the agent to converge quickly, thereby improving decision-making efficiency. Then, exploiting the LSTM's strong temporal feature extraction ability, the dynamic characteristics of the echo data were captured to effectively characterize the radar's working state. Finally, the extracted dynamic features were fed into the PPO model and, guided by the embedded prior knowledge, an effective jamming decision could be obtained rapidly. Simulation results demonstrate that, compared with traditional deep reinforcement learning-based jamming decision algorithms, the proposed algorithm achieves higher decision-making efficiency and effectiveness, and accomplishes MFR jamming decisions efficiently and robustly.
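The reward-shaping step described in the abstract can be sketched with the standard potential-based form r'(s, a, s') = r(s, a, s') + γ·Φ(s') − Φ(s), which preserves the optimal policy while steering exploration. The state names and potential values below are hypothetical illustrations of "jamming-domain prior knowledge" (a less threatening MFR mode gets a higher potential), not values from the paper:

```python
# Potential-based reward shaping: r'(s, a, s') = r + gamma * phi(s') - phi(s).
# Hypothetical prior knowledge: MFR working states ordered by threat level,
# so driving the radar toward a less threatening mode earns a shaping bonus.
THREAT_POTENTIAL = {
    "guidance": -3.0,      # most threatening MFR state (illustrative value)
    "track": -2.0,
    "acquisition": -1.0,
    "search": 0.0,         # least threatening
}

def shaped_reward(r, s, s_next, gamma=0.99):
    """Environment reward r plus the potential-based shaping term."""
    return r + gamma * THREAT_POTENTIAL[s_next] - THREAT_POTENTIAL[s]
```

Because the shaping term telescopes over a trajectory, the shaped return differs from the original only by Φ of the start state, so the agent's optimal policy is unchanged while convergence is accelerated.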
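The PPO model at the core of the algorithm maximizes the standard clipped surrogate objective, min(ρ·A, clip(ρ, 1−ε, 1+ε)·A), where ρ is the probability ratio between new and old policies and A is the advantage estimated from the LSTM-extracted features. A minimal per-sample sketch (the LSTM feature extractor and the paper's actual hyperparameters are omitted; ε = 0.2 is only a common default, not necessarily the paper's choice):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximized).

    ratio: pi_new(a|s) / pi_old(a|s); advantage: estimated advantage A(s, a).
    Clipping the ratio to [1 - eps, 1 + eps] bounds the policy update size.
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)
```

Taking the minimum of the clipped and unclipped terms means the objective never rewards moving the ratio far outside the trust region, which is what gives PPO its update stability.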