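1. The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007, Jiangsu, China
2. College of Communications Engineering, Army Engineering University, Nanjing 210007, Jiangsu, China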
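ZHOU Quan (1991- ), male, born in Liyang, Jiangsu, is a Ph.D. candidate jointly trained by Army Engineering University and the Sixty-third Research Institute, National University of Defense Technology. His main research interest is communication anti-jamming technology.
NIU Yingtao (1978- ), male, born in Tai'an, Shandong, Ph.D., is an associate research fellow and master's supervisor at the Sixty-third Research Institute, National University of Defense Technology. His main research interests are cognitive radio and communication anti-jamming technology.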
Received: 2024-02-02
Revised: 2024-06-28
Published in print: 2024-07-25
ZHOU Quan, NIU Yingtao. Fast deep reinforcement learning anti-jamming algorithm based on similar sample generation[J]. Journal on Communications, 2024, 45(07): 117-126. DOI: 10.11959/j.issn.1000-436x.2024131.
To improve the learning efficiency of deep reinforcement learning-based anti-jamming algorithms and enable them to adapt more quickly to unknown jamming environments, a fast deep reinforcement learning anti-jamming algorithm based on similar sample generation was proposed. By combining a bisimulation-based similarity measure for state-action pairs with a deep Q-network anti-jamming algorithm, the proposed algorithm could quickly learn effective multi-domain anti-jamming strategies in unknown, dynamic jamming environments. Specifically, at each transmission step, the algorithm first interacted with the environment using the deep Q-network anti-jamming algorithm to obtain the actual state-action pair. It then generated a set of similar state-action pairs based on bisimulation and used this set to produce simulated training samples. In this way, the algorithm obtained a large number of training samples at each iteration step, which significantly accelerated training and convergence. Simulation results show that, under comb sweep jamming and intelligent blocking jamming, the proposed algorithm converges quickly, and its normalized throughput after convergence is significantly higher than that of the conventional deep Q-network algorithm, the Q-learning algorithm, and the improved Q-learning algorithm based on knowledge reuse.
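The sample-generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy single-jammer sweep environment, the constant `NUM_CHANNELS`, and the functions `step`, `toy_distance`, and `similar_samples` are all hypothetical, and `toy_distance` is only a stand-in for the bisimulation-based similarity measure, whose actual form is not given here. A random action choice likewise stands in for the deep Q-network policy.

```python
import random
from collections import deque

NUM_CHANNELS = 5  # hypothetical toy setting, not taken from the paper


def step(jammed, action):
    """Toy sweep jammer: reward 1 if the chosen channel is not jammed.

    The jammer moves to the next channel regardless of the action.
    """
    reward = 1.0 if action != jammed else 0.0
    next_jammed = (jammed + 1) % NUM_CHANNELS
    return reward, next_jammed


def toy_distance(pair_a, pair_b):
    """Placeholder similarity measure (an assumption for illustration).

    Distance is 0 when both pairs share the state and the same collision
    outcome, so they yield the same reward and next state in this toy model.
    """
    (s1, a1), (s2, a2) = pair_a, pair_b
    same_outcome = (a1 == s1) == (a2 == s2)
    return 0.0 if (s1 == s2 and same_outcome) else 1.0


def similar_samples(real, threshold=0.0):
    """Build simulated transitions from one real transition (s, a, r, s_next)."""
    s, a, r, s_next = real
    return [
        (s, b, r, s_next)
        for b in range(NUM_CHANNELS)
        if b != a and toy_distance((s, a), (s, b)) <= threshold
    ]


replay = deque(maxlen=1000)  # experience replay buffer
rng = random.Random(0)
state = 0
for _ in range(10):
    action = rng.randrange(NUM_CHANNELS)  # stand-in for the DQN's action choice
    reward, next_state = step(state, action)
    real = (state, action, reward, next_state)
    replay.append(real)                   # the actual sample
    replay.extend(similar_samples(real))  # simulated samples from similar pairs
    state = next_state
```

In this toy setting every successful transmission contributes several transitions to the replay buffer instead of one, which is the mechanism the abstract credits for faster training; a real system would then sample minibatches from `replay` to update the Q-network.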