Fast deep reinforcement learning anti-jamming algorithm based on similar sample generation

ZHOU Quan; NIU Yingtao

doi:10.11959/j.issn.1000-436x.2024131

Research article | Updated: 2024-08-09

- Fast deep reinforcement learning anti-jamming algorithm based on similar sample generation
- Journal on Communications, 2024, Vol. 45, No. 7, pp. 117-126
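- Affiliations:
  1. The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, Jiangsu, China
  2. College of Communications Engineering, Army Engineering University of PLA, Nanjing 210007, Jiangsu, China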
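- Author biographies:
  ZHOU Quan (1991- ), male, born in Liyang, Jiangsu, is a Ph.D. candidate jointly trained by Army Engineering University of PLA and the 63rd Research Institute of National University of Defense Technology. His research interest is communication anti-jamming technology.
  NIU Yingtao (1978- ), male, born in Tai'an, Shandong, Ph.D., is an associate researcher and master's supervisor at the 63rd Research Institute of National University of Defense Technology. His research interests include cognitive radio and communication anti-jamming technology.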
- Funding: The National Natural Science Foundation of China (62371461)
- Chinese Library Classification (CLC) number: TN973.3
- Received: 2024-02-02; revised: 2024-06-28; published in print: 2024-07-25
ZHOU Quan,NIU Yingtao.Fast deep reinforcement learning anti-jamming algorithm based on similar sample generation[J].Journal on Communications,2024,45(07):117-126. DOI: 10.11959/j.issn.1000-436x.2024131.

Abstract

To improve the learning efficiency of anti-jamming algorithms based on deep reinforcement learning, and thus enable them to adapt more quickly to unknown jamming environments, a fast deep reinforcement learning anti-jamming algorithm based on similar sample generation was proposed. By combining a bisimulation-based similarity measure for state-action pairs with an anti-jamming algorithm built on the deep Q-network (DQN), the algorithm can quickly learn effective multi-domain anti-jamming strategies in unknown, dynamic jamming environments. Specifically, at each transmission step the algorithm first interacts with the environment through the DQN-based anti-jamming algorithm to obtain the actual state-action pair. It then generates a set of similar state-action pairs based on bisimulation and uses them to produce simulated training samples. In this way the algorithm obtains a large number of training samples at every iteration, which significantly accelerates training and convergence. Simulation results show that under comb sweep jamming and intelligent blocking jamming, the proposed algorithm converges quickly, and its normalized throughput after convergence is significantly higher than that of the conventional DQN algorithm, the Q-learning algorithm, and the improved Q-learning algorithm based on knowledge reuse.
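The abstract describes the per-step loop (one real interaction, then bisimulation-based generation of similar state-action pairs that become simulated training samples) but does not give the paper's exact similarity measure, environment model, or network. The Python sketch below is therefore only an illustration under stated assumptions: a hypothetical 4-channel sweep jammer whose reward and transition functions are known, an on-policy bisimulation-style distance computed by fixed-point iteration (for deterministic transitions the Kantorovich term reduces to the distance between successor states), and simulated samples that borrow the real reward and next state. All names (`reward`, `step`, `policy`, `state_metric`, `pair_distance`, `similar_samples`) and parameter values are hypothetical, and the DQN update that would consume the replay buffer is omitted.

```python
from collections import deque

# Hypothetical toy setting (NOT from the paper): 4 channels, one sweep jammer.
N_CHANNELS = 4
GAMMA = 0.9  # assumed discount factor used inside the metric


def reward(s, a):
    # State s = channel hit by the jammer this slot; any other channel succeeds.
    return 1.0 if a != s else 0.0


def step(s, a):
    # The jammer sweeps one channel per slot, independent of the user's action.
    return (s + 1) % N_CHANNELS


def policy(s):
    # Deliberately naive fixed policy: always transmit on channel 0.
    return 0


def state_metric(gamma=GAMMA, tol=1e-10, max_iters=1000):
    """Fixed-point iteration of an on-policy bisimulation metric for a
    deterministic MDP: d(s,t) = |r(s,pi(s)) - r(t,pi(t))| + gamma * d(s', t')."""
    states = range(N_CHANNELS)
    d = {(s, t): 0.0 for s in states for t in states}
    for _ in range(max_iters):
        new = {}
        for s, t in d:
            a, b = policy(s), policy(t)
            new[(s, t)] = (abs(reward(s, a) - reward(t, b))
                           + gamma * d[(step(s, a), step(t, b))])
        delta = max(abs(new[k] - d[k]) for k in d)
        d = new
        if delta < tol:
            break
    return d


def pair_distance(d, s, a, t, b, gamma=GAMMA):
    # State-action distance: reward gap plus discounted distance of successors
    # (the Kantorovich term collapses to d(s', t') for deterministic transitions).
    return abs(reward(s, a) - reward(t, b)) + gamma * d[(step(s, a), step(t, b))]


def similar_samples(d, transition, eps=1e-6):
    """Simulated transitions for every (t, b) within eps of the real (s, a).
    Assumption: each simulated sample borrows the real reward and next state."""
    s, a, r, s_next = transition
    out = []
    for t in range(N_CHANNELS):
        for b in range(N_CHANNELS):
            if (t, b) != (s, a) and pair_distance(d, s, a, t, b) <= eps:
                out.append((t, b, r, s_next))
    return out


replay = deque(maxlen=10_000)      # buffer a DQN learner would sample from
d = state_metric()
real = (0, 2, reward(0, 2), step(0, 2))  # one real interaction: (s, a, r, s')
replay.append(real)
replay.extend(similar_samples(d, real))
print(list(replay))
# → [(0, 2, 1.0, 1), (0, 1, 1.0, 1), (0, 3, 1.0, 1)]
```

Under this naive fixed policy, different jammer phases are not bisimilar, so only the other jam-free channels in the same slot qualify, and one real interaction yields two extra samples. A larger `eps` admits more approximately similar pairs, at the cost of borrowed rewards and next states that may no longer match the true ones.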

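Related resources

Research on beam-hopping resource scheduling for LEO satellite constellations based on multi-agent deep reinforcement learning

Resource allocation strategy for ultra-dense Internet of Things based on graph convolutional neural networks

Deep reinforcement learning anti-jamming strategy assisted by sample information entropy

Secure transmission scheme for multi-base-station, multi-user millimeter-wave vehicular networks

Vertical handover strategy for cyber-physical systems in space-air-ground integrated networks based on deep reinforcement learning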