Fast deep reinforcement learning anti-jamming algorithm based on similar sample generation

ZHOU Quan; NIU Yingtao

doi:10.11959/j.issn.1000-436x.2024131

Papers | Updated: 2024-08-09

- Journal on Communications, Vol. 45, Issue 7, Pages 117-126 (2024)
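- Author affiliations:
  
  1. The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007, China
  2. College of Communications Engineering, Army Engineering University of PLA, Nanjing 210007, China
- Funding:
  
  The National Natural Science Foundation of China (62371461)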
- DOI: 10.11959/j.issn.1000-436x.2024131
  CLC: TN973.3
- Received: 02 February 2024,
  
  Revised: 28 June 2024,
  
  Published: 25 July 2024
ZHOU Quan, NIU Yingtao. Fast deep reinforcement learning anti-jamming algorithm based on similar sample generation[J]. Journal on Communications, 2024, 45(07): 117-126. DOI: 10.11959/j.issn.1000-436x.2024131.

Abstract

To improve the learning efficiency of anti-jamming algorithms based on deep reinforcement learning and enable them to adapt more quickly to unknown jamming environments, a fast deep reinforcement learning anti-jamming algorithm based on similar sample generation was proposed. By combining the similarity measurement of state-action pairs, derived from bisimulation, with an anti-jamming algorithm grounded in the deep Q-network, this algorithm was able to quickly learn effective multi-domain anti-jamming strategies in unknown, dynamic jamming environments. Specifically, once a transmission action was completed, the proposed algorithm first interacted with the environment using the deep Q-network to acquire actual state-action pairs. Then it generated a set of similar state-action pairs based on bisimulation, employing these similar state-action pairs to produce simulated training samples. Through these operations, the algorithm was able to acquire a large number of training samples at each iteration step, thereby significantly accelerating the training process and convergence speed. Simulation results show that under comb sweep jamming and intelligent blocking jamming, the proposed algorithm converges rapidly, and its normalized throughput after convergence is significantly superior to that of the conventional deep Q-network algorithm, the Q-learning algorithm, and the improved Q-learning algorithm based on knowledge reuse.
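
The sample-generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the bisimulation-based similarity metric, so this sketch assumes a toy single-channel sweep jammer and a hand-coded similarity rule: state-action pairs with the same offset between the chosen channel and the jammed channel are treated as similar. A random policy stands in for the deep Q-network, and all names (`N_CHANNELS`, `step`, `similar_samples`, `collect`) are hypothetical.

```python
import random
from collections import deque

N_CHANNELS = 8  # hypothetical number of channels


def step(jammed, action):
    """Toy sweep jammer (assumption): the jammed channel advances by one per slot."""
    reward = 0.0 if action == jammed else 1.0
    return reward, (jammed + 1) % N_CHANNELS


def similar_samples(state, action, reward, next_state):
    """Build simulated transitions from one real transition.

    Assumed similarity rule (not the paper's metric): pairs with the same
    channel offset between the action and the jammed channel behave alike.
    """
    offset = (action - state) % N_CHANNELS
    shift = (next_state - state) % N_CHANNELS
    return [
        (s, (s + offset) % N_CHANNELS, reward, (s + shift) % N_CHANNELS)
        for s in range(N_CHANNELS)
        if s != state
    ]


def collect(n_steps, seed=0):
    """Fill a replay buffer with real and simulated samples."""
    rng = random.Random(seed)
    buffer = deque(maxlen=10_000)
    state = 0
    for _ in range(n_steps):
        action = rng.randrange(N_CHANNELS)  # random policy stands in for the DQN
        reward, next_state = step(state, action)
        buffer.append((state, action, reward, next_state))  # real sample
        buffer.extend(similar_samples(state, action, reward, next_state))
        state = next_state
    return buffer
```

In this toy setting each real interaction contributes `N_CHANNELS` samples to the buffer instead of one, which is the mechanism the abstract credits for faster training. A learned similarity measure would replace the fixed offset rule.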
