Spectrum resource allocation for high-throughput satellite communications based on behavior cloning

QIN Hao; LI Shuangyi; ZHAO Di; MENG Haowei; SONG Bin

doi:10.11959/j.issn.1000-436x.2024100


Papers | Updated: 2024-06-24

- Journal on Communications, Vol. 45, Issue 5, Pages 101-114 (2024)
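- Author affiliations:
  
  1. State Key Laboratory of Space-Air-Ground Integrated Services Networks, Xidian University, Xi'an, Shaanxi 710071, China
  2. Hangzhou Institute of Technology, Xidian University, Hangzhou, Zhejiang 311200, China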
- Funding:
  
  The National Natural Science Foundation of China (62071354, 62201419); the Key Research and Development Program of Shaanxi Province (2022ZDLGY05-08)
- DOI: 10.11959/j.issn.1000-436x.2024100
  CLC: TN92
- Received: 2024-02-05; Revised: 2024-05-07; Published: 2024-05-30

QIN Hao, LI Shuangyi, ZHAO Di, et al. Spectrum resource allocation for high-throughput satellite communications based on behavior cloning[J]. Journal on Communications, 2024, 45(05): 101-114. DOI: 10.11959/j.issn.1000-436x.2024100.

Abstract

In high-throughput multi-beam satellite systems, the dimensionality of the spectrum resource allocation problem increases drastically with the number of satellite beams and service users, which causes the solution complexity to grow exponentially. To address this challenge, a two-stage algorithm that combines behavior cloning (BC) with deep reinforcement learning (DRL) is proposed. In the first stage, the policy network is pretrained through behavior cloning on existing satellite resource allocation decision data; mimicking expert behavior reduces blind exploration and accelerates convergence. In the second stage, the policy network is further optimized using proximal policy optimization (PPO), and a convolutional block attention module (CBAM) is employed to extract user traffic state features more effectively, thereby enhancing overall algorithm performance. Simulation results show that the proposed algorithm outperforms the benchmark algorithms in convergence speed and stability, and also delivers better performance in system delay, average system satisfaction, and spectrum efficiency.
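The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tabular softmax policy, the toy state and action indices, and the function names `bc_pretrain` and `ppo_clip_objective` are all assumptions made for illustration, and the CBAM feature extractor is omitted. Stage 1 fits the policy to expert (state, action) pairs by minimizing cross-entropy, which is the usual formulation of behavior cloning; stage 2 would then improve the same policy by maximizing the standard PPO clipped surrogate objective.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bc_pretrain(demos, n_states, n_actions, lr=0.5, epochs=200):
    """Stage 1 (behavior cloning): fit a tabular softmax policy to expert
    (state, action) pairs by gradient descent on the cross-entropy loss.
    States and actions are hypothetical integer indices."""
    logits = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(epochs):
        for s, a in demos:
            probs = softmax(logits[s])
            for k in range(n_actions):
                # d(-log pi(a|s)) / d(logit_k) = pi(k|s) - 1[k == a]
                grad = probs[k] - (1.0 if k == a else 0.0)
                logits[s][k] -= lr * grad
    return logits


def ppo_clip_objective(new_probs, old_probs, advantages, eps=0.2):
    """Stage 2 building block: PPO clipped surrogate objective, averaged
    over samples. Each entry is the probability of the sampled action
    under the new or old policy, paired with its advantage estimate."""
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Hypothetical expert decisions: (traffic-state index, channel index).
    expert_demos = [(0, 1), (1, 2), (2, 0)]
    policy_logits = bc_pretrain(expert_demos, n_states=3, n_actions=3)
    for state, row in enumerate(policy_logits):
        print(state, [round(p, 3) for p in softmax(row)])
```

Starting PPO from the pretrained logits rather than from a uniform policy is what lets the second stage skip much of the early random exploration; the clipping range `eps` then limits how far each update can move the policy away from the cloned behavior.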
