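1. State Key Laboratory of Space-Air-Ground Integrated Services Networks, Xidian University, Xi'an 710071, Shaanxi, China
2. Hangzhou Institute of Technology, Xidian University, Hangzhou 311200, Zhejiang, China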
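QIN Hao (1976- ), male, born in Suide, Shaanxi, Ph.D., associate professor and master's supervisor at Xidian University. His main research interests include satellite-ground communication schemes for satellite communications, wireless communications, and intelligent resource management and control for satellite communications.
LI Shuangyi (2001- ), male, born in Xuchang, Henan, master's student at Xidian University. His main research interests include satellite communications and wireless communications.
ZHAO Di (1995- ), female, born in Zibo, Shandong, Ph.D. candidate at Xidian University. Her main research interests include satellite communications, radio resource management, and machine learning.
MENG Haowei (1998- ), male, born in Zhoukou, Henan, master's student at Xidian University. His main research interests include satellite communications and wireless communications.
SONG Bin (1973- ), male, born in Zhengzhou, Henan, Ph.D., professor and doctoral supervisor at Xidian University. His main research interests include multimedia communications, multimodal data fusion and retrieval, image-content-based recognition and machine learning, multimodal knowledge graphs, reinforcement learning, the Internet of Things, big data, and recommender systems. bsong@mail.xidian.edu.cn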
Received: 2024-02-05
Revised: 2024-05-07
Published in print: 2024-05-30
QIN Hao, LI Shuangyi, ZHAO Di, et al. Spectrum resource allocation for high-throughput satellite communications based on behavior cloning[J]. Journal on Communications, 2024, 45(05): 101-114. DOI: 10.11959/j.issn.1000-436x.2024100.
In high-throughput multi-beam satellite systems, the dimensionality of the spectrum resource allocation problem grows drastically as the numbers of satellite beams and service users increase, causing the complexity of solving it to rise exponentially. To address this challenge, a two-stage algorithm combining behavior cloning (BC) with deep reinforcement learning (DRL) was proposed. In the first stage, the policy network was pretrained through behavior cloning on existing resource allocation decision data from satellite operation; imitating expert behavior reduces blind exploration and accelerates convergence. In the second stage, the policy network was further optimized with proximal policy optimization (PPO), and a convolutional block attention module (CBAM) was employed to better extract user traffic features, thereby improving overall algorithm performance. Simulation results demonstrate that the proposed algorithm outperforms the benchmark algorithms in convergence speed and stability, and also achieves better system delay, average system satisfaction, and spectrum efficiency.
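The abstract describes the two-stage training only at a high level. The sketch below illustrates the general idea under strong simplifying assumptions and is not the authors' implementation. A linear softmax policy stands in for the CBAM-based policy network. The "expert" is a hypothetical greedy rule that serves the beam with the highest demand, the reward is the demand served, and the advantage is simply the reward minus its batch mean (no critic, no GAE). The function names `bc_pretrain`, `ppo_update`, and `clipped_surrogate` are illustrative. Stage 1 is supervised cross-entropy on expert (state, action) pairs. Stage 2 is gradient ascent on the standard PPO clipped surrogate objective with ε = 0.2, starting from the behavior-cloned weights.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def one_hot(a, n):
    y = np.zeros((len(a), n))
    y[np.arange(len(a)), a] = 1.0
    return y


class LinearSoftmaxPolicy:
    """Hypothetical toy policy: logits = X @ W (stand-in for a CBAM-based network)."""

    def __init__(self, n_features, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_features, n_actions))

    def probs(self, X):
        return softmax(X @ self.W)


def bc_pretrain(policy, X, expert_actions, lr=0.5, epochs=100):
    """Stage 1 (behavior cloning): minimize cross-entropy to the expert's actions."""
    Y = one_hot(expert_actions, policy.W.shape[1])
    for _ in range(epochs):
        P = policy.probs(X)
        policy.W -= lr * X.T @ (P - Y) / len(X)  # gradient of the mean cross-entropy
    return policy


def clipped_surrogate(ratio, adv, eps=0.2):
    """PPO clipped objective: mean of min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    return np.mean(np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv))


def ppo_update(policy, X, actions, old_logp, adv, eps=0.2, lr=0.1, epochs=10):
    """Stage 2 (PPO): gradient ascent on the clipped surrogate for one batch."""
    idx = np.arange(len(actions))
    Y = one_hot(actions, policy.W.shape[1])
    for _ in range(epochs):
        P = policy.probs(X)
        ratio = np.exp(np.log(P[idx, actions]) - old_logp)
        # Gradient flows only where the unclipped term r*A attains the minimum.
        active = ratio * adv <= np.clip(ratio, 1 - eps, 1 + eps) * adv
        coeff = np.where(active, ratio * adv, 0.0)
        # d(r*A)/dW = r*A * x (onehot(a) - p)^T for a linear softmax policy.
        policy.W += lr * X.T @ ((Y - P) * coeff[:, None]) / len(X)
    return policy


# Usage on synthetic data (all values hypothetical).
rng = np.random.default_rng(1)
n_beams = 4
X = rng.standard_normal((500, n_beams))   # hypothetical per-beam demand features
expert = X.argmax(axis=1)                 # hypothetical expert: serve the busiest beam

policy = bc_pretrain(LinearSoftmaxPolicy(n_beams, n_beams), X, expert)
bc_acc = np.mean(policy.probs(X).argmax(axis=1) == expert)

idx = np.arange(len(X))
P_old = policy.probs(X)
actions = np.array([rng.choice(n_beams, p=p) for p in P_old])  # on-policy batch
old_logp = np.log(P_old[idx, actions])
reward = X[idx, actions]                  # hypothetical reward: demand served
adv = reward - reward.mean()              # crude baseline instead of a critic

before = clipped_surrogate(np.ones(len(X)), adv)
policy = ppo_update(policy, X, actions, old_logp, adv)
after = clipped_surrogate(np.exp(np.log(policy.probs(X)[idx, actions]) - old_logp), adv)
```

Starting PPO from the behavior-cloned weights lets the second stage begin near the expert's behavior instead of exploring from a random policy, while the clipping keeps each update close to the policy that collected the batch.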