QIN Hao, LI Shuangyi, ZHAO Di, et al. Spectrum resource allocation for high-throughput satellite communications based on behavior cloning[J]. Journal on Communications, 2024, 45(05): 101-114. DOI: 10.11959/j.issn.1000-436x.2024100.
Spectrum resource allocation for high-throughput satellite communications based on behavior cloning
The dimensionality of the spectrum resource allocation problem increased drastically with the number of satellite beams and service users, which caused an exponential rise in the complexity of the solution. To address this challenge, a two-stage algorithm that combined behavior cloning (BC) with deep reinforcement learning (DRL) was proposed. In the first stage, the policy network was pretrained through behavior cloning on existing decision data from satellite operation, mimicking expert behavior to reduce blind exploration and accelerate convergence. In the second stage, the policy network was further optimized with the proximal policy optimization (PPO) algorithm, and a convolutional block attention module (CBAM) was employed to better extract user traffic features, thereby enhancing overall performance. Simulation results demonstrate that the proposed algorithm outperforms the benchmark algorithms in convergence speed and stability, and also delivers superior performance in system delay, average system satisfaction, and spectrum efficiency.
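The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy tabular softmax policy in place of the CBAM-based network, and the names `bc_pretrain` and `ppo_clip_objective` and the demonstration data are hypothetical. Stage 1 fits the policy to expert (state, action) pairs by minimizing cross-entropy; stage 2 evaluates the standard PPO clipped surrogate that a later fine-tuning step would maximize.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bc_pretrain(logits, demos, lr=0.5, epochs=200):
    """Stage 1 (behavior cloning): gradient descent on the cross-entropy
    between the policy and the expert action for each demonstrated state."""
    for _ in range(epochs):
        for state, action in demos:
            probs = softmax(logits[state])
            for a in range(len(probs)):
                # Gradient of -log pi(action | state) w.r.t. logit a.
                grad = probs[a] - (1.0 if a == action else 0.0)
                logits[state][a] -= lr * grad
    return logits

def ppo_clip_objective(new_probs, old_probs, advantages, eps=0.2):
    """Stage 2 (PPO): clipped surrogate objective, averaged over samples.
    new_probs/old_probs are the probabilities of the taken actions."""
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Hypothetical toy data: 2 traffic states, 3 candidate bandwidth allocations.
demos = [(0, 2), (0, 2), (1, 0), (1, 0)]
logits = bc_pretrain([[0.0] * 3 for _ in range(2)], demos)
policy = [softmax(row) for row in logits]
```

After pretraining, `policy[0]` concentrates on action 2 and `policy[1]` on action 0, so the PPO stage would start from the expert's choices rather than from a uniform policy. The clipping keeps each update close to the pretrained policy by capping how much a single probability ratio can contribute.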