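1. State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, Shaanxi, China
2. Research Center for Internet of Things, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, Guangdong, China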
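WU Yan (武艳, 1986- ), female, born in Ordos, Inner Mongolia, Ph.D., lecturer and master's supervisor at Xidian University. Her research interests include mobile cyber-physical systems, the industrial Internet of things, and artificial intelligence.
PAN Guangchuan (潘广川, 2000- ), male, born in Chongqing, master's student at Xidian University. His research interests include artificial intelligence and mobile cyber-physical systems.
YAO Mingwu (姚明旿, 1975- ), male, born in Xi'an, Shaanxi, Ph.D., associate professor and doctoral supervisor at Xidian University. His research interests include time-sensitive networking, testing and probing of deterministic networks, and networking and quality-of-service assurance in highly dynamic UAV ad hoc networks.
YANG Qinghai (杨清海, 1976- ), male, born in Gaomi, Shandong, Ph.D., professor and doctoral supervisor at Xidian University. His research interests include artificial intelligence and modern communications, autonomous communications, information fusion, network convergence, and data analysis.
LEUNG Victor C. M. (梁中明, 1955- ), male, Canadian, Ph.D., fellow of the Academy of Science of the Royal Society of Canada, fellow of the Canadian Academy of Engineering, and distinguished professor at Shenzhen University. His research interests include wireless networks and mobile systems.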
Received: 2024-03-18
Revised: 2024-07-01
Published in print: 2024-08-25
WU Yan, PAN Guangchuan, YAO Mingwu, et al. Vertical handover policy for cyber-physical systems aided by SAGIN based on deep reinforcement learning[J]. Journal on Communications, 2024, 45(08): 180-191. DOI: 10.11959/j.issn.1000-436x.2024140.
Cyber-physical systems aided by a space-air-ground integrated network (SAGIN) have complex models, and prior knowledge of the network topology and model assumptions is hard to obtain. To address these characteristics, a vertical handover policy based on deep reinforcement learning was studied. First, by jointly considering constraints on system stability, handover cost, and network usage cost, the vertical handover policy problem was modeled as a constrained Markov decision process (CMDP), and a sufficient condition guaranteeing the existence of a feasible solution was derived. Second, a constraint-proximal policy optimization (CPPO) algorithm was proposed to solve the CMDP, and a distributed reinforcement learning mechanism was introduced at the base station side to accelerate training convergence. Simulation results verify the effectiveness and superiority of the proposed vertical handover policy compared with the baseline policies.
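The abstract does not spell out the CPPO update, so the following is only a minimal sketch of one common way to handle a CMDP with a PPO-style learner: Lagrangian relaxation, where each cost constraint gets a non-negative multiplier that is raised when the constraint is violated. It is an assumption, not the authors' algorithm. All function names, the learning rate, and the clipping value are hypothetical; the cost signals could stand for quantities such as handover cost or network usage cost.

```python
from typing import List, Sequence


def clipped_surrogate(ratios: Sequence[float],
                      advantages: Sequence[float],
                      clip_eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate objective (to be maximized), sample mean."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Clip the probability ratio to [1 - eps, 1 + eps].
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def penalized_advantages(reward_adv: Sequence[float],
                         cost_advs: Sequence[Sequence[float]],
                         multipliers: Sequence[float]) -> List[float]:
    """Lagrangian advantage per sample: A_r - sum_i lambda_i * A_ci (assumed form)."""
    return [
        a_r - sum(lam * a_c[t] for lam, a_c in zip(multipliers, cost_advs))
        for t, a_r in enumerate(reward_adv)
    ]


def update_multipliers(multipliers: Sequence[float],
                       avg_costs: Sequence[float],
                       limits: Sequence[float],
                       lr: float = 0.05) -> List[float]:
    """Projected dual ascent: lambda_i <- max(0, lambda_i + lr * (J_ci - d_i))."""
    return [
        max(0.0, lam + lr * (cost - limit))
        for lam, cost, limit in zip(multipliers, avg_costs, limits)
    ]
```

In a training loop built on this sketch, the policy would be updated by maximizing `clipped_surrogate` on the output of `penalized_advantages`, and `update_multipliers` would be called once per batch with the measured average costs. A multiplier grows while its constraint is violated and decays toward zero once the constraint is satisfied.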