Large-scale post-disaster user distributed coverage optimization based on multi-agent reinforcement learning

Wenjun XU; Silei WU; Fengyu WANG; Lan LIN; Guojun LI; Zhi ZHANG

doi:10.11959/j.issn.1000-436x.2022131

Papers | Updated: 2024-06-05

- Journal on Communications Vol. 43, Issue 8, Pages: 1-16 (2022)
- Author affiliations:
  
  1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. Institute of Beyond-Line-of-Sight Trusted Information Transmission, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  3. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
- DOI: 10.11959/j.issn.1000-436x.2022131
  CLC: TN929.5
- Online First: 2022-08
  
  Published: 25 August 2022

Wenjun XU, Silei WU, Fengyu WANG, et al. Large-scale post-disaster user distributed coverage optimization based on multi-agent reinforcement learning[J]. Journal on Communications, 2022, 43(8): 1-16. DOI: 10.11959/j.issn.1000-436x.2022131.

Abstract

To quickly restore emergency communication services for large numbers of post-disaster users, a distributed intellicise coverage optimization architecture based on multi-agent reinforcement learning (RL) is proposed. It addresses two problems caused by the large number of access users: pronounced service differences and dynamics, and the poor scalability of centralized algorithms. In the network characterization layer, a distributed k-sums clustering algorithm that accounts for users' service differences is designed. Each unmanned aerial vehicle base station (UAV-BS) adjusts its local network structure natively and simply according to user demand, and the features of the cluster-center users are selected as the input state of the multi-agent RL neural networks. In the trajectory control layer, a multi-agent soft actor-critic (MASAC) algorithm is designed, in which each UAV-BS acts as an intelligent node and controls its own flight trajectory under a distributed-training-distributed-execution framework. Ensemble learning and curriculum learning are integrated to improve training stability and convergence speed. Simulation results show that the proposed distributed k-sums clustering algorithm outperforms k-means in average load efficiency and clustering balance, and that the MASAC-based UAV-BS trajectory control algorithm reduces the frequency of communication outages and improves network spectrum efficiency, outperforming existing RL methods.
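
The maximum-entropy idea behind the soft actor-critic family named in the abstract can be illustrated with a small, single-state toy example. This is only a sketch under stated assumptions, not the MASAC algorithm of the paper: the action set is discrete, and the Q-values, the temperature `alpha`, and the function names are hypothetical. It shows that a Boltzmann policy attains the entropy-regularized ("soft") value, which is never below the value of the greedy choice.

```python
import math


def boltzmann_policy(q_values, alpha):
    """Return pi(a) proportional to exp(Q(a) / alpha) for a single state."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / alpha) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def soft_value(q_values, policy, alpha):
    """Entropy-regularized value: E_pi[Q(a) - alpha * log pi(a)]."""
    return sum(
        p * (q - alpha * math.log(p))
        for p, q in zip(policy, q_values)
        if p > 0.0  # terms with zero probability contribute nothing
    )


# Hypothetical Q-values for three candidate moves of one agent (assumption).
q = [1.0, 2.0, 0.5]
alpha = 0.5  # entropy temperature (assumption)

pi = boltzmann_policy(q, alpha)
v_soft = soft_value(q, pi, alpha)

# A deterministic greedy policy has zero entropy, so its soft value is max(q).
greedy = [1.0 if i == q.index(max(q)) else 0.0 for i in range(len(q))]
v_greedy = soft_value(q, greedy, alpha)

print("policy:", [round(p, 3) for p in pi])
print("soft value:", v_soft, "greedy value:", v_greedy)
```

A larger `alpha` spreads probability more evenly across actions, which encourages exploration; as `alpha` approaches zero the policy approaches the greedy one.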
