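Security decision method for the edge of multi-layer satellite network based on reinforcement learning

Peiliang ZUO; Shaolong HOU; Chao GUO; Hua JIANG; Wenbo WANG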

doi:10.11959/j.issn.1000-436x.2022111


Research article | Updated: 2024-06-06

- Original title (Chinese): 基于强化学习的多层卫星网络边缘安全决策方法
- Security decision method for the edge of multi-layer satellite network based on reinforcement learning
- Journal on Communications (通信学报), 2022, 43(6): 189-199
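- Author affiliations:

  1. Department of Electronic and Communication Engineering, Beijing Electronic Science and Technology Institute, Beijing 100070
  2. School of Telecommunications Engineering, Xidian University, Xi'an, Shaanxi 710068
  3. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876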
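- Author biographies:

  - Peiliang ZUO (born 1991), male, from Yantai, Shandong; Ph.D.; lecturer at Beijing Electronic Science and Technology Institute. Research interests: satellite communications, cognitive radio, Internet of Things, information security, and software-defined networking.
  - Shaolong HOU (born 1999), male, from Yuanping, Shanxi; master's student at Xidian University. Research interests: satellite communications and artificial intelligence.
  - Chao GUO (born 1987), female, from Jiujiang, Jiangxi; lecturer at Beijing Electronic Science and Technology Institute. Research interests: satellite communications, emergency communications, transmission control, network load balancing, information security, and Internet of Things.
  - Hua JIANG (born 1962), male, from Datong, Shanxi; professor at Beijing Electronic Science and Technology Institute. Research interests: communication security, emergency communications, Internet of Things, and next-generation networks.
  - Wenbo WANG (born 1965), male, from Anguo, Hebei; Ph.D.; professor at Beijing University of Posts and Telecommunications. Research interests: wireless communications, 3G/4G/5G/6G communications, satellite communications, cognitive radio, Internet of Things, information security, and software-defined networking.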
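- Funding:

  National Natural Science Foundation of China (62001251); National Natural Science Foundation of China (62001252); Beijing Universities "High-Grade, Precision, and Advanced" Discipline Construction Fund (202100130401); Fund of the State Key Laboratory of Integrated Services Networks, Xidian University (ISN22-13)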
- DOI: 10.11959/j.issn.1000-436x.2022111
  CLC number: TN92
- Published online: 2022-06
  Published in print: 2022-06-25
Peiliang ZUO, Shaolong HOU, Chao GUO, et al. Security decision method for the edge of multi-layer satellite network based on reinforcement learning[J]. Journal on Communications, 2022, 43(6): 189-199. DOI: 10.11959/j.issn.1000-436x.2022111.

Abstract

Objectives: Multi-layer satellite networks are an important component of space-air-ground integration technology. This paper aims to rely on the autonomous decision-making capability of satellite nodes to exploit task collaboration in network edge scenarios, covering both the processing of sensed data (including encryption, decryption, and compression) and its backhaul. With data security as the premise and low transmission delay as the goal, edge decision-making for mission satellites in a multi-layer satellite network architecture is realized.

Methods: This paper considers a multi-layer satellite network consisting of low-orbit satellites, medium-orbit satellites, and high-orbit geosynchronous satellites. The low-orbit satellite nodes are responsible for observation and reconnaissance services (such as meteorological observation, geographic detection, and intelligence reconnaissance). The medium-orbit satellites are regarded as fog nodes in the edge scenario, and one of them serves as the fog computing processing center, which plans the satellite nodes on which observation data are compressed and encrypted, as well as the network selection for data backhaul. The geosynchronous satellite has the largest coverage and the strongest computing capability. Deep reinforcement learning is used to make the edge security decisions for the satellite network. Specifically, the edge center node obtains the environmental state of the satellite network through its sensing system and, using the self-learning ability of deep reinforcement learning, fits the optimal data offloading strategy for the scenario and obtains the optimal link plan, so that onboard resources are fully utilized and the average backhaul delay of the many observation tasks is minimized. First, the edge center node observes the environment and obtains state elements such as the task data volume of the observation satellites, channel conditions, and edge-node processing capability; on this basis, a deep Q-network maps the state to an action, giving an initial strategy selection. The strategy acts on the satellite network and changes the state of the environment, and the environment evaluates the strategy and feeds the evaluation back to the edge center node as a reward. The edge center node then computes the error and updates the Q value based on the new environment state and the reward, thereby optimizing the action-selection strategy so as to obtain higher rewards and new environment states. This process is iterated until the optimal strategy is obtained.
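
The observe, act, reward, and update cycle described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: a small Q-table stands in for the deep Q-network, and the relay options, task sizes, delay model, alternating task order, and hyperparameters are all hypothetical values chosen only for illustration.

```python
import random

# Hypothetical relay options: (processing rate, link rate, propagation delay).
# Units are arbitrary; none of these values come from the paper.
NODES = [
    (100.0, 10.0, 0.02),   # option 0: fast processor, slow link, short hop
    (50.0, 50.0, 0.50),    # option 1: balanced rates, long hop
    (20.0, 100.0, 0.30),   # option 2: slow processor, fast link
]
TASK_SIZES = [1.0, 100.0]  # state 0 = small task, state 1 = large task


def delay(state, action):
    """Processing time + transmission time + propagation delay."""
    size = TASK_SIZES[state]
    proc_rate, link_rate, prop = NODES[action]
    return size / proc_rate + size / link_rate + prop


def train(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(NODES) for _ in TASK_SIZES]
    state = 0
    for _ in range(steps):
        # Epsilon-greedy action selection from the current Q estimates.
        if rng.random() < epsilon:
            action = rng.randrange(len(NODES))
        else:
            action = max(range(len(NODES)), key=lambda a: q[state][a])
        # The environment evaluates the action: lower delay, higher reward.
        reward = -delay(state, action)
        # Toy assumption: small and large tasks simply alternate.
        next_state = (state + 1) % len(TASK_SIZES)
        # Temporal-difference error and Q-value update.
        td_error = reward + gamma * max(q[next_state]) - q[state][action]
        q[state][action] += alpha * td_error
        state = next_state
    return q


def greedy_policy(q):
    """Best-known relay option for each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


q_table = train()
policy = greedy_policy(q_table)
print("greedy relay per task size:", policy)
```

In this toy setting the learned policy sends small tasks to the short-hop option and large tasks to the balanced option, because those choices minimize the modeled delay for each task size. A deep Q-network replaces the table when the state (data volume, channel conditions, processing capability) is too large to enumerate.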

Results: Keras is used as the simulation platform, and the low-orbit constellation is assumed to be the common Walker constellation. A region of the multi-layer satellite network is taken as the simulation object, with 8 low-orbit observation satellites, 3 medium-orbit satellites, and 1 high-orbit satellite. The simulation results cover three aspects: 1) The convergence of each method on random snapshots is simulated for different numbers of satellites. The proposed method shows a convergence trend in every case; as the number of satellites increases, the number of training iterations needed to converge increases markedly, because more satellites greatly enlarge the action space of the method. 2) The performance of the proposed method is compared under different network configurations. The proposed method has the best convergence performance under all 4 configurations; however, in some snapshots the low-high configuration starts out very well but converges poorly as training proceeds, because it offers fewer link choices, which limits its performance. 3) The proposed method and the comparison methods are evaluated on a test set. Compared with random edge security decisions and edge security decisions guided by the signal-to-noise ratio, the proposed method has a clear advantage in delay performance, and its gap to the optimal edge security decision obtained by exhaustive search is small.
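
To make the three comparison strategies concrete, the following sketch evaluates a random decision, an SNR-guided decision, and the exhaustive-search optimum on one toy snapshot. The delay model (Shannon-rate transmission plus processing shared equally among tasks on the same relay), the data sizes, the SNR values, and the processing rates are assumptions, not the paper's simulation settings. It also assumes that each of the 8 observation satellites picks one of 4 relays, so the joint decision space in this toy model has 4^8 = 65536 entries, which hints at why adding satellites slows training.

```python
import itertools
import math
import random

rng = random.Random(1)
NUM_OBS = 8      # low-orbit observation satellites in the simulated region
NUM_RELAYS = 4   # assumed choices: 3 medium-orbit + 1 high-orbit satellite

# Hypothetical task sizes, linear SNRs per link, and relay processing rates.
sizes = [rng.uniform(5.0, 50.0) for _ in range(NUM_OBS)]
snr = [[rng.uniform(1.0, 30.0) for _ in range(NUM_RELAYS)]
       for _ in range(NUM_OBS)]
proc_rate = [20.0, 20.0, 20.0, 60.0]
BANDWIDTH = 10.0


def average_delay(assignment):
    """Mean delay when satellite i sends its task to relay assignment[i]."""
    load = [assignment.count(r) for r in range(NUM_RELAYS)]
    total = 0.0
    for i, r in enumerate(assignment):
        link_rate = BANDWIDTH * math.log2(1.0 + snr[i][r])  # Shannon capacity
        share = proc_rate[r] / load[r]  # relay capacity split equally
        total += sizes[i] / link_rate + sizes[i] / share
    return total / NUM_OBS


# Baseline 1: every satellite picks a relay at random.
random_choice = tuple(rng.randrange(NUM_RELAYS) for _ in range(NUM_OBS))
# Baseline 2: every satellite picks its highest-SNR link, ignoring relay load.
snr_choice = tuple(max(range(NUM_RELAYS), key=lambda r: snr[i][r])
                   for i in range(NUM_OBS))
# Reference: exhaustive search over every joint assignment.
all_assignments = list(itertools.product(range(NUM_RELAYS), repeat=NUM_OBS))
best_choice = min(all_assignments, key=average_delay)

print("decision space size:", len(all_assignments))
print("random     :", round(average_delay(random_choice), 3))
print("SNR-guided :", round(average_delay(snr_choice), 3))
print("exhaustive :", round(average_delay(best_choice), 3))
```

Because the exhaustive search scans every joint assignment, its average delay can never exceed that of the other two strategies on the same snapshot. A learned policy aims to approach that value without enumerating the whole space.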

Conclusions: Aiming at the link selection problem among multi-layer satellite nodes for low-orbit observation satellites, this paper proposes a decision method for data compression and encrypted backhaul based on deep reinforcement learning. By designing the state, action, reward, and training-network parameters of the method appropriately for the scenario, the proposed method can make intelligent and efficient edge decisions with low transmission delay as the goal.

