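1. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
2. School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an 710049, China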
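QIN Zhijin (1989- ), female, born in Taiyuan, Shanxi, Ph.D., associate professor and doctoral supervisor at Tsinghua University. Her main research interest is semantic communication.
ZHAO Tantan (1991- ), female, born in Longnan, Gansu, Ph.D. candidate at Xi'an Jiaotong University. Her main research interests include secure wireless transmission, mobile edge computing, deep reinforcement learning, and federated learning.
LI Fan (1981- ), male, born in Baoji, Shaanxi, Ph.D., professor and doctoral supervisor at Xi'an Jiaotong University. His main research interests include deep-learning-based image and video coding, machine-learning-based image and video quality assessment, and deep understanding and processing of images and videos.
TAO Xiaoming (1981- ), female, born in Shijiazhuang, Hebei, Ph.D., professor and doctoral supervisor at Tsinghua University. Her main research interests include wireless multimedia communication theory and the application of its key technologies.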
Online publication date: 2023-05
Print publication date: 2023-05-25
Zhijin QIN, Tantan ZHAO, Fan LI, et al. Survey of research on multimodal semantic communication[J]. Journal on Communications, 2023, 44(5): 28-41. DOI: 10.11959/j.issn.1000-436x.2023105.
With the cross-integration of artificial intelligence and communications, technologies for processing multimodal data such as text, images, audio, and video are developing rapidly. The semantic dimensions shared across modalities are being explored in depth, and characteristics of multimodal semantic information such as high abstraction, intelligence, and conciseness are being fully exploited, bringing new ideas and means to semantic communication. First, the fundamental theories and classifications of semantic communication are introduced, and the state of research on single-modal semantic communication is reviewed for text, image, audio, and video respectively. Then, the state of research on multimodal semantic communication is reviewed, and multimodal data fusion techniques and secure semantic communication are introduced. Finally, the challenges facing multimodal semantic communication are summarized.