Knowledge triple extraction in cybersecurity with adversarial active learning

Tao LI; Yuanbo GUO; Ankang JU

doi:10.11959/j.issn.1000-436x.2020174

Academic paper | Updated: 2024-06-05

- Knowledge triple extraction in cybersecurity with adversarial active learning
- Journal on Communications, 2020, Vol. 41, No. 10, pp. 80-91
- Author affiliation:
  
  School of Cryptography Engineering, Information Engineering University, Zhengzhou, Henan 450001, China
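- About the authors:
  
  Tao LI (1992- ), male, born in Gangu, Gansu, is a doctoral student at Information Engineering University. His main research interest is semantic modeling of network threats.
  Yuanbo GUO (1975- ), male, born in Zhouzhi, Shaanxi, Ph.D., is a professor and doctoral supervisor at Information Engineering University. His main research interests are big data security and situational awareness.
  Ankang JU (1995- ), male, born in Huixian, Henan, is a doctoral student at Information Engineering University. His main research interests are multi-step attack detection and heterogeneous security data fusion.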
- Funding:
  
  The National Natural Science Foundation of China (61501515)
- DOI：10.11959/j.issn.1000-436x.2020174
  CLC number: TP391
- Published online: 2020-10
  
  Print publication: 2020-10-25

Tao LI, Yuanbo GUO, Ankang JU. Knowledge triple extraction in cybersecurity with adversarial active learning[J]. Journal on Communications, 2020, 41(10): 80-91. DOI: 10.11959/j.issn.1000-436x.2020174.

Abstract

Pipeline methods for acquiring cybersecurity knowledge propagate entity recognition errors, ignore the correlation between entity recognition and relation extraction, and lack labeled corpora for model training. To address these problems, an end-to-end cybersecurity knowledge triple extraction method with adversarial active learning is proposed. First, entity recognition and relation extraction are modeled as a single sequence labeling task through a joint labeling strategy. Then, a BiLSTM-LSTM model with a dynamic attention mechanism is designed to jointly extract entities and relations and form triples. Finally, a discriminator is trained within an adversarial learning framework to incrementally select high-quality samples for labeling, and the performance of the joint extraction model is continuously improved through iterative retraining. Experiments show that the proposed joint entity-relation extraction model outperforms existing cybersecurity knowledge extraction methods, and they demonstrate the effectiveness of the adversarial active learning scheme.
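To make the joint labeling idea concrete, the following is a minimal sketch of how a tagged sentence could be decoded into triples. It is not the authors' implementation. The tag format `{B,I,E,S}-{relation}-{1,2}` (position, relation type, entity role), the nearest-span pairing rule, the function name `decode_triples`, and the example sentence are all assumptions made for illustration; the abstract does not specify the actual tag set.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def decode_triples(tokens: List[str], tags: List[str]) -> List[Triple]:
    """Decode joint tags into (head, relation, tail) triples.

    Assumed tag format (hypothetical): POSITION-RELATION-ROLE, where
    POSITION is B/I/E/S, ROLE is "1" (head) or "2" (tail), and "O"
    marks tokens outside any entity.
    """
    spans = []  # each entry: (start_index, relation, role, text)
    buf, key, start = [], None, -1
    for i, (tok, tag) in enumerate(zip(tokens, tags)):
        if tag == "O":
            buf, key = [], None
            continue
        pos, rest = tag.split("-", 1)
        rel, role = rest.rsplit("-", 1)  # relation names may contain "-"
        if pos == "S":  # single-token entity
            spans.append((i, rel, role, tok))
            buf, key = [], None
        elif pos == "B":  # start of a multi-token entity
            buf, key, start = [tok], (rel, role), i
        elif pos in ("I", "E") and key == (rel, role):
            buf.append(tok)
            if pos == "E":  # end of the entity: emit the span
                spans.append((start, rel, role, " ".join(buf)))
                buf, key = [], None
        else:  # inconsistent tag sequence: drop the partial span
            buf, key = [], None

    heads = [s for s in spans if s[2] == "1"]
    tails = [s for s in spans if s[2] == "2"]
    triples = []
    for h in heads:
        # Pair each head with the nearest tail sharing its relation type.
        candidates = [t for t in tails if t[1] == h[1]]
        if candidates:
            t = min(candidates, key=lambda c: abs(c[0] - h[0]))
            triples.append((h[3], h[1], t[3]))
    return triples


# Hypothetical example sentence and tags.
tokens = ["WannaCry", "exploits", "the", "EternalBlue", "vulnerability"]
tags = ["S-exploits-1", "O", "O", "B-exploits-2", "E-exploits-2"]
print(decode_triples(tokens, tags))
# [('WannaCry', 'exploits', 'EternalBlue vulnerability')]
```

Because both the entity boundary and the relation are encoded in one tag per token, a single sequence labeling model can produce triples directly, which is what removes the separate entity recognition stage that a pipeline would need.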

Keywords
