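1. School of Cryptographic Engineering, Information Engineering University, Zhengzhou, Henan 450001, China
2. University of California, Riverside, Riverside, CA 92521, USA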
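Yuanbo GUO (1975- ), male, born in Zhouzhi, Shaanxi, Ph.D., professor and doctoral supervisor at Information Engineering University. His main research interests include network defense, data mining, machine learning, and artificial intelligence security.
Yongfei LI (1998- ), male, born in Kaifeng, Henan, master's student at Information Engineering University. His main research interests include threat intelligence entity extraction and relation extraction.
Qingli CHEN (1998- ), male, born in Xinxiang, Henan, master's student at Information Engineering University. His main research interest is artificial intelligence security.
Chen FANG (1993- ), male, born in Susong, Anhui, Ph.D., lecturer at Information Engineering University. His main research interests include machine learning and privacy security.
Yangyang HU (1990- ), male, born in Nanjing, Jiangsu, Ph.D. student at the University of California, Riverside. His main research interest is machine learning.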
Online publication date: 2022-06
Print publication date: 2022-07-25
Yuanbo GUO, Yongfei LI, Qingli CHEN, et al. Fusion of Focal Loss's cyber threat intelligence entity extraction[J]. Journal on Communications, 2022, 43(7): 85-92. DOI: 10.11959/j.issn.1000-436x.2022132.
Cyber threat intelligence (CTI) contains a wealth of knowledge about threat behavior, and timely analysis and processing of it can help shift cyber defense from passive to active. Most threat intelligence currently exists as natural language text and contains a large amount of unstructured data, which must be converted into structured data by entity extraction methods for subsequent processing. However, because threat intelligence contains many specialized terms, such as vulnerability names, malware, and APT organizations, and because the distribution of entities is extremely unbalanced, entity extraction methods from the general domain are severely limited when applied to threat intelligence. Therefore, an entity extraction model integrating Focal Loss was proposed, which improves the cross-entropy loss function by introducing a balance factor and a modulation coefficient so as to balance the sample distribution. In addition, to address the complex structure, wide range of sources, and large number of specialized terms in threat intelligence, word and character features were added to the model, which effectively alleviated the out-of-vocabulary (OOV) problem in threat intelligence. Experimental results show that, compared with the existing mainstream models BiLSTM and BiLSTM-CRF, the F1 score of the proposed model increased by 7.07% and 4.79%, respectively, which verifies the effectiveness of introducing Focal Loss and character features.
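To illustrate how a balance factor and a modulation coefficient reshape cross-entropy, the following is a minimal sketch of the standard focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), applied to a single token. It is not the authors' implementation: the function names, the three-tag label set, and the values of `alpha` and `gamma` are assumptions chosen for illustration, since this page does not state the settings used in the paper.

```python
import math


def softmax(logits):
    """Convert raw tag scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Plain cross-entropy for one token: -log(p_t)."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, alpha, gamma=2.0):
    """Focal loss for one token.

    alpha: per-class balance factors (assumed values, one per tag).
    gamma: modulation exponent; gamma = 0 with alpha = 1 recovers
           plain cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical tag set: index 0 = "O" (frequent), 1 = "Malware", 2 = "Vuln" (rare).
alpha = [0.25, 1.0, 1.0]      # assumed weights: down-weight the frequent "O" tag
logits = [2.0, 0.5, -1.0]     # the model is fairly confident in "O"

easy = focal_loss(logits, target=0, alpha=alpha)   # well-classified frequent tag
hard = focal_loss(logits, target=2, alpha=alpha)   # misclassified rare tag
print("easy token:", easy, "vs cross-entropy", cross_entropy(logits, 0))
print("hard token:", hard, "vs cross-entropy", cross_entropy(logits, 2))
```

In this sketch the confidently classified "O" token contributes far less to the loss than it would under cross-entropy, while the misclassified rare-entity token keeps most of its weight, which is the rebalancing effect the abstract describes.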