Research on knowledge graph construction technology for cyber threat intelligence based on large language models

LAI Qingnan; JIN Jiandong; ZHOU Changling

doi:10.11959/j.issn.1000-436x.2024225


Cyber Security | Updated: 2024-12-31

- Journal on Communications, Vol. 45, Issue Z2, Pages 33-43 (2024)
- Author affiliation: Computer Center, Peking University, Beijing 100871
- DOI: 10.11959/j.issn.1000-436x.2024225
- CLC: TP181
- Received: 22 October 2024; Published: 30 November 2024
LAI Qingnan, JIN Jiandong, ZHOU Changling. Research on knowledge graph construction technology for cyber threat intelligence based on large language models[J]. Journal on Communications, 2024, 45(Z2): 33-43. DOI: 10.11959/j.issn.1000-436x.2024225.

Abstract

As the complexity and sophistication of cyber threats continue to increase, integrating cyber threat intelligence into cybersecurity measures has become crucial. This paper proposes AutoCTI2KG, a framework based on large language models for constructing cyber threat intelligence knowledge graphs. Through instruction prompting and in-context learning, AutoCTI2KG automatically generates cybersecurity knowledge graphs and attack knowledge graphs from cyber threat intelligence and provides actionable defense recommendations. Experimental results show that the proposed framework performs well in constructing both cybersecurity and attack knowledge graphs, with F1 scores of around 0.90, demonstrating the potential of large language models for knowledge graph construction in the cybersecurity domain. The framework not only advances the state of the art in cybersecurity knowledge graph construction but also gives cybersecurity professionals a practical tool for better understanding and mitigating cyber risks.
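The abstract names two mechanisms, instruction prompting with in-context examples and F1-based evaluation, without showing them. The following is a minimal illustrative sketch only, not the AutoCTI2KG implementation: the prompt wording, the `head | relation | tail` output format, the demonstration example, and all function names are assumptions introduced here, and the call to a language model is deliberately left out.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical in-context demonstration; invented for illustration,
# not taken from the paper or from any real threat report.
EXAMPLES: List[Tuple[str, str]] = [
    ("MalwareX spreads through phishing email.",
     "MalwareX | spreads-through | phishing email"),
]


def build_prompt(report: str) -> str:
    """Assemble an instruction, the demonstrations, and the new report."""
    parts = ["Extract (head | relation | tail) triples from the threat report."]
    for text, triples in EXAMPLES:
        parts.append(f"Report: {text}\nTriples:\n{triples}")
    parts.append(f"Report: {report}\nTriples:")
    return "\n\n".join(parts)


def parse_triples(output: str) -> Set[Triple]:
    """Parse 'head | relation | tail' lines, skipping malformed lines."""
    triples: Set[Triple] = set()
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) == 3 and all(fields):
            triples.add((fields[0], fields[1], fields[2]))
    return triples


def f1_score(predicted: Set[Triple], gold: Set[Triple]) -> float:
    """Exact-match F1 between predicted and gold triple sets."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, the text returned by whichever model is used would be passed to `parse_triples`, and the resulting set compared against manually annotated triples with `f1_score`. Exact string matching is the simplest possible criterion; the paper may use a different matching rule.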


