LAI Qingnan, JIN Jiandong, ZHOU Changling. Research on knowledge graph construction technology for cyber threat intelligence based on large language models[J]. Journal on Communications, 2024, 45(Z2): 33-43. DOI: 10.11959/j.issn.1000-436x.2024225.
Research on knowledge graph construction technology for cyber threat intelligence based on large language models
As the complexity and sophistication of cyber threats continue to increase, integrating cyber threat intelligence into cybersecurity measures has become crucial. A large language model-based framework, AutoCTI2KG, is proposed for constructing cyber threat intelligence knowledge graphs. Using instruction prompts and in-context learning, AutoCTI2KG automatically generates cybersecurity and attack knowledge graphs from cyber threat intelligence and provides actionable defense recommendations. Experimental results show that the proposed framework performs well in constructing cybersecurity and attack knowledge graphs, with F1 scores of around 0.90, demonstrating the potential of large language models for knowledge graph construction in the cybersecurity domain. This work not only advances cybersecurity knowledge graph construction but also provides a practical tool that helps cybersecurity professionals understand and mitigate cyber risks.
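The abstract does not give the prompt format or the evaluation procedure, so the following is only a minimal sketch of how an instruction prompt with in-context examples might be assembled and how a triple-level F1 score could be computed. The function names, the `head | relation | tail` output format, and the exact-match scoring rule are all assumptions made for illustration and are not taken from AutoCTI2KG.

```python
from typing import List, Set, Tuple

# A knowledge graph edge as (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def build_prompt(report: str, examples: List[Tuple[str, List[Triple]]]) -> str:
    """Assemble an instruction prompt followed by in-context examples.

    The wording and layout are hypothetical, not the paper's prompt.
    """
    lines = [
        "Extract (head, relation, tail) triples from the threat report.",
        "Return one triple per line as: head | relation | tail",
        "",
    ]
    for text, triples in examples:
        lines.append(f"Report: {text}")
        lines.extend(f"{h} | {r} | {t}" for h, r, t in triples)
        lines.append("")
    lines.append(f"Report: {report}")
    return "\n".join(lines)


def parse_triples(output: str) -> Set[Triple]:
    """Parse model output into normalized triples, skipping malformed lines."""
    triples: Set[Triple] = set()
    for line in output.splitlines():
        parts = [p.strip().lower() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.add((parts[0], parts[1], parts[2]))
    return triples


def triple_f1(predicted: Set[Triple], gold: Set[Triple]) -> float:
    """Exact-match F1 over triples (an assumed scoring rule)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: entity names are placeholders, not real intelligence.
gold = {
    ("examplegroup", "uses", "examplerat"),
    ("examplegroup", "targets", "energy sector"),
}
model_output = "ExampleGroup | uses | ExampleRAT\nExampleGroup | targets | finance\nnot a triple"
predicted = parse_triples(model_output)
score = triple_f1(predicted, gold)
```

Here one of the two predicted triples matches the gold set, so precision and recall are both 0.5. Exact matching is the strictest choice; a real evaluation might instead allow entity aliases or partial matches.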