Dual-granularity lightweight model for vulnerability code slicing method assessment

张炳; 文峥; 赵宇轩; 王苧; 任家东

doi:10.11959/j.issn.1000-436x.2021196


Academic Communication | Updated: 2024-06-05

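- Chinese title: 双粒度轻量级漏洞代码切片方法评估模型
- English title: Dual-granularity lightweight model for vulnerability code slicing method assessment
- Journal on Communications (通信学报), 2021, Vol. 42, No. 11, pp. 233-241
- Author affiliations:
  
  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China
  2. Hebei Key Laboratory of Software Engineering, Qinhuangdao 066004, Hebei, China
- About the authors:
  
  - 张炳 (born 1989), male, from Huanggang, Hubei. Ph.D., associate professor and master's supervisor at Yanshan University. Research interests: data mining, machine learning, and software security.
  - 文峥 (born 1998), male, from Baoding, Hebei. Master's student at Yanshan University. Research interest: software security.
  - 赵宇轩 (born 1997), male, from Qinhuangdao, Hebei. Master's student at Yanshan University. Research interests: text mining and software security.
  - 王苧 (born 1994), female, from Yangquan, Shanxi. Master's student at Yanshan University. Research interest: software security.
  - 任家东 (born 1967), male, from Qiqihar, Heilongjiang. Ph.D., professor and doctoral supervisor at Yanshan University. Research interests: temporal data modeling and software security.
- Funding:
  
  National Natural Science Foundation of China (61802332, 61807028, 61772449); Doctoral Foundation of Yanshan University (BL18012)
- DOI: 10.11959/j.issn.1000-436x.2021196
  Chinese Library Classification (CLC) number: TP309
- Published online: 2021-11
  
  Published in print: 2021-11-25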
张炳, 文峥, 赵宇轩, 等. 双粒度轻量级漏洞代码切片方法评估模型[J]. 通信学报, 2021, 42(11): 233-241. DOI: 10.11959/j.issn.1000-436x.2021196.

Bing ZHANG, Zheng WEN, Yuxuan ZHAO, et al. Dual-granularity lightweight model for vulnerability code slicing method assessment[J]. Journal on Communications, 2021, 42(11): 233-241. DOI: 10.11959/j.issn.1000-436x.2021196.

Abstract

Existing assessments of vulnerability code slicing methods suffer from three problems: incomplete extraction of slice information, high model complexity with poor generalization, and an open-loop evaluation process without feedback. To address them, a dual-granularity lightweight vulnerability code slicing evaluation (VCSE) model is proposed. For code snippets, a lightweight model fusing TF-IDF and N-gram is constructed, which efficiently bypasses the out-of-vocabulary (OOV) problem, and the semantic and statistical features of code slices are extracted at two granularities, word and character. A heterogeneous ensemble classifier with high precision and good generalization performance is designed for vulnerability prediction and analysis. Experimental results show that the lightweight VCSE achieves markedly better evaluation results than the deep learning models in wide use today.
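
To make the dual-granularity idea concrete, the following is a minimal standard-library sketch of TF-IDF weighting over word-level and character-level n-grams of a code slice. It is not the authors' implementation: the names `word_ngrams`, `char_ngrams`, `features`, and `TfidfModel`, the tokenizer regex, the n-gram sizes, the smoothed IDF formula, and the L2 normalization are all assumptions chosen for illustration, and the heterogeneous ensemble classifier is not shown.

```python
import math
import re
from collections import Counter


def word_ngrams(code, n_max=2):
    """Word-level n-grams (1..n_max) over identifiers, numbers, and symbols."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)
    grams = []
    for n in range(1, n_max + 1):
        grams += ["w:" + " ".join(tokens[i:i + n])
                  for i in range(len(tokens) - n + 1)]
    return grams


def char_ngrams(code, n=3):
    """Character-level n-grams over the whitespace-normalized slice text."""
    text = " ".join(code.split())
    return ["c:" + text[i:i + n] for i in range(len(text) - n + 1)]


def features(code):
    """Dual-granularity bag of n-grams: word-level plus character-level."""
    return Counter(word_ngrams(code) + char_ngrams(code))


class TfidfModel:
    """Hypothetical helper: fits IDF weights, then maps a slice to a sparse vector."""

    def fit(self, slices):
        counts = [features(s) for s in slices]
        # Iterating a Counter yields each distinct gram once per slice,
        # so this counts document frequency rather than term frequency.
        df = Counter(g for c in counts for g in c)
        n_docs = len(slices)
        # Smoothed IDF (an assumed variant, not taken from the paper).
        self.idf = {g: math.log((1 + n_docs) / (1 + d)) + 1.0
                    for g, d in df.items()}
        return self

    def transform(self, code):
        tf = features(code)
        # Grams never seen during fitting are dropped.
        vec = {g: c * self.idf[g] for g, c in tf.items() if g in self.idf}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {g: v / norm for g, v in vec.items()}


# Toy slices, invented for this example only.
train = [
    "char buf[10]; strcpy(buf, src);",
    "char buf[10]; strncpy(buf, src, sizeof(buf) - 1);",
]
model = TfidfModel().fit(train)
vec = model.transform("memcpy(buffer2, src, len);")

# "memcpy" and "buffer2" were never seen as whole words, but their
# character trigrams "cpy" and "buf" were, so the slice is still represented.
print("w:memcpy" in vec, "c:cpy" in vec, "c:buf" in vec)
```

In this sketch the character-level channel is what keeps an unseen identifier from vanishing: a word-only vocabulary would discard `memcpy` entirely, while shared character trigrams still carry weight. The resulting sparse vectors could then be passed to any set of classifiers.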


