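1. School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei
2. Key Laboratory of Software Engineering of Hebei Province, Qinhuangdao 066004, Hebei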
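张炳 (Bing ZHANG, 1989− ), male, born in Huanggang, Hubei. Ph.D., associate professor and master's supervisor at Yanshan University. Research interests: data mining, machine learning, and software security.
文峥 (Zheng WEN, 1998− ), male, born in Baoding, Hebei. Master's student at Yanshan University. Research interest: software security.
赵宇轩 (Yuxuan ZHAO, 1997− ), male, born in Qinhuangdao, Hebei. Master's student at Yanshan University. Research interests: text mining and software security.
王苧 (1994− ), female, born in Yangquan, Shanxi. Master's student at Yanshan University. Research interest: software security.
任家东 (Jiadong REN, 1967− ), male, born in Qiqihar, Heilongjiang. Ph.D., professor and doctoral supervisor at Yanshan University. Research interests: temporal data modeling and software security.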
Online publication date: 2021-11
Print publication date: 2021-11-25
Bing ZHANG, Zheng WEN, Yuxuan ZHAO, et al. Dual-granularity lightweight model for vulnerability code slicing method assessment[J]. Journal on Communications, 2021, 42(11): 233-241. DOI: 10.11959/j.issn.1000-436x.2021196.
To address the problems in how existing vulnerability code slicing methods are assessed, namely incomplete extraction of slice information, high model complexity with poor generalization, and an open-loop evaluation process without feedback, a dual-granularity lightweight vulnerability code slicing method evaluation model (VCSE) was proposed. For code snippets, a lightweight fusion model of TF-IDF and N-gram was constructed, which efficiently bypassed the out-of-vocabulary (OOV) problem, and the semantic and statistical features of code slices were extracted at both word and character granularity. A heterogeneous ensemble classifier with high precision and good generalization performance was designed for vulnerability prediction and analysis. Experimental results show that the lightweight VCSE evaluates slicing methods markedly better than the deep learning models in wide use today.
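The dual-granularity idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the n-gram ranges, the smoothed IDF formula, the toy slices, and every function name below are assumptions made for illustration only. The sketch shows why character n-grams sidestep the OOV problem: an identifier never seen during fitting yields no word-level feature, yet still shares character n-grams with known identifiers.

```python
import math
from collections import Counter

def word_ngrams(code, n_max=2):
    """Word granularity: whitespace tokens joined into 1..n_max-grams (assumed tokenizer)."""
    tokens = code.split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def char_ngrams(code, n_min=2, n_max=3):
    """Character granularity: n-grams over the whitespace-normalized text."""
    text = " ".join(code.split())
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]

def fit_idf(docs_terms):
    """Smoothed inverse document frequency for every term seen while fitting."""
    n_docs = len(docs_terms)
    df = Counter(term for terms in docs_terms for term in set(terms))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(terms, idf):
    """TF-IDF weights; terms outside the fitted vocabulary are silently dropped."""
    tf = Counter(t for t in terms if t in idf)
    total = sum(tf.values())
    return {t: (c / total) * idf[t] for t, c in tf.items()} if total else {}

def dual_granularity_features(code, word_idf, char_idf):
    """Concatenate word-level ('w') and character-level ('c') TF-IDF features."""
    feats = {("w", t): v for t, v in tfidf(word_ngrams(code), word_idf).items()}
    feats.update({("c", t): v for t, v in tfidf(char_ngrams(code), char_idf).items()})
    return feats

# Hypothetical pre-tokenized code slices used to fit both vocabularies.
slices = [
    "char buf [ 10 ] ; strcpy ( buf , src ) ;",
    "char buf [ 10 ] ; strncpy ( buf , src , 9 ) ;",
]
word_idf = fit_idf([word_ngrams(s) for s in slices])
char_idf = fit_idf([char_ngrams(s) for s in slices])

# "wcsncpy" never appeared during fitting, so it is OOV at word level.
unseen = "wcsncpy ( dst , src , n ) ;"
features = dual_granularity_features(unseen, word_idf, char_idf)
```

In this sketch the resulting sparse dictionary could be vectorized and passed to any set of classifiers; the heterogeneous ensemble described in the abstract is not reproduced here.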