Program semantic analysis model for code reuse detection
Papers | Updated: 2025-01-14
Journal on Communications, Vol. 45, Issue 12, Pages 179-196 (2024)
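Author affiliations:
1. College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
2. Hubei Key Laboratory for High-efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology, Wuhan, Hubei 430068, China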
Funding:
The National Natural Science Foundation of China (61502194); The National Key Research and Development Program of China (2023YFF1000100); Science and Technology Research Project of Hubei Provincial Department of Education (Q20211405); Doctoral Research Initiation Fund Project of Hubei University of Technology (XJ2021003601)
GUO Xi, WANG Pan. Program semantic analysis model for code reuse detection[J]. Journal on Communications, 2024, 45(12): 179-196. DOI: 10.11959/j.issn.1000-436x.2024269.
Program similarity analysis has a wide range of applications in areas such as code plagiarism detection and intellectual property protection, but existing approaches generally suffer from problems such as excessive computational overhead. To address this, a code similarity analysis method based on fuzzy matching and statistical inference is proposed. For binary programs, disassembly is performed first, followed by function boundary recognition to extract the execution boundary information of each function. On this basis, dynamic programming is used to compute similarity between basic blocks, and a neighborhood search over the control flow graph extends the similarity analysis from the basic block level to the function level. Finally, the semantic similarity of binary files is obtained through statistical analysis of the function-level similarity results. During this process, the pre-trained model is optimized and its parameters are tuned to enable similarity analysis of cross-platform code. Experimental results show that the proposed method improves analysis accuracy by an average of 7.1% over current mainstream analysis tools.
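As a rough illustration of the basic-block step described in the abstract, the following Python sketch scores two basic blocks with a longest-common-subsequence (LCS) dynamic program over instruction mnemonics. The abstract does not specify the paper's actual scoring scheme, so the LCS formulation, the normalization by the longer block, the sample blocks, and the names `lcs_length` and `block_similarity` are all assumptions made for illustration, not the authors' method.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two instruction sequences."""
    # prev[j] is the LCS length of the already-processed prefix of `a` and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]  # LCS against an empty prefix of `b` is always 0
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching instructions extend the best alignment of both prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise skip one instruction from either sequence.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def block_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity in [0, 1]: LCS length divided by the longer block's length."""
    if not a and not b:
        return 1.0  # assumption: two empty blocks count as identical
    return lcs_length(a, b) / max(len(a), len(b))


# Hypothetical basic blocks reduced to mnemonics (operands dropped so that
# register or address differences do not affect the match).
block_x = ["push", "mov", "sub", "mov", "call", "add", "ret"]
block_y = ["push", "mov", "sub", "call", "mov", "add", "ret"]
score = block_similarity(block_x, block_y)
```

Comparing mnemonics only is one simple way to make the match fuzzy; a function-level score could then be built by aggregating block scores along matching control flow graph neighbors, which this sketch does not attempt.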