IMM4HT: an identification method of malicious mirror website for high-speed network traffic

ZHANG Lei; ZHANG Peng; SUN Wei; YANG Xingdong; XING Lichao

doi:10.11959/j.issn.1000-436x.2019089


Academic paper | Updated: 2024-06-05

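- IMM4HT: an identification method of malicious mirror website for high-speed network traffic
- Journal on Communications, 2019, Vol. 40, No. 7, pp. 87-94
- Author affiliations:
  
  1. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049
  2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093
  3. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
  4. School of Computer Science and Engineering, Beihang University, Beijing 100191
- About the authors:
  
  ZHANG Lei (1996- ), female, born in Guangyuan, Sichuan. Ph.D. candidate at the University of Chinese Academy of Sciences. Research interests: information filtering and content computing, and network security.
  ZHANG Peng (1984- ), male, born in Huainan, Anhui. Ph.D., research professor at the Institute of Information Engineering, Chinese Academy of Sciences. Research interests: distributed systems, data mining, and network security.
  SUN Wei (1980- ), male, born in Ningwu, Shanxi. Ph.D. candidate at Beijing Jiaotong University. Research interests: computer networks, information security, and network measurement.
  YANG Xingdong (1994- ), male, born in Zhangjiakou, Hebei. Master's student at Beihang University. Research interests: network stream data processing and cyberspace security.
  XING Lichao (1993- ), male, born in Harbin, Heilongjiang. Master's student at the University of Chinese Academy of Sciences. Research interests: information filtering and content computing.
- Funding:
  
  The National Key Research and Development Program of China (2016YFB0801300); The National Natural Science Foundation of China (61602474); The National Natural Science Foundation of China (61602467); The National Natural Science Foundation of China (61702552)
- DOI: 10.11959/j.issn.1000-436x.2019089
  CLC number: TP309
- Online publication date: 2019-07
  
  Print publication date: 2019-07-25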
Lei ZHANG, Peng ZHANG, Wei SUN, et al. IMM4HT:an identification method of malicious mirror website for high-speed network traffic[J]. Journal on Communications, 2019, 40(7): 87-94. DOI: 10.11959/j.issn.1000-436x.2019089.

Abstract

Harmful information can spread through mirror websites and thereby bypass inspection. To address this problem, an identification method of malicious mirror websites for high-speed network traffic is proposed. First, fragmented data are extracted from the traffic and the webpage source code is restored, with a standardization step added to improve identification accuracy. Next, the source code is divided into blocks, a hash value is computed for each block with the simhash algorithm, and the block hashes are combined into the simhash value of the page; the Hamming distance between simhash values then measures the similarity of two pages' source code. Finally, a snapshot of the page is taken, its SIFT feature points are extracted, and a perceptual hash value is obtained through cluster analysis and mapping; page similarity is computed from the perceptual hash values. Experiments on real traffic show an accuracy of 93.42%, a recall of 90.20%, an F value of 0.92, and a processing delay of 20 μs. The proposed method can therefore detect malicious mirror webpages effectively in high-speed network traffic.
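The source-code comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 64-bit fingerprint width, the token-window blocking, the lowercasing used as "standardization", the truncated MD5 block hash, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import hashlib
import re

HASH_BITS = 64  # assumed fingerprint width; the abstract does not state it


def split_blocks(source: str, block_size: int = 4) -> list[str]:
    """Split page source into overlapping blocks of `block_size` tokens.

    Lowercasing stands in for the standardization step (an assumption).
    """
    tokens = re.findall(r"\w+", source.lower())
    if len(tokens) <= block_size:
        return [" ".join(tokens)] if tokens else []
    return [" ".join(tokens[i:i + block_size])
            for i in range(len(tokens) - block_size + 1)]


def block_hash(block: str) -> int:
    """64-bit hash of one block (first 8 bytes of MD5; any stable hash works)."""
    digest = hashlib.md5(block.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(source: str) -> int:
    """Combine per-block hashes into one similarity-preserving fingerprint."""
    counts = [0] * HASH_BITS
    for block in split_blocks(source):
        h = block_hash(block)
        for bit in range(HASH_BITS):
            # Each block votes +1 or -1 on every bit position.
            counts[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    original = "<html><body><h1>Welcome</h1><p>Sign in to your account</p></body></html>"
    mirror = "<html><body><h1>Welcome</h1><p>Sign in to your account now</p></body></html>"
    print(hamming_distance(simhash(original), simhash(mirror)))
```

A small Hamming distance between two fingerprints suggests near-duplicate source code; the threshold below which a page counts as a mirror would have to be tuned on labelled data and is not given in the abstract.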


