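1. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049
2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093
3. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
4. School of Computer Science and Engineering, Beihang University, Beijing 100191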
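ZHANG Lei (1996- ), female, born in Guangyuan, Sichuan, is a Ph.D. candidate at the University of Chinese Academy of Sciences. Her main research interests include information filtering, content computing, and network security.
ZHANG Peng (1984- ), male, born in Huainan, Anhui, Ph.D., is a professor at the Institute of Information Engineering, Chinese Academy of Sciences. His main research interests include distributed systems, data mining, and network security.
SUN Wei (1980- ), male, born in Ningwu, Shanxi, is a Ph.D. candidate at Beijing Jiaotong University. His main research interests include computer networks, information security, and network measurement.
YANG Xingdong (1994- ), male, born in Zhangjiakou, Hebei, is a master's student at Beihang University. His main research interests include network flow data processing and cyberspace security.
XING Lichao (1993- ), male, born in Harbin, Heilongjiang, is a master's student at the University of Chinese Academy of Sciences. His main research interests include information filtering and content computing.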
Published online: 2019-07
Print publication date: 2019-07-25
Lei ZHANG, Peng ZHANG, Wei SUN, et al. IMM4HT: an identification method of malicious mirror website for high-speed network traffic[J]. Journal on Communications, 2019, 40(7): 87-94. DOI: 10.11959/j.issn.1000-436x.2019089.
Aiming at the problem that information harmful to the network environment is spread through mirror websites so as to bypass detection, an identification method of malicious mirror websites for high-speed network traffic (IMM4HT) was proposed. First, fragmented data were extracted from the traffic and the webpage source code was restored, and a standardization module was applied to improve identification accuracy. Next, the webpage source code was divided into blocks, the hash value of each block was calculated with the simhash algorithm to obtain the simhash value of the whole source code, and the similarity between webpage source codes was calculated with the Hamming distance. Finally, a snapshot of the page was taken, its SIFT feature points were extracted, a perceptual hash value was obtained through clustering analysis and mapping, and webpage similarity was calculated from the perceptual hash values. Experiments on real traffic show that the method achieves an accuracy of 93.42%, a recall of 90.20%, an F-value of 0.92, and a processing delay of 20 μs. The proposed method can therefore effectively detect malicious mirror websites in high-speed network traffic.
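The first step, recovering page source from fragmented traffic, is not detailed in the abstract. The sketch below is a minimal illustration, assuming the payload fragments of one TCP flow have already been grouped and are given as (sequence number, payload) pairs: it orders them, trims retransmitted overlap, stops at the first gap, and splits off the HTTP header. The names `reassemble_stream` and `http_body` are hypothetical, and sequence-number wraparound, chunked transfer encoding, and compression are deliberately ignored.

```python
from typing import Iterable, Tuple


def reassemble_stream(segments: Iterable[Tuple[int, bytes]]) -> bytes:
    """Stitch (sequence_number, payload) fragments of one flow into a byte stream.

    Out-of-order, duplicated, and overlapping segments are handled;
    reassembly stops at the first gap (missing data is not guessed).
    """
    ordered = sorted(segments, key=lambda s: s[0])
    if not ordered:
        return b""
    stream = bytearray()
    next_seq = ordered[0][0]  # sequence number of the next byte we still need
    for seq, payload in ordered:
        end = seq + len(payload)
        if end <= next_seq:
            continue  # fully retransmitted segment, nothing new
        if seq > next_seq:
            break  # gap in the capture: stop here
        stream += payload[next_seq - seq:]  # skip the already-seen prefix
        next_seq = end
    return bytes(stream)


def http_body(stream: bytes) -> bytes:
    """Return the bytes after the blank line that ends the HTTP header."""
    _, sep, body = stream.partition(b"\r\n\r\n")
    return body if sep else b""
```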
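The source-code stage combines standardization, block-wise simhash, and the Hamming distance. The sketch below shows the general technique (Charikar-style simhash over blocks) under assumptions made for illustration only: standardization means removing HTML comments, lowercasing, and collapsing whitespace; a "block" is one tag or one run of text between tags; all blocks carry equal weight; and each block is hashed with the top 64 bits of MD5. The paper's actual standardization rules, blocking scheme, and weights are not given in the abstract, and the function names are hypothetical.

```python
import hashlib
import re
from typing import List


def normalize_html(src: str) -> str:
    """Assumed standardization: drop comments, lowercase, collapse whitespace."""
    src = re.sub(r"<!--.*?-->", "", src, flags=re.S)
    return re.sub(r"\s+", " ", src.lower()).strip()


def split_blocks(src: str) -> List[str]:
    """Assumed blocking: each tag or run of text between tags is one block."""
    tokens = re.findall(r"<[^>]*>|[^<]+", src)
    return [t.strip() for t in tokens if t.strip()]


def block_hash(block: str, bits: int = 64) -> int:
    """Hash one block to `bits` bits (top bits of MD5, an arbitrary choice)."""
    digest = hashlib.md5(block.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (128 - bits)


def simhash(src: str, bits: int = 64) -> int:
    """Simhash fingerprint of a page's source, every block weighted equally."""
    votes = [0] * bits
    for block in split_blocks(normalize_html(src)):
        h = block_hash(block, bits)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1  # +1 for a set bit, -1 otherwise
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


a = "<html><body><p>Hello World</p></body></html>"
b = "<HTML>\n  <body><!-- mirror --><p>Hello   World</p>\n</body></HTML>"
print(hamming(simhash(a), simhash(b)))  # prints 0
```

Two copies of a page that differ only in comments, letter case, or whitespace produce identical fingerprints. Pages whose fingerprints differ in only a few bits are candidate mirrors; the cut-off is a tunable threshold that the abstract does not specify.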
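The abstract states only that SIFT feature points of the page snapshot are clustered and mapped into a perceptual hash. One plausible reading, used here purely as an assumption, is a bag-of-visual-words scheme. In this scheme, k-means groups descriptors into a codebook of visual words, each snapshot becomes a histogram of nearest-word counts, and each histogram bin is mapped to one bit by comparing it with the median count. Descriptor extraction is out of scope. Real SIFT descriptors are 128-dimensional and could come from an existing implementation such as OpenCV's, while the toy 2-D vectors in the test work the same way. All function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def kmeans(points: List[Vector], k: int, iters: int = 10) -> List[List[float]]:
    """Plain k-means (Lloyd's algorithm), deterministically seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def perceptual_hash(descriptors: List[Vector], codebook: Sequence[Vector]) -> int:
    """Map one snapshot's descriptors to a len(codebook)-bit hash."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1  # each descriptor votes for its nearest visual word
    m = median(hist)
    h = 0
    for i, count in enumerate(hist):
        if count > m:  # bit i is set when word i is more frequent than the median
            h |= 1 << i
    return h


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

In this reading, the codebook would be trained once with `kmeans` on descriptors pooled from many snapshots. Two snapshots are then compared by the Hamming distance between their hashes, mirroring the source-code stage.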
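The abstract reports an accuracy of 93.42%, a recall of 90.20%, and an F-value of 0.92. An F-value is normally the harmonic mean of precision and recall. If the reported "accuracy" is read as precision, which is an assumption since the abstract does not define it, the three figures are mutually consistent, as this small check shows:

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Reported figures, reading the abstract's "accuracy" as precision (assumption).
f1 = f_measure(0.9342, 0.9020)
print(f"{f1:.4f}")  # prints 0.9178, i.e. 0.92 to two decimals
```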