浏览全部资源
扫码关注微信
1. 中国科学院 信息工程研究所,北京 100093
2. 中国科学院大学,北京 100049
3. 信息内容安全技术国家工程实验室,北京 100093
[ "张萍(1987-),女,河南唐河人,中国科学院博士生,主要研究方向为网络与信息安全、内容过滤等。" ]
[ "刘燕兵(1981-),男,湖北麻城人,博士,中国科学院副研究员,主要研究方向为文本算法设计、图数据管理与挖掘、网络流识别与处理等。" ]
[ "于静(1989-),女,满族,河北省承德人,中国科学院研究实习员,主要研究方向为图数据模式匹配、计算机视觉等。" ]
[ "谭建龙(1974-),男,湖南长沙人,博士,中国科学院研究员、博士生导师,主要研究方向为网络数据流管理、算法设计、海量正则表达式匹配、图像匹配算法等。" ]
网络出版日期:2015-10,
纸质出版日期:2015-10-25
移动端阅览
张萍, 刘燕兵, 于静, 等. HashTrie:一种空间高效的多模式串匹配算法[J]. 通信学报, 2015,36(10):172-180.
Ping ZHANG, Yan-bing LIU, Jing YU, et al. HashTrie:a space-efficient multiple string matching algorithm[J]. Journal on communications, 2015, 36(10): 172-180.
张萍, 刘燕兵, 于静, 等. HashTrie:一种空间高效的多模式串匹配算法[J]. 通信学报, 2015,36(10):172-180. DOI: 10.11959/j.issn.1000-436x.2015215.
Ping ZHANG, Yan-bing LIU, Jing YU, et al. HashTrie:a space-efficient multiple string matching algorithm[J]. Journal on communications, 2015, 36(10): 172-180. DOI: 10.11959/j.issn.1000-436x.2015215.
经典的多模式串匹配算法AC的内存开销巨大,已经无法满足当前高速网络环境下大规模特征串实时匹配的应用需求。针对这一问题,提出一种空间高效的多模式串匹配算法—HashTrie。该算法运用递归散列函数,将模式串集合的信息存储在位向量中,以取代状态转移表来减少空间消耗,并利用Rank操作进行快速匹配校验。理论分析表明,HashTrie算法的空间复杂度为O(|P|),与模式串集合的规模|P|线性相关,与字符集大小σ无关,优于经典多模式串匹配算法AC的空间复杂度O(|P|σlog|P|)。在随机数据集和真实数据集(Snort、ClamAV和URL)上的测试结果表明,HashTrie算法比AC算法节约高达99.6%的存储空间,匹配速度约为AC算法的一半左右。HashTrie算法适合于模式串集合规模较大、模式串长度较短的多模式串匹配问题,是一种空间高效的多模式串匹配算法。
The famous multiple string matching algorithm AC consumed huge memory when the string signatures were massive
thus unable to process high speed network traffic efficiently.To solve this problem
a space-efficient multiple string matching algorithm-HashTrie was proposed.This algorithm adopted recursive hash function to store the patterns in bit-vectors in place of the state transition table in order to reduce space consumption.Further more it made use of the rank operation for fast verification.Theoretic analysis shows that the space complexity of HashTrie is O(|P|)
which is linear with the size of pattern set |P|and is independent of the alphabetsize σ.The space complexity is superior to the complexity O(|P|σlog|P|)of AC.Experiments on synthetic datasets and real-world datasets(such as Snort
ClamAV and URL)show that HashTrie saves up to 99.6% storage cost compared with AC
and in the meanwhile it runs at a matching speed that is about half of AC.HashTrie is a space-efficient multiple string matching algorithm that is appropriate to search large scale pattern strings with short lengths.
NAVARRO G , RAFFINOT M . Flexible Pattern Matching in Strings:Practical On-line Search Algorithms for Texts and Biological Sequences [M ] . Cambridge University Press , 2002 .
AHO A V , CORASICK M J . Efficient string matching:an aid to bibliographic search [J ] . Communications of the ACM , 1975 , 18 ( 6 ): 333 - 340 .
BAEZA-YATES R , GONNET G H . A new approach to text searching [J ] . Communications of the ACM , 1992 , 35 ( 10 ): 74 - 82 .
BOYER R S , MOORE J S . A fast string searching algorithm [J ] . Communications of the ACM , 1977 , 20 ( 10 ): 762 - 772 .
COMMENTZ-WALTER B . A String Matching Algorithm Fast on the Average [M ] . Springer Berlin Heidelberg , 1979 .
HORSPOOL R N . Practical fast searching in strings [J ] . Software:Practice and Experience , 1980 , 10 ( 6 ): 501 - 506 .
WU S,MANBERU . A fast algorithm for multi-pattern searching [R ] . Technical Report TR-94-17,University of Arizona , 1994 .
RAFFINOT M . On the multi backward dawg matching algorithm(MultiBDM) [A ] . Proc of 4th South American Workshop on String Processing [C ] . 1997 . 149 - 165 .
ALLAUZEN C , CROCHEMORE M , RAFFINOT M . Factor oracle:a new structure for pattern matching [A ] . SOFSEM’99:Theory and Practice of Informatics [C ] . Springer Berlin Heidelberg , 1999 . 295 - 310 .
NAVARRO G , RAFFINOT M . A bit-parallel approach to suffix automata:fast extended string matching [A ] . Combinatorial Pattern Matching [C ] . Springer Berlin Heidelberg , 1998 . 14 - 33 .
LIN C H , LIU C H , CHANG S C , et al . Memory-efficient pattern matching architectures using perfect hashing on graphic processing units [A ] . INFOCOM,2012 Proceedings IEEE [C ] . IEEE , 2012 . 1978 - 1986 .
AOE J I . An efficient implementation of static string pattern matching machines [J ] . IEEE Transactions on Software Engineering , 1989 , 15 ( 8 ): 1010 - 1016 .
ERDOGAN O , CAO P . Hash-AV:fast virus signature scanning by cache-resident filters [J ] . International Journal of Security and Networks , 2007 , 2 ( 1 ): 50 - 59 .
TAN L , SHERWOOD T . A high throughput string matching architecture for intrusion detection and prevention [J ] . ACMSIGARCH Computer Architecture News,IEEE Computer Society , 2005 , 33 ( 2 ): 112 - 122 .
VAN L J . High-performance pattern matching for intrusion detection [A ] . INFOCOM,2006 Proceedings IEEE [C ] . IEEE , 2006 . 1 - 13 .
SONG T , ZHANG W , WANG D , et al . A memory efficient multiple pattern matching architecture for network security [A ] . INFOCOM 2008,The 27th Conference on Computer Communications [C ] . IEEE , 2008 .
Available online [EB/OL ] . http://en.wikipedia.org/wiki/Linear_congruential_generator http://en.wikipedia.org/wiki/Linear_congruential_generator .
JACOBSON G . Space-efficient static trees and graphs [A ] . Foundations of Computer Science [C ] . 1989 . 549 - 554 .
何慧敏 , 刘燕兵 , 谭建龙 , 等 . 一种基于子串识别的多模式串匹配算法 [J ] . 计算机应用与软件 , 2012 , 28 ( 11 ): 10 - 14 .
HE H M , LIU Y B , TAN J L , et al . A substring recognition based multiple patterns string matching algorithm [J ] . Computer Applications and Software , 2012 , 28 ( 11 ): 10 - 14 .
LIU Y B , LIU Q Y , LIU P , et al . A factor-searching-based multiple string matching algorithm for intrusion detection [A ] . Communications(ICC),2014 IEEE International Conference [C ] . IEEE , 2014 . 653 - 658 .
Available online [EB/OL ] . http://www.ll.mit.edu/IST/ideval/ http://www.ll.mit.edu/IST/ideval/ .
Available online [EB/OL ] . http://www.snort.org/ http://www.snort.org/ .
Available online [EB/OL ] . http://www.clamav.org/ http://www.clamav.org/ .
Available online [EB/OL ] . http://urlblacklist.com/ http://urlblacklist.com/ .
0
浏览量
1355
下载量
4
CSCD
关联资源
相关文章
相关作者
相关机构