浏览全部资源
扫码关注微信
1. 中国科学院信息工程研究所,北京 100093
2. 中国科学院大学,北京 100049
3. 信息内容安全技术国家工程实验室,北京 100093
4. 中国移动(深圳)有限公司,深圳 518031
[ "张萍(1987-),女,河南唐河人,中国科学院信息工程研究所博士生,主要研究方向为网络与信息安全、内容过滤等。" ]
[ "何慧敏(1987-),女,山东菏泽人,中国移动(深圳)有限公司工程师,主要研究方向为串匹配算法。" ]
[ "张春燕(1989-),女,河北衡水人,中国科学院信息工程研究所实习研究员,主要研究方向为信息内容安全和高性能计算。" ]
[ "曹聪(1987-),男,山东枣庄人,博士,中国科学院信息工程研究所助理研究员,主要研究方向为数据挖掘、文本抽取、常识知识获取。" ]
[ "刘燕兵(1981-),男,湖北麻城人,博士,中国科学院信息工程研究所副研究员,主要研究方向为文本算法设计、图数据管理与挖掘、网络流识别与处理等。" ]
[ "谭建龙(1974-),男,湖南长沙人,博士,中国科学院信息工程研究所研究员、博士生导师,主要研究方向为网络数据流管理、算法设计、海量正则表达式匹配、图像匹配算法等。" ]
网络出版日期:2016-12,
纸质出版日期:2016-12-25
移动端阅览
张萍, 何慧敏, 张春燕, 等. FilterFA:一种基于字符集规约的模式串匹配算法[J]. 通信学报, 2016,37(12):103-114.
Ping ZHANG, Hui-min HE, Chun-yan ZHANG, et al. FilterFA: a multiple string matching algorithm based on specification of character set[J]. Journal on communications, 2016, 37(12): 103-114.
张萍, 何慧敏, 张春燕, 等. FilterFA:一种基于字符集规约的模式串匹配算法[J]. 通信学报, 2016,37(12):103-114. DOI: 10.11959/j.issn.1000-436x.2016277.
Ping ZHANG, Hui-min HE, Chun-yan ZHANG, et al. FilterFA: a multiple string matching algorithm based on specification of character set[J]. Journal on communications, 2016, 37(12): 103-114. DOI: 10.11959/j.issn.1000-436x.2016277.
多模式串匹配技术是入侵检测系统的核心技术之一,Aho-Corasick算法广泛应用于其中。针对AC自动机内存开销巨大影响算法性能的问题,提出一种基于字符集规约的改进算法——FilterFA。利用字符集映射函数将原字符集压缩为多个像字符集,针对像字符集构造新的自动机FilterFA,将空间复杂度降至O(|P||Σ′|)。在随机数据集和真实数据集ClamAV上的测试结果表明,当像字符集大小为8,且保证误识别率小于2%时,FilterFA算法消耗的存储空间仅为AC算法的3%左右。
Multiple string matching is one of the core techniques of intrusion detection system
where Aho-Corasick al-gorithm is widely used. To solve the problem that huge storage overhead of AC would influence performance deeply
an improved algorithm ——FilterFA
based on specification of character set was proposed. This algorithm compressed large character by the character set mapping function
and constructed a new automata based on the mapped character set
then space complexity decreased to O(|P||Σ′|). Experiments on synthetic datasets and real-world datasets (such as ClamAV) show that the storage overhead of FilterFA is only about 3% of that of AC
while the size of the character set is 8
and the false recognition rate is less than 2%.
Snort.Org [EB/OL ] . https://www.snort.org/ https://www.snort.org/ .
BOYER R S , MOORE J S . A fast string searching algorithm [J ] . Communications of the ACM , 1977 , 20 ( 10 ): 762 - 772 .
KHARBUTLI M , ALDWAIRI M , MUGHRABI A . Function and data parallelization of Wu-Manber pattern matching for intrusion detection systems [J ] . Network Protocols and Algorithms , 2012 , 4 ( 3 ): 46 - 61 .
AHO A V , CORASICK M J . Efficient string matching: an aid to biblio-graphic search [J ] . Communications of the ACM , 1975 , 18 ( 6 ): 333 - 340 .
NAVARRO G , RAFFINOT M . Flexible pattern matching in strings:practical on-line search algorithms for texts and biological sequences [M ] . Oxford City : Cambridge University Press , 2002 .
BAEZA-YATES R A , GONNET G H . A new approach to text searching [C ] // 12th International Conference on Research and Devel-opment in Information Retrieval . 1989 : 168 - 175 .
DENCKER P , DORRE K , HEUFT J . Optimization of parser tables for portable compilers [J ] . ACM Transactions on Programming Languages and Systems , 1984 , 6 ( 4 ): 546 - 572 .
NORTON M . Optimizing pattern matching for intrusion detec-tion [EB/OL ] . http://www.idsresearch.org http://www.idsresearch.org , 2004 .
TUCK N , SHERWOOD T , CALDER B , et al . Deterministic mem-ory-efficient string matching algorithms for intrusion detection [C ] // IEEE INFOCOM . 2004 .
AHO V , SETHI R , ULLMAN J D . Compilers: principles, techniques, and tools [M ] . New Jersey : Addison-Wesley . 1985 .
AOE J , MORIMOTO K , SATO T . An efficient implementation of trie structures [J ] . Software-Practice & Experience , 1992 , 22 ( 9 ): 695 - 721 .
ZIEGLER S . Smaller faster table driven parser, unpublished manu-script [Z ] . Madison Academic Computing Center , 1977 .
TARJAN R E , YAO A C . Storing a sparse table [J ] . Communications of the ACM , 1979 , 22 : 606 - 611 .
FREDMAN M , KOML´OS J , SZEMER´EDI E . Storing a sparse table with O(1) worst case access time [J ] . Journal of the ACM , 1984 , 31 ( 3 ): 538 - 544 .
杨毅夫 , 刘燕兵 , 刘萍 , 等 . 串匹配算法中的自动机紧缩存储技术 [J ] . 计算机工程 , 2009 , 35 ( 21 ): 39 - 41 .
YANG Y F , LIU Y B , LIU P , et al . Automation compact representa-tion technology in string matching algorithm [J ] . Computer Engineer-ing , 2009 , 35 ( 21 ): 39 - 41 .
NIEVES R , LADRA B S . K2-trees for compact web graph representa-tion [C ] // String Processing and Information Retrieval Lecture Notes in Computer Science . 2009 , 5721 : 18 - 30 .
NIEVES R , LADRA B S . Compact representation of web graphs with extended functionality [J ] . Information Systems , 2014 , 39 ( 1 ): 152 - 174 .
张萍 , 刘燕兵 , 于静 , 等 . HashTrie:一种空间高效的多模式串匹配算法 [J ] . 通信学报 , 2015 , 36 ( 10 ): 172 - 180 .
ZHANG P , LIU Y B , YU J , et al . HashTrie: a space-efficient multiple string matching algorithm [J ] . Journal on Communication , 2015 , 36 ( 10 ): 172 - 180 .
Available online [EB/OL ] . http://www.ll.mit.edu/IST/ideval/ http://www.ll.mit.edu/IST/ideval/ .
Available online [EB/OL ] . http://www.clamav.org/ http://www.clamav.org/ .
Available online [EB/OL ] . http://urlblacklist.com/ http://urlblacklist.com/ .
0
浏览量
1050
下载量
1
CSCD
关联资源
相关文章
相关作者
相关机构