Ping ZHANG, Hui-min HE, Chun-yan ZHANG, et al. FilterFA: a multiple string matching algorithm based on specification of character set[J]. Journal on Communications, 2016, 37(12): 103-114.
DOI:
Ping ZHANG, Hui-min HE, Chun-yan ZHANG, et al. FilterFA: a multiple string matching algorithm based on specification of character set[J]. Journal on Communications, 2016, 37(12): 103-114. DOI: 10.11959/j.issn.1000-436x.2016277.
FilterFA: a multiple string matching algorithm based on specification of character set
Multiple string matching is one of the core techniques of intrusion detection system
where Aho-Corasick al-gorithm is widely used. To solve the problem that huge storage overhead of AC would influence performance deeply
an improved algorithm ——FilterFA
based on specification of character set was proposed. This algorithm compressed large character by the character set mapping function
and constructed a new automata based on the mapped character set
then space complexity decreased to O(|P||Σ′|). Experiments on synthetic datasets and real-world datasets (such as ClamAV) show that the storage overhead of FilterFA is only about 3% of that of AC
BOYER R S , MOORE J S . A fast string searching algorithm [J ] . Communications of the ACM , 1977 , 20 ( 10 ): 762 - 772 .
KHARBUTLI M , ALDWAIRI M , MUGHRABI A . Function and data parallelization of Wu-Manber pattern matching for intrusion detection systems [J ] . Network Protocols and Algorithms , 2012 , 4 ( 3 ): 46 - 61 .
AHO A V , CORASICK M J . Efficient string matching: an aid to biblio-graphic search [J ] . Communications of the ACM , 1975 , 18 ( 6 ): 333 - 340 .
NAVARRO G , RAFFINOT M . Flexible pattern matching in strings:practical on-line search algorithms for texts and biological sequences [M ] . Oxford City : Cambridge University Press , 2002 .
BAEZA-YATES R A , GONNET G H . A new approach to text searching [C ] // 12th International Conference on Research and Devel-opment in Information Retrieval . 1989 : 168 - 175 .
DENCKER P , DORRE K , HEUFT J . Optimization of parser tables for portable compilers [J ] . ACM Transactions on Programming Languages and Systems , 1984 , 6 ( 4 ): 546 - 572 .
NORTON M . Optimizing pattern matching for intrusion detec-tion [EB/OL ] . http://www.idsresearch.org http://www.idsresearch.org , 2004 .
TUCK N , SHERWOOD T , CALDER B , et al . Deterministic mem-ory-efficient string matching algorithms for intrusion detection [C ] // IEEE INFOCOM . 2004 .
AHO V , SETHI R , ULLMAN J D . Compilers: principles, techniques, and tools [M ] . New Jersey : Addison-Wesley . 1985 .
AOE J , MORIMOTO K , SATO T . An efficient implementation of trie structures [J ] . Software-Practice & Experience , 1992 , 22 ( 9 ): 695 - 721 .
ZIEGLER S . Smaller faster table driven parser, unpublished manu-script [Z ] . Madison Academic Computing Center , 1977 .
TARJAN R E , YAO A C . Storing a sparse table [J ] . Communications of the ACM , 1979 , 22 : 606 - 611 .
FREDMAN M , KOML´OS J , SZEMER´EDI E . Storing a sparse table with O(1) worst case access time [J ] . Journal of the ACM , 1984 , 31 ( 3 ): 538 - 544 .
YANG Y F , LIU Y B , LIU P , et al . Automation compact representa-tion technology in string matching algorithm [J ] . Computer Engineer-ing , 2009 , 35 ( 21 ): 39 - 41 .
NIEVES R , LADRA B S . K2-trees for compact web graph representa-tion [C ] // String Processing and Information Retrieval Lecture Notes in Computer Science . 2009 , 5721 : 18 - 30 .
NIEVES R , LADRA B S . Compact representation of web graphs with extended functionality [J ] . Information Systems , 2014 , 39 ( 1 ): 152 - 174 .
ZHANG P , LIU Y B , YU J , et al . HashTrie: a space-efficient multiple string matching algorithm [J ] . Journal on Communication , 2015 , 36 ( 10 ): 172 - 180 .
Available online [EB/OL ] . http://www.ll.mit.edu/IST/ideval/ http://www.ll.mit.edu/IST/ideval/ .
Available online [EB/OL ] . http://www.clamav.org/ http://www.clamav.org/ .
Available online [EB/OL ] . http://urlblacklist.com/ http://urlblacklist.com/ .