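1. National Engineering Laboratory for Information Content Security Technology, Beijing 100093
2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093
3. University of Chinese Academy of Sciences, Beijing 100049
Qiu TANG (1985-), male, born in Huaihua, Hunan, is a Ph.D. candidate at the Institute of Information Engineering, Chinese Academy of Sciences. His main research interests include regular expression matching and network security.
Lei JIANG (1984-), male, born in Yantai, Shandong, holds a Ph.D. and is an assistant researcher at the Institute of Information Engineering, Chinese Academy of Sciences. His main research interests include regular expression matching and network security.
Qiong DAI (1975-), female, of Tujia ethnicity, born in Cili, Hunan, is an associate researcher at the Institute of Information Engineering, Chinese Academy of Sciences. Her main research interests include network information flow identification and processing, and network measurement and behavior analysis.
Published online: 2015-11
Published in print: 2015-11-25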
Qiu TANG, Lei JIANG, Qiong DAI. Large-scale log compressing system based on differential compression[J]. Journal on Communications, 2015, 36(Z1): 197-202. DOI: 10.11959/j.issn.1000-436x.2015300.
Log data produced by large-scale information systems is growing rapidly, which makes compressing and storing large volumes of log data at line speed a major challenge for data management. An analysis of a large body of network system logs shows that log data has redundant structural patterns and that its content shows temporal local similarity. Based on these observations, a template-based, fine-grained differential log compression architecture is proposed, in which a fine-grained differential strategy suited to each specific kind of log data can be configured. Experimental results show that, compared with gzip, the proposed log compression system compresses 2 to 10 times faster and achieves a lower compression ratio, reaching 10%.
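To make the idea of template-based differential encoding concrete, the following is a minimal Python sketch. It is not the authors' system: the log layout (`<unix timestamp> <host> <message>`), the `TEMPLATE` pattern, the `T`/`R` record markers, and the function names `delta_encode` and `delta_decode` are all assumptions chosen for illustration. Each line that matches the template is stored as the difference from the previous matching line, so a slowly increasing timestamp becomes a small number and a repeated host becomes a single marker character.

```python
import re

# Hypothetical template for a simplified log line: "<unix_ts> <host> <message>".
# The timestamp must be a canonical decimal so that decoding restores it exactly.
TEMPLATE = re.compile(r"(0|[1-9][0-9]*) (\S+) (.*)")


def delta_encode(lines):
    """Rewrite each line as differences against the previous matching line."""
    encoded = []
    prev_ts, prev_host = 0, None
    for line in lines:
        m = TEMPLATE.fullmatch(line)
        # Lines that do not fit the template (or whose host collides with the
        # "=" marker) are stored verbatim with an "R" (raw) prefix.
        if m is None or m.group(2) == "=":
            encoded.append("R" + line)
            continue
        ts, host, msg = int(m.group(1)), m.group(2), m.group(3)
        host_field = "=" if host == prev_host else host  # "=" means unchanged
        encoded.append(f"T{ts - prev_ts} {host_field} {msg}")
        prev_ts, prev_host = ts, host
    return encoded


def delta_decode(encoded):
    """Invert delta_encode and return the original lines."""
    lines = []
    prev_ts, prev_host = 0, None
    for rec in encoded:
        if rec[0] == "R":
            lines.append(rec[1:])
            continue
        delta, host_field, msg = rec[1:].split(" ", 2)
        ts = prev_ts + int(delta)
        host = prev_host if host_field == "=" else host_field
        lines.append(f"{ts} {host} {msg}")
        prev_ts, prev_host = ts, host
    return lines


sample = [
    "1446336000 web01 GET /index.html 200",
    "1446336001 web01 GET /style.css 200",
    "1446336001 web02 POST /login 302",
    "malformed entry",
]
for record in delta_encode(sample):
    print(record)
```

In this sketch only the timestamp and host fields are differenced. A per-field strategy of this kind is one possible reading of "fine-grained", and the differenced output would typically still be passed to a general-purpose compressor afterwards.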