HitIct: Chinese corpus for the evaluation of lossless compression algorithms
Updated: 2024-10-14
Vol. 30, Issue 3, Pages: 42-47 (2009)
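Author affiliations:
1. Research Center of Computer Network and Information Security Technology, Harbin Institute of Technology
2. Research Center of Information Intelligence and Information Security, Institute of Computing Technology, Chinese Academy of Sciences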
CLC: TP391.41
Published: 2009
CHANG Wei-ling¹, YUN Xiao-chun², FANG Bin-xing¹, et al. HitIct: Chinese corpus for the evaluation of lossless compression algorithms[J]. 2009, 30(3): 42-47.
A Chinese corpus based on ANSI code is proposed for the evaluation of lossless compression algorithms. Following the principles of application representativeness, complementarity, and openness, a large number of candidate files were obtained from the Internet. The average compression ratio, average correlation coefficient, compression ratio correlation coefficient, and standard deviation were then used to select the files that give the most accurate indication of the overall performance of compression algorithms. Experimental results show that the collection has good representativeness and stability and can be used as a supplementary test set to the main benchmark for comparing compression methods.
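
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the choice of compressors (zlib, bz2, lzma), the function names, and the scoring rule (correlating each file's per-compressor ratios with the average over all candidates) are assumptions made only for illustration.

```python
import bz2
import lzma
import zlib
from statistics import mean

# Assumption: three standard-library compressors stand in for the
# algorithms under evaluation; the paper's actual set is not given here.
COMPRESSORS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data),
}


def compression_ratios(data):
    """Return compressed size / original size for each compressor."""
    if not data:
        raise ValueError("candidate file must not be empty")
    return [len(compress(data)) / len(data) for compress in COMPRESSORS.values()]


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def rank_candidates(candidates):
    """Rank files by how well their ratios track the candidate-pool average.

    `candidates` maps a file name to its bytes. Returns (score, name)
    pairs, highest correlation first.
    """
    ratios = {name: compression_ratios(data) for name, data in candidates.items()}
    # Average ratio of each compressor over all candidate files.
    pool_average = [mean(column) for column in zip(*ratios.values())]
    scored = [(pearson(r, pool_average), name) for name, r in ratios.items()]
    return sorted(scored, reverse=True)
```

Calling `rank_candidates({"news.txt": news_bytes, "novel.txt": novel_bytes})` with hypothetical file contents returns the candidates ordered by score. Files near the top behave most like the pool as a whole under these compressors, which is one plausible reading of "most accurate indication of overall performance".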