ZHENG De-quan, ZHANG Di, ZHAO Tie-jun, et al. Study on the classification and identification of Blog pages[J]. 2007, (12): 156-160.
Study on the Classification and Identification of Blog Pages
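Abstract (translated from the Chinese)
The goal is a method that automatically distinguishes Blog pages from other Web pages, so that content can be extracted from Blog corpora and regularities in Blog communities can be studied and discovered. Based on the characteristics and regularities of Blog pages, a method is proposed that identifies Blog pages by computing similarity from page structure and keywords. Preliminary experimental results show that the method achieves a high identification accuracy.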
Abstract
This work seeks an automatic way to distinguish Blog pages from other Web pages, in support of content extraction from Blog pages and other research. Based on the characteristics of Blog pages, some basic concepts and ideas in the area of Blogs are described, and a novel method for identifying Blog pages is proposed that uses the structure of the Blog pages and keywords. The experimental results show that high precision can be achieved.
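The abstract does not give the similarity formula, so the following is only a minimal sketch of how a structure-plus-keyword similarity could be computed, not the authors' actual method. Everything in it is an assumption made for illustration: the keyword list `BLOG_KEYWORDS`, the use of HTML tag counts as the structural features, cosine similarity as the measure, the comparison against a single known Blog page, and the `structure_weight` parameter.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Hypothetical keyword list; the paper's actual keywords are not stated in the abstract.
BLOG_KEYWORDS = ("blog", "comment", "trackback", "permalink", "archive", "posted by")


class TagCounter(HTMLParser):
    """Collects start-tag counts and text content from an HTML page."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1

    def handle_data(self, data):
        self.text_parts.append(data)


def page_features(html):
    """Return (tag-count vector, keyword-count vector) for one page."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = " ".join(parser.text_parts).lower()
    keywords = Counter({k: text.count(k) for k in BLOG_KEYWORDS if k in text})
    return parser.tags, keywords


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def blog_similarity(page_html, reference_html, structure_weight=0.5):
    """Weighted mix of structural and keyword similarity to a known Blog page."""
    page_tags, page_kw = page_features(page_html)
    ref_tags, ref_kw = page_features(reference_html)
    return (structure_weight * cosine(page_tags, ref_tags)
            + (1 - structure_weight) * cosine(page_kw, ref_kw))


# Toy pages, invented for this example.
reference = "<div><h2>Title</h2><p>Posted by Ann</p><a>permalink</a><a>comment</a></div>"
candidate = "<div><h2>Other</h2><p>Posted by Bob</p><a>permalink</a><a>comment</a></div>"
unrelated = "<table><tr><td>Price list</td></tr></table>"

score_candidate = blog_similarity(candidate, reference)
score_unrelated = blog_similarity(unrelated, reference)
print(score_candidate, score_unrelated)
```

In a sketch like this, a page would be labeled a Blog page when its score exceeds some threshold chosen on labeled data; the threshold and the weighting between structure and keywords would both need to be tuned experimentally.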