Study on the classification and identification of Blog pages
Updated: 2024-10-14
Issue 12, Pages: 156-160(2007)
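Author affiliations:
1. MOE-Microsoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology
2. MOE-Microsoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, Heilongjiang 150001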
CLC:TP393.092
Published:2007
ZHENG De-quan, ZHANG Di, ZHAO Tie-jun, et al. Study on the classification and identification of Blog pages[J]. 2007, (12): 156-160.
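Abstract (translated from the Chinese)
To find a method for automatically distinguishing Blog pages from other Web pages, so that content can be extracted from Blog corpora and regularities in Blog communities can be studied and discovered, this paper proposes, based on the characteristics and patterns of Blog pages, a method that identifies Blog pages by computing similarity from page structure and keywords. Preliminary experimental results show that the method achieves a relatively high identification accuracy.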
Abstract
To find an automatic way to distinguish Blog pages from other Web pages, in support of content extraction from Blog pages and related research, this paper describes some basic concepts and ideas in the Blog area according to the characteristics of Blog pages, and proposes a novel method for identifying Blog pages based on the structure of the pages and on keywords. The experimental results show that the method achieves high precision.
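The abstract names the technique (a similarity computed from page structure and keywords) but gives no features, weights, or thresholds. The following is only a minimal sketch of that general idea, not the authors' method. It assumes that structure can be approximated by HTML tag counts, that similarity is cosine similarity against a known Blog page, and that the two scores are mixed with a weight `alpha`. The keyword list, the function names, and `alpha` are all hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical keyword list; the paper's actual keywords are not given here.
BLOG_KEYWORDS = ("blog", "comment", "archive", "trackback", "permalink", "posted")


class _PageFeatures(HTMLParser):
    """Collects start-tag counts (a crude structure profile) and text nodes."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1

    def handle_data(self, data):
        self.text.append(data)


def features(html):
    """Return (tag counts, keyword counts) for one HTML page."""
    parser = _PageFeatures()
    parser.feed(html)
    parser.close()
    words = Counter(re.findall(r"[a-z]+", " ".join(parser.text).lower()))
    keywords = Counter({k: words[k] for k in BLOG_KEYWORDS if words[k]})
    return parser.tags, keywords


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def blog_score(html, reference_html, alpha=0.5):
    """Mix structure and keyword similarity; alpha is an assumed weight."""
    tags, kws = features(html)
    ref_tags, ref_kws = features(reference_html)
    return alpha * cosine(tags, ref_tags) + (1 - alpha) * cosine(kws, ref_kws)
```

In such a setup a page would be labelled a Blog page when `blog_score` against one or more reference Blog pages exceeds a threshold chosen on held-out data; that threshold, like everything else in the sketch, is an assumption and is not taken from the paper.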