Deep Web new data discovery strategy based on the graph model of data attribute value lists

Zhi-ming CUI; Peng-peng ZHAO; Xue-feng XIAN; Li-gang FANG; Yuan-feng YANG; Cai-dong GU

doi:10.11959/j.issn.1000-436x.2016049

Research article | Updated: 2024-06-05

- Deep Web new data discovery strategy based on the graph model of data attribute value lists
- Journal on Communications, 2016, Vol. 37, No. 3, pp. 20-32
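- Author affiliations:
  
  1. Jiangsu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise, Suzhou 215104, Jiangsu, China
  2. Institute of Intelligent Information Processing and Application, Soochow University, Suzhou 215006, Jiangsu, China
  3. School of Computer Engineering, Suzhou Vocational University, Suzhou 215104, Jiangsu, China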
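- About the authors:
  
  Zhi-ming CUI (1961-), male, born in Shanghai, is a professor and doctoral supervisor at Soochow University. His research interests include intelligent information processing and computer networks.
  Peng-peng ZHAO (1980-), male, born in Nantong, Jiangsu, Ph.D., is an associate professor at Soochow University. His research interests include the deep Web and Web data mining.
  Xue-feng XIAN (1980-), male, born in Nanchong, Sichuan, Ph.D., is an associate professor at Suzhou Vocational University. His research interests include Web data management, data mining and intelligent information processing.
  Li-gang FANG (1980-), male, born in Huangshan, Anhui, Ph.D., is an associate professor at Suzhou Vocational University. His research interests include computer networks and Web GIS.
  Yuan-feng YANG (1973-), male, born in Yancheng, Jiangsu, is an associate professor at Suzhou Vocational University. His research interest is intelligent information processing.
  Cai-dong GU (1963-), male, born in Wuzhong, Ningxia, is a professor at Suzhou Vocational University. His research interests include intelligent information processing and the Internet of Things.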
- Funding:
  
  The National Natural Science Foundation of China (61440053, 61472268, 41201338); The Natural Science Foundation of Jiangsu Province (BK2012164); Suzhou Science and Technology Program (SYG201342, SYG201343, SS201344)
- DOI: 10.11959/j.issn.1000-436x.2016049
  Chinese Library Classification (CLC) number: TP392
- Published online: 2016-03
  
  Published in print: 2016-03-25
Cite this article: Zhi-ming CUI, Peng-peng ZHAO, Xue-feng XIAN, et al. Deep Web new data discovery strategy based on the graph model of data attribute value lists[J]. Journal on Communications, 2016, 37(3): 20-32. DOI: 10.11959/j.issn.1000-436x.2016049.

Abstract

To address the incremental crawling of newly generated data records in data sources, a deep Web new data discovery strategy is proposed. The strategy represents a deep Web data source with a new graph model of data attribute value lists, which turns the task of discovering new data into a traversal of that graph. Because the model depends only on the data, it is more adaptable and more deterministic than existing query-related graph models, and it applies even to deep Web data sources that offer only a simple query interface. On the basis of this model, incremental (growing) nodes are identified and their ability to discover new data is predicted. Mutual information is used to compute the dependencies between nodes, so that query selection reduces the negative impact of query dependence as far as possible. The strategy improves the efficiency of crawling new data. Experimental results show that, under the same resource constraints, the strategy keeps local and remote data maximally synchronized.
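The abstract does not give the exact formulas, so the following is only a minimal sketch of the idea of penalizing dependent queries, under stated assumptions. It assumes that each node is an attribute value, that dependence between two nodes is measured by pointwise mutual information (PMI) over their co-occurrence in locally stored records, and that the predicted new-data gain of each node is already available. The function names `pairwise_pmi`, `dependence` and `select_next_query`, the subtraction-style penalty, and the sample records are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def pairwise_pmi(records):
    """PMI of every pair of attribute values that co-occur in at least one record."""
    n = len(records)
    value_count = Counter()
    pair_count = Counter()
    for record in records:
        values = sorted(set(record))  # count each value once per record
        value_count.update(values)
        pair_count.update(combinations(values, 2))
    # PMI(a, b) = log(P(a, b) / (P(a) * P(b))) with probabilities estimated by counts
    return {
        (a, b): math.log(n_ab * n / (value_count[a] * value_count[b]))
        for (a, b), n_ab in pair_count.items()
    }


def dependence(pmi, x, y):
    """Positive dependence only; pairs never seen together are treated as 0 (assumption)."""
    return max(0.0, pmi.get(tuple(sorted((x, y))), 0.0))


def select_next_query(candidates, issued, predicted_gain, pmi):
    """Pick the candidate with the highest gain after a hypothetical dependence penalty."""
    def score(node):
        penalty = max((dependence(pmi, node, q) for q in issued), default=0.0)
        return predicted_gain[node] - penalty
    return max(sorted(candidates), key=score)


# Illustrative local records; each record is a tuple of attribute values.
records = [
    ("author=Li", "year=2015"),
    ("author=Li", "year=2015"),
    ("author=Li", "year=2016"),
    ("author=Wang", "year=2016"),
]
pmi = pairwise_pmi(records)
next_query = select_next_query(
    candidates={"year=2015", "year=2016", "author=Wang"},
    issued=["author=Li"],
    predicted_gain={"year=2015": 1.0, "year=2016": 1.0, "author=Wang": 0.5},
    pmi=pmi,
)
print(next_query)  # year=2016
```

In this toy data, `year=2015` co-occurs with the already issued query `author=Li` more often than chance would suggest, so it is penalized, and `year=2016` is chosen even though both candidates have the same predicted gain. How the paper actually combines predicted gain and dependence may differ.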
