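1. Jiangsu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise, Suzhou, Jiangsu 215104
2. Institute of Intelligent Information Processing and Application, Soochow University, Suzhou, Jiangsu 215006
3. School of Computer Engineering, Suzhou Vocational University, Suzhou, Jiangsu 215104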
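Zhi-ming CUI (1961-), male, born in Shanghai, is a professor and doctoral supervisor at Soochow University. His main research interests are intelligent information processing and computer networks.
Peng-peng ZHAO (1980-), male, born in Nantong, Jiangsu, Ph.D., is an associate professor at Soochow University. His main research interests are the deep Web and Web data mining.
Xue-feng XIAN (1980-), male, born in Nanchong, Sichuan, Ph.D., is an associate professor at Suzhou Vocational University. His main research interests are Web data management, data mining, and intelligent information processing.
Li-gang FANG (1980-), male, born in Huangshan, Anhui, Ph.D., is an associate professor at Suzhou Vocational University. His main research interests are computer networks and Web GIS.
Yuan-feng YANG (1973-), male, born in Yancheng, Jiangsu, is an associate professor at Suzhou Vocational University. His main research interest is intelligent information processing.
Cai-dong GU (1963-), male, born in Wuzhong, Ningxia, is a professor at Suzhou Vocational University. His main research interests are intelligent information processing and the Internet of Things.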
Published online: 2016-03
Published in print: 2016-03-25
Zhi-ming CUI, Peng-peng ZHAO, Xue-feng XIAN, et al. Deep Web new data discovery strategy based on the graph model of data attribute value lists[J]. Journal on Communications, 2016, 37(3): 20-32. DOI: 10.11959/j.issn.1000-436x.2016049.
To address the incremental crawling of newly generated data records in data sources, a deep Web new data discovery strategy is proposed. The strategy represents a deep Web data source with a new graph model of attribute value lists, which transforms the new data discovery problem into a traversal problem on that graph. Because the model depends only on the data, it is more adaptable and more deterministic than existing query-related graph models, and it applies even to deep Web data sources that offer only a simple query interface. On the basis of this model, growing nodes are identified and their ability to discover new data is predicted. Mutual information is used to compute the dependency between nodes, so that query selection reduces the negative impact of query dependency as far as possible. The strategy improves the efficiency of crawling new data. Experimental results show that, under the same resource constraints, the strategy keeps local data maximally synchronized with remote data.
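The abstract does not give the model's formal definition, so the following is only a minimal sketch of the general idea, under stated assumptions: each node is a distinct attribute value, two nodes are linked when their values co-occur in a record, and the dependency between two nodes is estimated as pointwise mutual information over the record sets they return. The function names `build_value_graph` and `pointwise_mutual_information` and the sample records are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_value_graph(records):
    """Index records by attribute value and link co-occurring values.

    Assumption: a node is a distinct attribute value, and an edge joins
    two values that appear together in at least one record.
    """
    postings = defaultdict(set)  # value -> ids of records containing it
    edges = defaultdict(set)     # value -> neighbouring values
    for rid, record in enumerate(records):
        values = set(record)
        for value in values:
            postings[value].add(rid)
        for a, b in combinations(values, 2):
            edges[a].add(b)
            edges[b].add(a)
    return postings, edges


def pointwise_mutual_information(postings, a, b, total):
    """Estimate the dependency of two value nodes from their record sets."""
    joint = len(postings[a] & postings[b])
    if joint == 0:
        return float("-inf")  # the two values never co-occur
    p_a = len(postings[a]) / total
    p_b = len(postings[b]) / total
    return math.log((joint / total) / (p_a * p_b))


# Hypothetical local copy of a data source: (author, topic) records.
records = [
    ("Smith", "Databases"),
    ("Smith", "Databases"),
    ("Jones", "Networks"),
    ("Lee", "Networks"),
]
postings, edges = build_value_graph(records)
dependence = pointwise_mutual_information(
    postings, "Smith", "Databases", len(records)
)
```

In this sketch a high value for a pair of nodes suggests that querying both values would return largely overlapping records, so a query selector could prefer to issue only one of them.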