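1. Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education, Renmin University of China, Beijing 100872, China
2. School of Information, Renmin University of China, Beijing 100872, China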
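DU Xiao-yong (1963-), male, born in Quzhou, Zhejiang, Ph.D., is a professor and doctoral supervisor at Renmin University of China. His research interests include intelligent information retrieval, high-performance databases, and knowledge engineering.
CHEN Jun (1991-), male, born in Wenzhou, Zhejiang, is a Ph.D. candidate at Renmin University of China. His research interest is exploratory search.
CHEN Yue-guo (1976-), male, born in Gaizhou, Liaoning, Ph.D., is an associate professor and doctoral supervisor at Renmin University of China. His research interests include big data analysis systems and semantic search.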
Online publication date: 2015-12
Print publication date: 2015-12-25
DU Xiao-yong, CHEN Jun, CHEN Yue-guo. Exploratory search on big data[J]. Journal on Communications, 2015, 36(12): 77-88. DOI: 10.11959/j.issn.1000-436x.2015316.
Data exploration is a third way of realizing the value of big data, distinct from data serving and data analysis. Data serving focuses on retrieving precise information that meets users' needs at the micro level. Data analysis focuses on giving users insight into the data at the macro level, which in turn supports decision making. Data exploration, by contrast, lets users switch freely between the micro and macro levels and interactively uncover the value of data, moving from in-depth detail to an accessible overview. This paper first briefly reviews the traditional approaches to discovering the value of big data and their characteristics, and introduces exploratory search. It then presents the definition and model of exploratory search in detail and summarizes its characteristics. Next, following a component-based design, it proposes an architecture for exploratory search systems and surveys the challenges and key techniques involved in each component. Finally, it briefly describes the authors' preliminary work on exploratory search over RDF knowledge bases.