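School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876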
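ZHANG Xiaoyan (1973- ), female, born in Yantai, Shandong, Ph.D., is a professor at Beijing University of Posts and Telecommunications. Her research interests include software engineering theory, mobile Internet software, and big data analysis.
LIU Zhihao (1996- ), male, born in Linyi, Shandong, is a master's student at Beijing University of Posts and Telecommunications. His research interests include big data analysis and mobile and Internet software.
DU Xiaofeng (1973- ), male, born in Hancheng, Shaanxi, is a lecturer at Beijing University of Posts and Telecommunications. His research interests include cloud computing and big data analysis.
LU Tianbo (1977- ), male, born in Bijie, Guizhou, Ph.D., is a professor at Beijing University of Posts and Telecommunications. His research interests include network and information security, secure software engineering, and P2P computing.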
Published online: 2022-04
Print publication date: 2022-04-25
Xiaoyan ZHANG, Zhihao LIU, Xiaofeng DU, et al. Research on a real-time receiving scheme of streaming data[J]. Journal on Communications, 2022, 43(4): 154-163. DOI: 10.11959/j.issn.1000-436x.2022080.
A scenario common in modern data warehouse systems was discussed, in which a large volume of streaming data must be received, joined with data already stored on disk, and then loaded into the warehouse. By setting disk paging appropriately and applying a cache module to spread the disk I/O load, a more efficient data receiving scheme was proposed on the basis of existing research. Consistent hashing was then introduced to extend the scheme to distributed environments, yielding the D-CACHEJOIN algorithm for distributed deployment. The cost model of the algorithm was derived theoretically, and simulation experiments were performed with data following a Zipfian distribution. The experimental results show that, in application scenarios close to real-world conditions, the proposed algorithm is more efficient than existing algorithms and can be extended to distributed environments quickly and easily.
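The abstract does not spell out how the cache module and disk paging interact, so the sketch below only illustrates the general pattern it describes, under stated assumptions: stream tuples whose join key is already cached are joined immediately, all others wait in a hash table until the disk page holding their key is loaded, and keys that keep matching are promoted into a small cache. The class name, page layout, `cache_capacity` and `promote_threshold` are hypothetical, and this is not the authors' D-CACHEJOIN. It also omits expiry: a tuple whose key never appears on disk stays buffered forever, whereas a real system would discard it after one full pass over the disk pages.

```python
from collections import defaultdict


class CacheAssistedStreamJoin:
    """Hypothetical, simplified stream/disk join with a cache for frequent keys.

    disk_pages models on-disk master data split into fixed-size pages;
    each page is a list of (join_key, value) rows. Not the paper's algorithm.
    """

    def __init__(self, disk_pages, cache_capacity=2, promote_threshold=2):
        self.disk_pages = disk_pages
        self.next_page = 0                   # next page to load; pages are scanned cyclically
        self.cache = {}                      # join_key -> master value for frequently matched keys
        self.cache_capacity = cache_capacity
        self.promote_threshold = promote_threshold
        self.hits = defaultdict(int)         # matches per key observed during page scans
        self.pending = defaultdict(list)     # join_key -> stream payloads awaiting a disk match
        self.output = []                     # joined tuples: (join_key, payload, master value)

    def receive(self, key, payload):
        """Accept one stream tuple; cached keys are joined without any disk I/O."""
        if key in self.cache:
            self.output.append((key, payload, self.cache[key]))
        else:
            self.pending[key].append(payload)

    def scan_next_page(self):
        """Load one disk page and join it with all buffered tuples whose keys it contains."""
        page = self.disk_pages[self.next_page]
        self.next_page = (self.next_page + 1) % len(self.disk_pages)
        for key, value in page:
            waiting = self.pending.pop(key, None)
            if not waiting:
                continue
            for payload in waiting:
                self.output.append((key, payload, value))
            self.hits[key] += len(waiting)
            # Promote keys that keep matching into the cache while it has room.
            if self.hits[key] >= self.promote_threshold and len(self.cache) < self.cache_capacity:
                self.cache[key] = value


if __name__ == "__main__":
    pages = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]]
    join = CacheAssistedStreamJoin(pages, cache_capacity=1, promote_threshold=2)
    for key, payload in [(1, "s1"), (1, "s2"), (3, "s3")]:
        join.receive(key, payload)
    join.scan_next_page()   # joins key 1 twice and promotes it into the cache
    join.receive(1, "s4")   # cache hit: joined immediately, no page load needed
    join.scan_next_page()   # joins key 3
    print(join.output)
```

In the demo, the tuple for key 1 that arrives after promotion is served from the cache instead of waiting for the next pass over the disk, which is the kind of I/O saving a cache module is meant to provide when a few keys dominate the stream.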
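The abstract says consistent hashing was used to extend the scheme to distributed environments. One plausible way to apply it, assumed here rather than taken from the paper, is to route every stream tuple and every master-data row to a worker node by hashing the join key onto a ring, so that matching tuples meet on the same node. The minimal ring below uses virtual nodes (`vnodes`, an assumed value of 100) and MD5-derived positions; `ConsistentHashRing` and its methods are hypothetical names.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hypothetical helper that assigns join keys to worker nodes via consistent hashing."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes      # virtual nodes per physical node (assumed value)
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> physical node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(text):
        # Stable 64-bit ring position; MD5 is used only to spread keys, not for security.
        return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")

    def add_node(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        """Return the node owning the first ring position clockwise from hash(key)."""
        h = self._hash(str(key))
        idx = bisect.bisect_right(self._points, h) % len(self._points)  # wrap around
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["n1", "n2", "n3"])
    before = {k: ring.node_for(k) for k in range(10_000)}
    ring.add_node("n4")
    moved = [k for k in before if ring.node_for(k) != before[k]]
    print(f"{len(moved)} of 10000 keys moved; all to n4:",
          all(ring.node_for(k) == "n4" for k in moved))
```

The property that matters for scaling out shows up in the demo: when node n4 joins, only the keys that now map to n4 change owner and every other key stays where it was, so only the buffered and cached data for those keys has to be migrated.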
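The abstract states that the simulation used data following a Zipfian distribution. The sketch below shows one common way to generate such a key stream with the standard library's `random.Random.choices`; the key-space size `n_keys`, the exponent `s` and the seed are assumptions, since the abstract does not give the paper's parameters. The skew it produces is what makes a small cache of frequently requested master-data rows worthwhile.

```python
import random


def zipf_keys(n_keys, count, s=1.0, seed=None):
    """Draw `count` join keys from 1..n_keys with P(k) proportional to 1 / k**s.

    n_keys, s and seed are illustrative parameters, not the paper's settings.
    """
    ranks = range(1, n_keys + 1)
    weights = [1.0 / r ** s for r in ranks]
    return random.Random(seed).choices(ranks, weights=weights, k=count)


if __name__ == "__main__":
    keys = zipf_keys(n_keys=1000, count=20_000, s=1.0, seed=42)
    hot_share = sum(1 for k in keys if k <= 10) / len(keys)
    # With s = 1, the 10 hottest keys (1% of the key space) are expected
    # to receive roughly 39% of all tuples.
    print(f"share of tuples hitting the 10 hottest keys: {hot_share:.2%}")
```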