Performance analysis of topic detection algorithms in distributed environment
Papers | Updated: 2024-06-05
Journal on Communications, Vol. 39, Issue 8, Pages: 176-184 (2018)
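Author affiliations:
1. College of Computer, National University of Defense Technology, Changsha, Hunan 410073
2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876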
Funding:
The National Natural Science Foundation of China (61502517, 61472433, 61732004, 61732022); The National Key Research and Development Program of China (0708068118002, 2017YFB0803303)
Lu DENG, Yan JIA, Binxing FANG, et al. Performance analysis of topic detection algorithms in distributed environment[J]. Journal on Communications, 2018, 39(8): 176-184. DOI: 10.11959/j.issn.1000-436x.2018136.
More and more people are choosing social networks to express their views and feelings, so quickly finding what people are talking about in big data is attracting growing attention, and many topic detection methods have emerged in response. A performance analysis scheme was proposed based on the characteristics of social networks. According to this scheme, the performance of several typical topic detection algorithms was tested and compared on large-scale Sina Weibo data. In addition, the advantages and disadvantages of these algorithms were pointed out to provide a reference for later applications.