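1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093
2. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029
3. University of Chinese Academy of Sciences, Beijing 100049
4. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876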
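Xue-qiang ZOU (born 1978), male, from Putian, Fujian, is a Ph.D. candidate at the Institute of Information Engineering, Chinese Academy of Sciences. His main research interests include information processing, information security, and network traffic analysis.
Xiu-guo BAO (born 1962), male, from Rugao, Jiangsu, holds a Ph.D. and is a professor and doctoral supervisor at the Institute of Information Engineering, Chinese Academy of Sciences. His main research interests include information network security, audio and video processing, and cyberspace surveying and mapping.
Xiao-jun HUANG (born 1990), male, from Jiujiang, Jiangxi, is a master's student at Beijing University of Posts and Telecommunications. His main research interests include data mining and information security.
Hong-yuan MA (born 1981), male, from Chaoyang, Liaoning, holds a Ph.D. and is a senior engineer at the National Computer Network Emergency Response Technical Team/Coordination Center of China. His main research interest is intelligent information processing.
Qing-sheng YUAN (born 1980), male, from Jinan, Shandong, is a Ph.D. candidate at the Institute of Information Engineering, Chinese Academy of Sciences. His main research interests include multimedia big data processing and network and information security.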
Online publication date: 2016-12
Print publication date: 2016-12-25
Xue-qiang ZOU, Xiu-guo BAO, Xiao-jun HUANG, et al. Calculating the feature method of short text based on analytic hierarchy process[J]. Journal on Communications, 2016, 37(12): 50-55. DOI: 10.11959/j.issn.1000-436x.2016239.
To build accurate interest models of microblog users and to discover groups of users with similar interests, a feature calculation method for microblog short text is proposed for use in clustering algorithms. The method combines three indicators of each microblog, namely the number of retweets, comments, and likes (attitudes), to measure the importance of short-text features. It applies the analytic hierarchy process to improve the traditional tf-idf feature calculation so that it better fits short text, and it is evaluated with a classic text clustering algorithm. Experimental results show that, compared with the traditional tf-idf method, the improved method achieves better results on within-cluster compactness (the average scattering for clusters) and between-cluster separation (the total separation between clusters).
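The abstract names the ingredients of the method (retweet, comment, and like counts, the analytic hierarchy process, and tf-idf) but not the exact formula. The following is a minimal sketch of how such a combination might look, not the authors' implementation. Everything specific in it is an assumption made for illustration: the pairwise comparison matrix `COMPARISON`, the row geometric-mean approximation of the AHP weights, the log scaling of the counts, the multiplicative factor `1 + score`, and all function names.

```python
import math
from collections import Counter

# Hypothetical pairwise comparison matrix (retweets, comments, likes).
# Entry [i][j] states how much more important indicator i is than j.
COMPARISON = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights, random_index=0.58):
    """Saaty's consistency ratio; 0.58 is the random index for a 3x3 matrix."""
    n = len(matrix)
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    lam = sum(
        sum(matrix[i][j] * weights[j] for j in range(n)) / weights[i]
        for i in range(n)
    ) / n
    return ((lam - n) / (n - 1)) / random_index


def tf_idf(docs):
    """Plain tf-idf: tf = count / document length, idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def engagement_score(post, weights):
    """AHP-weighted sum of log-scaled counts (the scaling is an assumption)."""
    counts = (post["retweets"], post["comments"], post["likes"])
    return sum(w * math.log1p(c) for w, c in zip(weights, counts))


def weighted_tf_idf(posts, weights):
    """Scale each post's tf-idf vector by 1 + its engagement score."""
    base = tf_idf([p["tokens"] for p in posts])
    scaled = []
    for post, vec in zip(posts, base):
        factor = 1.0 + engagement_score(post, weights)
        scaled.append({t: v * factor for t, v in vec.items()})
    return scaled


# Two toy, already-tokenized posts with made-up engagement counts.
posts = [
    {"tokens": ["music", "concert", "music"], "retweets": 0, "comments": 0, "likes": 0},
    {"tokens": ["music", "football"], "retweets": 120, "comments": 30, "likes": 400},
]
weights = ahp_weights(COMPARISON)
features = weighted_tf_idf(posts, weights)
```

In this sketch a post with no retweets, comments, or likes keeps its plain tf-idf vector, while a widely shared post has its terms amplified before clustering. A consistency ratio below 0.1 is the usual rule of thumb for accepting a comparison matrix.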