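1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
2. Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China
3. Tsinghua National Laboratory for Information Science and Technology, Beijing 100084, China
4. Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
5. State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100876, China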
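ZHANG Yu-xiang (1975-), male, born in Wuzhai, Shanxi, Ph.D., associate professor at Civil Aviation University of China. His research interests include social network analysis and recommendation techniques.
SUN Yu (1991-), female, born in Yantai, Shandong, master's student at Civil Aviation University of China. Her research interests include social network analysis and recommendation techniques.
YANG Jia-hai (1966-), male, born in Yunhe, Zhejiang, professor and Ph.D. supervisor at Tsinghua University. His research interests include computer network management and measurement, cloud computing, and big data.
ZHOU Da-lei (1992-), male, born in Lianyungang, Jiangsu, master's student at Beijing University of Posts and Telecommunications. His research interest is network analysis.
MENG Xiang-fei (1993-), male, born in Taiyuan, Shanxi, master's student at Beihang University. His research interest is data analysis techniques.
XIAO Chun-jing (1978-), female, born in Tangshan, Hebei, lecturer at Civil Aviation University of China. Her research interests include data mining and recommender systems.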
Published online: 2016-08
Print publication date: 2016-08-25
Yu-xiang ZHANG, Yu SUN, Jia-hai YANG, et al. Feature importance analysis for spammer detection in Sina Weibo[J]. Journal on Communications, 2016, 37(8): 24-33. DOI: 10.11959/j.issn.1000-436x.2016152.
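Spammers are pervasive on microblogs, and their abnormal behavior and the spam they produce significantly degrade the user experience. To improve detection accuracy, existing studies have either defined as many features as possible or kept proposing new classification and detection methods. This raises several questions: should microblog anti-spam research prioritize finding classification features or improving detection methods? Does detection always improve as more features are added? Can new methods significantly improve detection? Taking Sina Weibo as a case study, we try to answer these questions through experiments that combine different feature selection methods with different classifiers. The results show that choosing the feature set matters more than improving the classifier, that features should be generated from several aspects (content, user behavior, and social relationships), and that more features do not necessarily yield better detection. These conclusions should help future microblog anti-spam work achieve breakthroughs.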
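A recommendation such as "generate features from content, user behavior, and social relationships" is typically checked with a feature-group ablation: train the same classifier on each group alone and on their union, then compare cross-validated accuracy. The sketch below is a minimal stdlib-only illustration under stated assumptions: the group names, column indices, and synthetic accounts are hypothetical, the nearest-centroid classifier is a stand-in, and every group is built to carry an independent signal, so the union winning here is a property of the toy data, not a reproduction of the paper's Sina Weibo result.

```python
import random
from statistics import mean

# Hypothetical feature groups (column indices); not the paper's actual feature list.
GROUPS = {"content": [0, 1], "behavior": [2, 3], "social": [4, 5]}


def make_toy_accounts(n=1000, shift=0.8, seed=1):
    """Synthetic accounts; by construction every column carries an independent spam signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = spammer, 0 = legitimate user (balanced classes)
        X.append([rng.gauss(shift * label, 1.0) for _ in range(6)])
        y.append(label)
    return X, y


def centroid_cv_accuracy(X, y, cols, folds=5):
    """k-fold accuracy of a nearest-centroid classifier restricted to the given columns."""
    correct = 0
    for f in range(folds):
        test = set(range(f, len(X), folds))  # interleaved folds keep both classes in each
        centroids = {}
        for c in (0, 1):
            rows = [[X[i][j] for j in cols] for i in range(len(X)) if i not in test and y[i] == c]
            centroids[c] = [mean(col) for col in zip(*rows)]
        for i in test:
            x = [X[i][j] for j in cols]
            # Predict the class whose training centroid is closest in squared Euclidean distance.
            pred = min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
            correct += pred == y[i]
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_toy_accounts()
    for name, cols in GROUPS.items():
        print(f"{name:>10}: {centroid_cv_accuracy(X, y, cols):.3f}")
    all_cols = [j for cols in GROUPS.values() for j in cols]
    print(f"all groups: {centroid_cv_accuracy(X, y, all_cols):.3f}")
```

Applying the same loop to real per-account features only requires replacing `make_toy_accounts` and the `GROUPS` mapping.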
Microblogs have drawn the attention of not only legitimate users but also spammers. The spam posted by spammers significantly degrades users' experience. In order to improve the accuracy of spammer detection, most existing studies on spam focus on generating more classification features or proposing new classifiers. Which of these two directions deserves priority in research effort? Are extensive features or novel classifiers better for spammer detection accuracy? We try to address these questions by combining different feature selection methods with different classifiers on a real Sina Weibo dataset. Experimental results show that the selected features are more important than novel classifiers for spammer detection. In addition, features should be derived from a wide range of sources, such as text contents, user behaviors, and social relationships, and the feature dimension should not be too high. These results will be useful for finding the breakthrough point of future microblog anti-spam work.
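The experimental design summarized in the abstract, crossing feature selection methods with classifiers and comparing cross-validated accuracy as the number of kept features varies, can be sketched as follows. This is a minimal stdlib-only illustration, assuming synthetic data with 3 informative and 7 noise columns; the two filter scores (Fisher score, absolute Pearson correlation) and the two classifiers (nearest centroid, Gaussian naive Bayes) are stand-ins chosen for brevity, not necessarily the methods evaluated in the paper.

```python
import math
import random
from statistics import mean, pvariance


def make_toy_data(n=200, n_informative=3, n_noise=7, seed=0):
    """Hypothetical data: the first n_informative columns shift with the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = spammer, 0 = legitimate user
        row = [rng.gauss(1.5 * label, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y


# --- Two filter-style feature scores (stand-ins, not the paper's selectors) ---
def fisher_score(X, y, j):
    pos = [r[j] for r, t in zip(X, y) if t == 1]
    neg = [r[j] for r, t in zip(X, y) if t == 0]
    return (mean(pos) - mean(neg)) ** 2 / (pvariance(pos) + pvariance(neg) + 1e-12)


def abs_correlation(X, y, j):
    col = [r[j] for r in X]
    mx, my = mean(col), mean(y)
    cov = sum((c - mx) * (t - my) for c, t in zip(col, y))
    sx = math.sqrt(sum((c - mx) ** 2 for c in col))
    sy = math.sqrt(sum((t - my) ** 2 for t in y))
    return abs(cov) / (sx * sy + 1e-12)


def select_top_k(X, y, score, k):
    """Indices of the k columns with the highest score."""
    return sorted(range(len(X[0])), key=lambda j: score(X, y, j), reverse=True)[:k]


# --- Two simple classifiers; each returns a predict(x) function ---
def nearest_centroid(X, y):
    cents = {c: [mean(col) for col in zip(*[r for r, t in zip(X, y) if t == c])] for c in (0, 1)}
    return lambda x: min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))


def gaussian_nb(X, y):
    params = {}
    for c in (0, 1):
        rows = [r for r, t in zip(X, y) if t == c]
        params[c] = (math.log(len(rows) / len(X)),
                     [(mean(col), pvariance(col) + 1e-9) for col in zip(*rows)])

    def log_posterior(x, c):
        log_prior, stats = params[c]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (a - m) ** 2 / (2 * v)
                               for a, (m, v) in zip(x, stats))

    return lambda x: max(params, key=lambda c: log_posterior(x, c))


def cv_accuracy(X, y, score, k, fit, folds=5):
    """k-fold accuracy; feature selection is redone on each training fold to avoid leakage."""
    correct = 0
    for f in range(folds):
        test = set(range(f, len(X), folds))
        tr_X = [X[i] for i in range(len(X)) if i not in test]
        tr_y = [y[i] for i in range(len(X)) if i not in test]
        cols = select_top_k(tr_X, tr_y, score, k)
        predict = fit([[r[j] for j in cols] for r in tr_X], tr_y)
        correct += sum(predict([X[i][j] for j in cols]) == y[i] for i in test)
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_toy_data()
    for sel_name, score in [("fisher", fisher_score), ("abs_corr", abs_correlation)]:
        for clf_name, fit in [("centroid", nearest_centroid), ("gauss_nb", gaussian_nb)]:
            for k in (3, 10):
                print(sel_name, clf_name, k, round(cv_accuracy(X, y, score, k, fit), 3))
```

Selection is repeated inside every training fold on purpose: ranking features on the full dataset first would let test labels influence the selection step and inflate the accuracy estimates.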