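1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004
2. Key Laboratory for Special Fiber and Fiber Sensor of Hebei Province, Qinhuangdao, Hebei 066004
3. Liren College, Yanshan University, Qinhuangdao, Hebei 066004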
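Haoran LIU (1980-), male, born in Harbin, Heilongjiang, is a professor and doctoral supervisor at Yanshan University. His main research interests are wireless sensor networks and industrial fault detection.
Pan DING (1992-), male, born in Xuanwei, Yunnan, is a master's student at Yanshan University. His main research interests are Bayesian networks and text classification.
Changjiang GUO (1980-), male, born in Suning, Hebei, is a research assistant at the Key Laboratory for Special Fiber and Fiber Sensor of Hebei Province. His main research interests are Bayesian networks, computer networks, and fault diagnosis.
Jinfeng CHANG (1993-), female, born in Baoding, Hebei, is a master's student at Yanshan University. Her main research interest is Bayesian networks.
Jingchuang CUI (1994-), male, born in Handan, Hebei, is a master's student at Yanshan University. His main research interests are particle swarm optimization and Bayesian networks.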
Published online: 2018-12
Published in print: 2018-12-25
Haoran LIU, Pan DING, Changjiang GUO, et al. Study on Chinese spam filtering system based on Bayes algorithm[J]. Journal on Communications, 2018, 39(12): 151-159. DOI: 10.11959/j.issn.1000-436x.2018281.
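Most current Chinese spam filtering systems are strongly affected by text sparsity and by the limited features of their models; high feature dimensionality and limited feature coverage are major constraints on filtering performance. To address high dimensionality, a TF-IDF (term frequency-inverse document frequency) feature extraction algorithm based on central word extension is proposed, which increases the expressive power of feature nodes and reduces the feature dimension. To address the limited features of the classification model and the failure of the conditional independence assumption among attributes, a three-layer Bayesian network model based on the GWO_GA (grey wolf optimizer-genetic algorithm) structure learning algorithm is proposed, which relaxes the conditional independence assumption and increases feature diversity. Together these form a three-layer Bayesian algorithm that combines TF-IDF feature extraction based on central word extension with GWO_GA structure learning. Validation on a large set of Chinese emails shows that the algorithm clearly improves Chinese spam filtering.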
To address the high feature dimensionality of Chinese spam filtering systems, a TF-IDF feature extraction algorithm based on central word extension was proposed; the algorithm improves the expressive capacity of the nodes in the network and reduces the feature dimension. Further, a three-layer structure model based on the GWO_GA structure learning algorithm was proposed to overcome the limits of text features and improve their diversity. The new structure learning algorithm relaxes the conditional independence assumption on feature attributes. A fine classification layer was added between the class layer and the feature layer to increase feature coverage. Experiments demonstrate that the three-layer Bayesian network algorithm, with TF-IDF feature extraction based on central word extension and GWO_GA structure learning, improves Chinese spam filtering.
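For orientation, the standard TF-IDF weighting that the proposed feature extraction builds on can be sketched as below. This is a minimal baseline under stated assumptions, not the authors' central-word-extension variant, which the abstract does not specify. The function names `tf_idf` and `top_k_features`, the toy tokens, and the keep-the-top-k selection rule are all hypothetical, and the input is assumed to be already word-segmented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (assumed to be pre-segmented text).
    Weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        if not tokens:  # an empty document has no weighted terms
            weights.append({})
            continue
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_k_features(docs, k):
    """Keep the k terms with the highest TF-IDF weight seen in any document.

    This is one simple (hypothetical) way to reduce the feature dimension.
    """
    best = {}
    for doc_weights in tf_idf(docs):
        for term, weight in doc_weights.items():
            best[term] = max(best.get(term, 0.0), weight)
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


# Toy corpus with made-up tokens standing in for segmented email text.
emails = [
    ["free", "prize", "click", "link"],
    ["meeting", "notice", "tomorrow", "click"],
    ["free", "invoice", "offer"],
]
features = top_k_features(emails, 4)
```

A term that occurs in every document gets weight zero under this formula, so it never outranks a rarer term; libraries such as scikit-learn apply a smoothed variant instead, so their numbers differ from this sketch.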