1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
2. College of Software, Jilin University, Changchun 130012, China
3. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
[ "李占山(1966- ),男,吉林长春人,博士,吉林大学教授、博士生导师,主要研究方向为约束优化与约束求解、机器学习、基于模型的诊断、智能规划与调度等。" ]
[ "刘兆赓(1993- ),男,吉林吉林人,吉林大学硕士生,主要研究方向为机器学习。" ]
Online publication date: 2019-10
Print publication date: 2019-10-25
Zhanshan LI, Zhaogeng LIU. Feature selection algorithm based on XGBoost[J]. Journal on Communications, 2019, 40(10): 101-108. DOI: 10.11959/j.issn.1000-436x.2019154.
Feature selection in classification has always been an important but difficult problem. It requires that a feature selection algorithm not only help the classifier improve its classification accuracy but also reduce redundant features as much as possible. Therefore, to perform feature selection better in classification problems, a new wrapper feature selection algorithm, XGBSFS, was proposed. Drawing on the tree-building process of extreme gradient boosting (XGBoost), the algorithm measures feature importance from three importance metrics, avoiding the limitations of any single metric; an improved sequential floating forward selection (ISFFS) strategy is then applied to search for the feature subset, so that the final subset has high quality. Comparative experiments on eight UCI datasets show that the proposed algorithm has good performance.
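The sketch below illustrates the general shape of such a pipeline; it is a minimal approximation, not the authors' implementation. It assumes the three importance metrics correspond to XGBoost's built-in "weight" (split count), "gain" (loss reduction), and "cover" (samples affected) views, combined here by simple rank averaging, and it substitutes plain sequential floating forward selection for the paper's ISFFS refinements. The dataset and hyperparameters are placeholders for illustration.

```python
# Hedged sketch of an XGBSFS-style pipeline (NOT the authors' code).
# Assumptions: three metrics = XGBoost's "weight"/"gain"/"cover" views,
# combined by rank averaging; ISFFS approximated by plain SFFS.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

# Step 1: fit one booster and read its three importance views.
clf = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
clf.fit(X, y)
booster = clf.get_booster()
views = {m: booster.get_score(importance_type=m)
         for m in ("weight", "gain", "cover")}

# Step 2: order candidate features by average rank across the three views
# (features never used in a split are absent from get_score and score 0).
def rank(view):
    order = sorted(range(n_features), key=lambda f: -view.get(f"f{f}", 0.0))
    return {f: r for r, f in enumerate(order)}

ranks = [rank(v) for v in views.values()]
candidates = sorted(range(n_features), key=lambda f: sum(r[f] for r in ranks))

# Step 3: sequential floating forward selection driven by CV accuracy.
def cv_acc(feats):
    model = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
    return cross_val_score(model, X[:, sorted(feats)], y, cv=5).mean()

selected, best = set(), 0.0
for f in candidates:
    score = cv_acc(selected | {f})
    if score > best:                      # forward step: keep f if it helps
        selected.add(f)
        best = score
        for g in list(selected - {f}):    # floating step: retry dropping
            score = cv_acc(selected - {g})
            if score > best:
                selected.discard(g)
                best = score

print("selected features:", sorted(selected), "cv accuracy: %.4f" % best)
```

Combining the three views is what distinguishes this scheme from ranking by a single metric: "weight" favors features used in many splits, "gain" favors features that reduce the loss sharply even if used rarely, and "cover" weights splits by the number of samples they touch, so rank averaging hedges against the bias of any one view.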