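Privacy protection risk identification mechanism based on automated feature combination

CAI Minchao; YAO Hongwei; WANG Yang; QIN Zhan; CHEN Shaomeng; REN Kui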

doi:10.11959/j.issn.1000-436x.2024194

Research article | Updated: 2024-12-24

- Privacy protection risk identification mechanism based on automated feature combination
- Journal on Communications, 2024, Vol. 45, No. 11, pp. 1-14
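- Author affiliations:
  
  1. School of Cyber Science and Technology, Zhejiang University, Hangzhou 310007, China
  2. Hangzhou Kuaidi Technology Co., Ltd., Hangzhou 310000, China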
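- About the authors:
  
  CAI Minchao (1991- ), male, born in Yantai, Shandong, is a Ph.D. candidate at Zhejiang University. His main research interests include data security, privacy protection, and anti-fraud.
  YAO Hongwei (1993- ), male, born in Quanzhou, Fujian, is a Ph.D. candidate at Zhejiang University. His main research interests include trustworthy artificial intelligence and large model security.
  WANG Yang (1988- ), male, born in Jiaxing, Zhejiang, is a Ph.D. candidate at Zhejiang University. His main research interests include network security and data security.
  QIN Zhan (1988- ), male, born in Beijing, Ph.D., is a researcher and doctoral supervisor at Zhejiang University. His main research interests include data security, privacy protection, and AI security.
  CHEN Shaomeng (1989- ), male, born in Ningbo, Zhejiang, is an engineer at Hangzhou Kuaidi Technology Co., Ltd. His main research interests include data security, anti-cheating, and anti-fraud.
  REN Kui (1978- ), male, born in Wuhu, Anhui, Ph.D., is a professor and doctoral supervisor at Zhejiang University. His main research interests include data security and artificial intelligence security.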
- Funding:
  
  The National Key Research and Development Program of China (2021YFB3100300); The National Natural Science Foundation of China (U20A20178, 62072395, 62206207)
- DOI: 10.11959/j.issn.1000-436x.2024194
  CLC number: TP181
- Received: 2024-05-31; revised: 2024-09-30; published in print: 2024-11-25
CAI Minchao, YAO Hongwei, WANG Yang, et al. Privacy protection risk identification mechanism based on automated feature combination[J]. Journal on Communications, 2024, 45(11): 1-14. DOI: 10.11959/j.issn.1000-436x.2024194.

Abstract

In practical applications, anomaly detection (AD) algorithms usually face technical challenges such as difficulty in optimizing feature combinations, difficulty in improving classifier accuracy, and low model application efficiency. The multidimensional data generated by users carries rich spatial structure information. Focusing on the characteristics of this data, and building on data desensitization through privacy-preserving homomorphic encryption, the challenge of optimizing feature combinations is addressed: the first automated feature combination optimization algorithm based on feature binning is proposed and implemented, improving computational efficiency in feature combination optimization by 99.93%. Rules composed of the important features selected by this automated model still face the challenge that classifier accuracy is hard to improve. Therefore, the automatically selected important features are integrated into the detection model, and the first cross-application model of rules and algorithms is designed and implemented. Applied to anomaly detection based on multidimensional user information, this approach yields a 27.78% improvement in fund loss recovery efficiency in the specific scenario of identifying abnormal users who enjoy a service first but do not pay.
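The abstract does not spell out the binning or search procedure, so the following is only a minimal plaintext sketch of the general idea of binning features and scoring combinations of binned features as candidate rules. It is not the authors' algorithm and it omits the homomorphic encryption step entirely. The function names, the feature names `orders` and `unpaid_days`, the thresholds, and the data are all hypothetical.

```python
from bisect import bisect_right
from itertools import combinations


def quantile_edges(values, n_bins):
    """Return n_bins - 1 cut points splitting the sorted values into roughly equal-sized bins."""
    s = sorted(values)
    return [s[len(s) * k // n_bins] for k in range(1, n_bins)]


def to_bin(value, edges):
    """Index of the bin that `value` falls into (0 .. len(edges))."""
    return bisect_right(edges, value)


def score_pair_rules(rows, labels, n_bins=2, min_support=2):
    """Score every observed pair of (feature, bin) conditions as a candidate rule.

    rows   : list of dicts mapping feature name -> numeric value (hypothetical schema)
    labels : list of 0/1 flags, 1 meaning the row is a known anomaly
    Returns (rule, precision, support) tuples, best precision first.
    """
    features = sorted(rows[0])
    edges = {f: quantile_edges([r[f] for r in rows], n_bins) for f in features}
    binned = [{f: to_bin(r[f], edges[f]) for f in features} for r in rows]

    stats = {}  # rule -> [rows matched, anomalous rows matched]
    for b, y in zip(binned, labels):
        for f1, f2 in combinations(features, 2):
            rule = ((f1, b[f1]), (f2, b[f2]))
            counts = stats.setdefault(rule, [0, 0])
            counts[0] += 1
            counts[1] += y

    scored = [
        (rule, pos / n, n)
        for rule, (n, pos) in stats.items()
        if n >= min_support  # drop rules that match too few rows to be trusted
    ]
    return sorted(scored, key=lambda t: (-t[1], -t[2], t[0]))


# Toy data, invented for illustration only.
rows = [
    {"orders": 1, "unpaid_days": 30},
    {"orders": 2, "unpaid_days": 25},
    {"orders": 3, "unpaid_days": 28},
    {"orders": 10, "unpaid_days": 1},
    {"orders": 12, "unpaid_days": 2},
    {"orders": 11, "unpaid_days": 3},
]
labels = [1, 1, 1, 0, 0, 0]

for rule, precision, support in score_pair_rules(rows, labels):
    print(rule, precision, support)
```

Counting only the bin combinations that actually occur in the data keeps the work to one pass over the rows, rather than enumerating every possible combination of bins. In this sketch the highest-precision rules would be the natural candidates to pass on as features to a downstream classifier.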

