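1. School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876
2. Department of Information and Network Security, State Information Center, Beijing 100045
3. School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055
4. Network Department, Peng Cheng Laboratory, Shenzhen, Guangdong 518066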
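Xiayu XIANG (1991- ), male, born in Huayuan, Hunan, is a Ph.D. candidate at Beijing University of Posts and Telecommunications. His main research interests are privacy protection and medical big data analysis.
Jiahui WANG (1983- ), female, born in Datong, Shanxi, is a Ph.D. candidate at the State Information Center. Her main research interests are data security, cloud security, cloud forensics security, and big data security.
Zirui WANG (2000- ), female, born in Dalian, Liaoning, is a master's student at Harbin Institute of Technology (Shenzhen). Her main research interest is data security.
Shaoming DUAN (1994- ), male, born in Shaoyang, Hunan, is a Ph.D. candidate at Harbin Institute of Technology (Shenzhen). His main research interests are data security and machine learning.
Hezhong PAN (1991- ), male, born in Benxi, Liaoning, is a Ph.D. candidate at Beijing University of Posts and Telecommunications. His main research interests are cloud security, data security, and cryptography.
Rongfei ZHUANG (1992- ), male, born in Quanzhou, Fujian, is a Ph.D. candidate at Harbin Institute of Technology (Shenzhen). His main research interests are data security, machine learning security, and privacy protection.
Peiyi HAN (1992- ), male, born in Lüliang, Shanxi, is an assistant researcher at Harbin Institute of Technology (Shenzhen). His main research interests are data security and privacy protection.
Chuanyi LIU (1982- ), male, born in Leshan, Sichuan, Ph.D., is a professor at Harbin Institute of Technology (Shenzhen). His main research interests are cloud computing and cloud security, large-scale storage systems, and data protection and data security.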
Published online: 2022-03
Published in print: 2022-03-25
Xiayu XIANG, Jiahui WANG, Zirui WANG, et al. Generate medical synthetic data based on generative adversarial network[J]. Journal on Communications, 2022, 43(3): 211-224. DOI: 10.11959/j.issn.1000-436x.2022057.
Modeling the probability distribution of rows in structured electronic health records and generating realistic synthetic data is difficult: tabular data usually contain nominal (categorical) columns, and traditional encodings of such columns can lead to a curse of feature dimensionality that makes modeling especially hard. To address this problem, the Poincaré ball model is used to model the hierarchical structure of categorical medical features, and a Gaussian copula-based generative adversarial network is employed to synthesize structured electronic health records. Experiments show that training data generated by this method differ in utility from the original data by only 2% while preserving privacy.
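To make the role of the Poincaré ball concrete, the following is a minimal sketch of the standard geodesic distance on the unit ball. It is not the authors' implementation: the function name `poincare_distance` and the 2-D coordinates for a root code and two child codes are hypothetical, chosen only to illustrate why a low-dimensional hyperbolic embedding can represent a category hierarchy that one-hot encoding would spread over one dimension per category.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    Uses d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Hypothetical embeddings (assumption, not taken from the paper):
# a general category near the origin and two specific codes near the boundary.
root = (0.0, 0.0)
child_a = (0.9, 0.0)
child_b = (0.0, 0.9)

d_root_a = poincare_distance(root, child_a)
d_a_b = poincare_distance(child_a, child_b)
print(f"root -> child_a: {d_root_a:.3f}")
print(f"child_a -> child_b: {d_a_b:.3f}")
```

In this toy layout the two specific codes are much farther from each other than either is from the root, which mirrors a tree where siblings are connected only through their parent. Two coordinates per category suffice here, whereas a one-hot encoding needs as many columns as there are categories.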