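1. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080
2. Postdoctoral Research Station of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080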
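Chen CHEN (陈晨, 1990- ), female, born in Harbin, Heilongjiang. Ph.D., lecturer and master's supervisor at Harbin University of Science and Technology. Her main research interests include speech signal processing, audio information analysis, and speaker recognition.
Yafeng RONG (肜娅峰, 1997- ), female, born in Nanyang, Henan. Master's student at Harbin University of Science and Technology. Her main research interests include speaker recognition and speech signal processing.
Chaoqun JI (季超群, 1995- ), male, born in Suihua, Heilongjiang. Master's student at Harbin University of Science and Technology. His main research interests include speaker recognition and speech signal processing.
Deyun CHEN (陈德运, 1962- ), male, born in Harbin, Heilongjiang. Ph.D., professor and doctoral supervisor at Harbin University of Science and Technology. His main research interests include pattern recognition and machine learning.
Yongjun HE (何勇军, 1980- ), male, born in Nanchong, Sichuan. Ph.D., professor and doctoral supervisor at Harbin University of Science and Technology. His main research interests include speech signal processing and image processing.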
Online publication date: 2021-07
Print publication date: 2021-07-25
Chen CHEN, Yafeng RONG, Chaoqun JI, et al. Speaker verification method based on deep information divergence maximization[J]. Journal on Communications, 2021, 42(7): 231-237. DOI: 10.11959/j.issn.1000-436x.2021133.
To address the difficulty of accurately capturing the nonlinear relationships between features in speaker verification, an objective function based on deep information divergence maximization is proposed. The method implicitly represents the nonlinear relationships between features by computing the similarity between the distributions from which they are drawn. Guided by the optimization goal of maximizing this statistical dependence, the deep neural network is trained so that same-class data become more compact and different-class data become more dispersed, which improves the discriminability of the deep feature space. Experimental results show that, compared with other deep learning methods, the proposed method reduces the equal error rate (EER) by up to 15.80% in relative terms, significantly improving system performance.
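The abstract does not say how the divergence between distributions is estimated. Purely as an illustration, the sketch below uses the Donsker-Varadhan lower bound on the Kullback-Leibler divergence, KL(P || Q) >= E_P[T] - log E_Q[exp(T)]. Choosing this bound is an assumption made here and is not necessarily the paper's estimator. The function names, and the idea of passing scores for same-speaker and different-speaker pairs, are likewise hypothetical.

```python
import math


def dv_divergence_bound(p_scores, q_scores):
    """Donsker-Varadhan lower bound on KL(P || Q) from critic scores.

    p_scores: scores T(x) for samples drawn from P
              (assumed here: same-speaker pairs).
    q_scores: scores T(x) for samples drawn from Q
              (assumed here: different-speaker pairs).
    """
    mean_p = sum(p_scores) / len(p_scores)
    # log of the mean of exp(score) over Q, shifted by the max for stability
    shift = max(q_scores)
    mean_exp = sum(math.exp(s - shift) for s in q_scores) / len(q_scores)
    log_mean_exp_q = shift + math.log(mean_exp)
    return mean_p - log_mean_exp_q


def divergence_loss(p_scores, q_scores):
    """Negated bound, so that minimizing the loss maximizes the divergence."""
    return -dv_divergence_bound(p_scores, q_scores)


if __name__ == "__main__":
    same = [2.0, 2.0]   # hypothetical scores for same-speaker pairs
    diff = [0.0, 0.0]   # hypothetical scores for different-speaker pairs
    print(dv_divergence_bound(same, diff))
    print(divergence_loss(same, diff))
```

In a training loop the scores would come from a network and the loss would be minimized by gradient descent. This sketch only shows the arithmetic of the bound on fixed numbers.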