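1. School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China
2. National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei, Anhui 230027, China
3. State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu 214125, China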
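Zhong-fu YE (born 1959), male, from Tongcheng, Anhui. Ph.D., professor and doctoral supervisor at the University of Science and Technology of China. His research interests include speech signal processing, array signal processing, radar signal processing, and image analysis and processing.
Ting QI (born 1993), female, from Huainan, Anhui. Master's student at the University of Science and Technology of China. Her research interest is language recognition.
Sai-feng LI (born 1980), male, from Pingxiang, Jiangxi. Ph.D. candidate at the University of Science and Technology of China. His research interests include communication signal processing and speech signal processing.
Yan SONG (born 1972), male, from Hefei, Anhui. Ph.D., associate professor at the University of Science and Technology of China. His research interests include language recognition and content-based audio/video analysis and retrieval.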
Published online: 2017-04
Published in print: 2017-04-25
Zhong-fu YE, Ting QI, Sai-feng LI, et al. Adaptive Gaussian back-end based on LDOF criterion for language recognition[J]. Journal on Communications, 2017, 38(4): 17-24. DOI: 10.11959/j.issn.1000-436x.2017096.
To address the mismatch between test samples and trained models caused by within-language diversity, an adaptive Gaussian back-end based on the local distance-based outlier factor (LDOF) criterion is proposed for language recognition. The LDOF criterion is defined to drive an effective parameter search and to dynamically select, from the multi-class language training sets, the training samples whose characteristics are close to those of the test sample. The original Gaussian back-end is then adjusted accordingly to obtain a better-matched recognition model. Experimental results on the six easily confused language tasks of NIST LRE 2009 show that the proposed method achieves a clear improvement in both the equal error rate (EER) and the average detection cost.
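The abstract does not spell out how the LDOF criterion is computed or how it steers the parameter search, so the following is only a minimal sketch of the generic local distance-based outlier factor: the mean distance from a query vector to its k nearest neighbours, divided by the mean pairwise distance among those neighbours. The function names (`euclidean`, `k_nearest`, `ldof`), the use of Euclidean distance, and the toy 2-D vectors standing in for utterance-level features are all assumptions made for illustration, not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, samples, k):
    """Return the k training samples closest to the query, nearest first."""
    return sorted(samples, key=lambda s: euclidean(query, s))[:k]


def ldof(query, samples, k):
    """Generic LDOF of `query` with respect to one training set.

    Values well below 1 suggest the query lies inside its neighbourhood;
    values well above 1 suggest it is an outlier for that set.
    """
    if k < 2 or k > len(samples):
        raise ValueError("k must satisfy 2 <= k <= len(samples)")
    neighbours = k_nearest(query, samples, k)
    # Mean distance from the query to its k nearest neighbours.
    d_query = sum(euclidean(query, n) for n in neighbours) / k
    # Mean pairwise distance among the neighbours themselves.
    pair_sum = sum(
        euclidean(neighbours[i], neighbours[j])
        for i in range(k)
        for j in range(i + 1, k)
    )
    d_inner = pair_sum / (k * (k - 1) / 2)
    if d_inner == 0:
        # All neighbours coincide, so the ratio is undefined.
        return math.inf
    return d_query / d_inner
```

Under these assumptions, one way to use the score is to compute it for a test vector against each language's training set over several values of k, and keep the neighbours returned by `k_nearest` for the setting that yields the lowest LDOF as the subset used to adapt that language's Gaussian model. Whether the paper selects samples in exactly this way cannot be confirmed from the abstract alone.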