Research on language recognition algorithm based on improved CFCC feature extraction

Hua LONG; Zhangheng HUANG; Yubin SHAO; Qingzhi DU; Shumeng SU

doi:10.11959/j.issn.1000-436x.2022234


Academic Communication | Updated: 2024-06-05

- Research on language recognition algorithm based on improved CFCC feature extraction
- Journal on Communications, 2022, Vol. 43, No. 12, pp. 211-221
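- Affiliation:
  
  Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China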
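- Author biographies:
  
  Hua LONG (1963-), female, Hui, born in Dali, Yunnan; Ph.D., professor at Kunming University of Science and Technology; main research interests: wireless networks, audio signal processing, and language recognition.
  Zhangheng HUANG (1997-), male, Yi, born in Qujing, Yunnan; master's student at Kunming University of Science and Technology; main research interests: audio signal processing and language recognition.
  Yubin SHAO (1970-), male, born in Qujing, Yunnan; professor at Kunming University of Science and Technology; main research interests: mobile and personal communication systems, and signal processing.
  Qingzhi DU (1977-), male, born in Chuxiong, Yunnan; associate professor at Kunming University of Science and Technology; main research interests: speech signal processing and language recognition.
  Shumeng SU (1996-), male, born in Baoshan, Yunnan; master's student at Kunming University of Science and Technology; main research interests: audio signal processing and speech recognition.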
- Funding:
  
  National Natural Science Foundation of China (61761025)
- CLC number: TN912.34
- Published online: 2022-12
  
  Published in print: 2022-12-25
Cite as: Hua LONG, Zhangheng HUANG, Yubin SHAO, et al. Research on language recognition algorithm based on improved CFCC feature extraction[J]. Journal on Communications, 2022, 43(12): 211-221. DOI: 10.11959/j.issn.1000-436x.2022234.

Abstract (from the Chinese version)

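To address the low accuracy of language recognition at low signal-to-noise ratio (SNR), a language recognition algorithm based on the fractional wavelet transform is proposed. First, an adaptive filtering method is applied at the front end of feature extraction to remove noise from the noisy signal, reducing the influence of noise on feature extraction and improving the system's ability to handle noisy signals. Second, a novel fractional wavelet transform is used as the wavelet basis function to simulate the propagation of the signal along the basilar membrane of the cochlea, and a nonlinear power function is used to compress the signal. Finally, improved cochlear filter cepstral coefficients (CFCC) are extracted by simulating the human auditory process. Experimental results show that the improved CFCC significantly raise language recognition accuracy compared with traditional CFCC, with an average gain of 11.1% at 0 dB SNR, which verifies the effectiveness and robustness of the proposed algorithm.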
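
The abstract names adaptive filtering as the denoising front end but does not say which adaptive filter is used. As an illustration only, the sketch below swaps in a textbook two-input normalized LMS (NLMS) adaptive noise canceller, which is not necessarily the authors' method. It assumes a separate noise reference that is correlated with the additive noise but not with the speech; a single-microphone recording may not provide one. The function names, filter order, step size `mu`, and the synthetic tone-plus-noise test are hypothetical choices, not values from the paper.

```python
import math
import random


def nlms_noise_canceller(primary, reference, order=8, mu=0.1, eps=1e-8):
    """Two-input adaptive noise canceller with normalized LMS (NLMS) updates.

    primary:   noisy observation d[n] = s[n] + v[n]
    reference: noise reference x[n], assumed correlated with v[n] but not with s[n]
    Returns e[n] = d[n] - y[n], the running estimate of the clean signal s[n].
    """
    w = [0.0] * order       # adaptive FIR weights
    taps = [0.0] * order    # tap-delay line, newest reference sample first
    cleaned = []
    for d, x in zip(primary, reference):
        taps = [x] + taps[:-1]
        y = sum(wi * xi for wi, xi in zip(w, taps))   # estimate of the noise v[n]
        e = d - y                                     # estimate of the signal s[n]
        norm = eps + sum(xi * xi for xi in taps)      # tap-input power
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, taps)]
        cleaned.append(e)
    return cleaned


def demo(n=4000, fs=8000, seed=0):
    """Synthetic check (hypothetical values): a 200 Hz tone stands in for speech,
    and white noise reaches the primary input through an unknown 3-tap path."""
    rng = random.Random(seed)
    clean = [0.5 * math.sin(2 * math.pi * 200 * t / fs) for t in range(n)]
    ref = [rng.gauss(0.0, 1.0) for _ in range(n)]
    path = [0.8, 0.4, -0.2]
    noise = [sum(c * ref[t - k] for k, c in enumerate(path) if t - k >= 0)
             for t in range(n)]
    noisy = [s + v for s, v in zip(clean, noise)]
    cleaned = nlms_noise_canceller(noisy, ref)
    start = n // 2  # skip the convergence transient
    mse_before = sum((a - b) ** 2 for a, b in zip(noisy[start:], clean[start:])) / (n - start)
    mse_after = sum((a - b) ** 2 for a, b in zip(cleaned[start:], clean[start:])) / (n - start)
    return mse_before, mse_after
```

Calling `before, after = demo()` compares the error against the clean tone before and after cancellation. Normalizing the update by the tap-input power keeps the step size stable when the noise level changes, which is the usual reason to prefer NLMS over plain LMS.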

Abstract

To address the low language recognition rate at low signal-to-noise ratio (SNR), a language recognition method based on the fractional wavelet transform was proposed. Firstly, an adaptive filtering algorithm was used to filter the noise from the noisy signal, so as to reduce the influence of noise on feature extraction and improve the system's ability to process non-stationary signals. Secondly, the motion of the signal on the basilar membrane of the cochlea was simulated, and the signal was then compressed by a nonlinear power function. Finally, the improved cochlear filter cepstral coefficients (CFCC) were extracted by simulating the human auditory process. Experiments show that, compared with traditional CFCC, the improved CFCC significantly raise the language recognition rate, by 11.1% on average at 0 dB SNR, which verifies the effectiveness and robustness of the proposed algorithm.
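
The feature chain described above (cochlear filtering, hair-cell energy, nonlinear power-function compression, cepstral transform) can be sketched in pure Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the classic CFCC cochlear impulse response t^α·exp(-2πβf_c·t)·cos(2πf_c·t) with assumed α = 3, β = 0.2 and phase θ = 0, log-spaced centre frequencies, the squared filter output averaged per frame as the hair-cell stage, and an assumed power-law exponent of 0.1 in place of the cube-root loudness nonlinearity of the original CFCC formulation. The paper's novel fractional wavelet transform, which replaces the cochlear wavelet, is not reproduced here. All function names and parameter values are hypothetical.

```python
import math


def center_frequencies(n_channels, f_min, f_max):
    """Log-spaced centre frequencies (assumed spacing, not taken from the paper)."""
    ratio = (f_max / f_min) ** (1.0 / (n_channels - 1))
    return [f_min * ratio ** i for i in range(n_channels)]


def cochlear_ir(fc, fs, length, alpha=3.0, beta=0.2):
    """Classic CFCC cochlear-filter impulse response with theta = 0:
    t**alpha * exp(-2*pi*beta*fc*t) * cos(2*pi*fc*t), scaled to unit energy."""
    h = []
    for n in range(length):
        t = n / fs
        envelope = t ** alpha * math.exp(-2 * math.pi * beta * fc * t)
        h.append(envelope * math.cos(2 * math.pi * fc * t))
    norm = math.sqrt(sum(v * v for v in h))
    return [v / norm for v in h]


def hair_cell_energy(x, fs, n_channels=12, f_min=150.0, f_max=3500.0,
                     ir_len=160, win=200, hop=80):
    """Filter x through each cochlear channel, square the output (hair-cell
    stage) and average it over frames of `win` samples every `hop` samples.
    Returns (energies, fcs), with energies indexed as [frame][channel]."""
    fcs = center_frequencies(n_channels, f_min, f_max)
    squared = []
    for fc in fcs:
        h = cochlear_ir(fc, fs, ir_len)
        # causal FIR convolution, output truncated to len(x)
        y = [sum(h[k] * x[n - k] for k in range(min(n + 1, ir_len)))
             for n in range(len(x))]
        squared.append([v * v for v in y])
    n_frames = 1 + (len(x) - win) // hop
    energies = [[sum(s[j * hop:j * hop + win]) / win for s in squared]
                for j in range(n_frames)]
    return energies, fcs


def cfcc(x, fs, n_ceps=8, power=0.1, **bank_kwargs):
    """CFCC-style features: power-law compression of the per-channel energies,
    then a DCT-II across channels. power=0.1 is an assumed exponent."""
    energies, _ = hair_cell_energy(x, fs, **bank_kwargs)
    feats = []
    for frame in energies:
        m = len(frame)
        compressed = [e ** power for e in frame]
        # unnormalized DCT-II, keeping the first n_ceps coefficients
        feats.append([sum(c * math.cos(math.pi * k * (i + 0.5) / m)
                          for i, c in enumerate(compressed))
                      for k in range(n_ceps)])
    return feats
```

For example, `cfcc([math.sin(2 * math.pi * 1000 * n / 8000) for n in range(640)], 8000)` returns 6 frames of 8 coefficients with the default 200-sample window and 80-sample hop. A small exponent such as 0.1 compresses the dynamic range more strongly than a cube root, which is one common motivation for power-law variants in noisy conditions.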


