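1. School of Mechanical and Electrical Engineering, Guangzhou University, Guangzhou, Guangdong 510006
2. Key Laboratory of Intelligent Information Processing and System Integration of IoT, Ministry of Education, Guangdong University of Technology, Guangzhou, Guangdong 510006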
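XIE Yuan (1989- ), male, a native of Lixin, Anhui, Ph.D., lecturer at Guangzhou University. His research interests include blind signal separation, signal processing, and machine learning.
ZOU Tao (1975- ), male, a native of Shenyang, Liaoning, Ph.D., professor at Guangzhou University. His research interests include industrial process modeling and simulation, model predictive control, advanced process control, and the research and application of real-time optimization techniques.
SUN Weijun (1975- ), male, a native of Ma'anshan, Anhui, Ph.D., associate professor at Guangdong University of Technology. His research interests include pattern recognition and machine learning.
XIE Shengli (1956- ), male, a native of Jingzhou, Hubei, Ph.D., professor at Guangdong University of Technology. His research interests include wireless networks, automatic control, and blind signal processing.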
Received: 2024-07-04
Revised: 2024-11-01
Published in print: 2024-11-25
XIE Yuan,ZOU Tao,SUN Weijun,et al.Multichannel speech enhancement algorithm based on hybrid reverberation model[J].Journal on Communications,2024,45(11):15-26. DOI: 10.11959/j.issn.1000-436x.2024197.
To address speech enhancement in scenarios with both reverberation and noise, a speech enhancement model integrating a multichannel linear prediction (MCLP) model and a spatial coherence model was constructed, and a multichannel speech enhancement algorithm based on a hybrid reverberation model was designed. The late reverberation was divided into two components, modeled by the multichannel linear prediction model and the spatial coherence model, respectively. To optimize the model parameters, a Kalman filter was used to update them, and polynomial matrix eigenvalue decomposition (PEVD) was used for spatial, temporal, and frequency decorrelation, achieving both dereverberation and noise reduction. Experimental results show that the proposed algorithm can enhance speech in noisy environments with both high and low reverberation and outperforms popular speech enhancement algorithms: the perceptual evaluation of speech quality (PESQ) score and the short-time objective intelligibility (STOI) score improved by up to 30% and 20%, respectively.
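The abstract names two modeling ingredients without giving their formulas. The Python sketch below illustrates common textbook forms of both under stated assumptions; it is not the authors' algorithm. In a single STFT frequency bin, the MCLP component is written as x_ref(t) = a(t)^H w + d(t), where a(t) stacks the frames of all M channels from t-D back to t-D-L_g+1 and d(t) is the desired (early) part, and a random-walk Kalman filter tracks the coefficient vector w. For the spatial coherence component, the sketch uses the ideal spherically diffuse field coherence sin(2πfd/c)/(2πfd/c) between two omnidirectional microphones spaced d apart, a common choice that the abstract does not confirm. The function names, the prediction delay D (`delay`), the order L_g (`order`), the fixed variances `phi_d` and `q`, and c = 343 m/s are all hypothetical; practical methods estimate the desired-signal variance per frame, and the PEVD decorrelation step is not shown.

```python
import numpy as np


def diffuse_coherence(freq_hz, mic_dist_m, c=343.0):
    """Ideal spherically diffuse field coherence between two omni mics (assumed model).

    Gamma(f) = sin(2*pi*f*d/c) / (2*pi*f*d/c); note np.sinc(x) = sin(pi*x) / (pi*x).
    """
    return np.sinc(2.0 * np.asarray(freq_hz) * mic_dist_m / c)


def stack_past_frames(X, t, delay, order):
    """Stack frames t-delay, ..., t-delay-order+1 of all M channels for one STFT bin.

    X has shape (T, M); frames before index 0 are treated as zeros.
    """
    T, M = X.shape
    cols = []
    for k in range(order):
        tau = t - delay - k
        cols.append(X[tau] if tau >= 0 else np.zeros(M, dtype=X.dtype))
    return np.concatenate(cols)  # length M * order


def kalman_mclp_bin(X, ref=0, delay=2, order=4, phi_d=1e-2, q=1e-6, p0=1.0):
    """Track hypothetical MCLP coefficients w for channel `ref` with a Kalman filter.

    Observation: x_ref(t) = a(t)^H w + d(t), where d(t) is treated as observation
    noise with fixed variance phi_d (an assumption). State: random walk with
    covariance q * I. Returns the a posteriori estimate of d(t) and the final w.
    """
    T, M = X.shape
    n = M * order
    w = np.zeros(n, dtype=complex)
    P = p0 * np.eye(n, dtype=complex)
    D = np.zeros(T, dtype=complex)
    for t in range(T):
        a = stack_past_frames(X, t, delay, order)
        P = P + q * np.eye(n)                   # predict step: random-walk state
        e = X[t, ref] - np.vdot(a, w)           # innovation: y - a^H w
        Pa = P @ a
        k = Pa / (np.real(np.vdot(a, Pa)) + phi_d)  # Kalman gain
        w = w + k * e                           # state update
        P = P - np.outer(k, a.conj()) @ P       # P <- P - k a^H P
        P = 0.5 * (P + P.conj().T)              # keep P Hermitian numerically
        D[t] = X[t, ref] - np.vdot(a, w)        # subtract predicted late reverberation
    return D, w
```

On synthetic data in which the reference channel follows this observation model exactly, `w` should converge toward the true coefficients and the output power should approach the desired-signal power; on real recordings the linear prediction model holds only approximately, which is one motivation for adding a second (spatial coherence) reverberation component.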