Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650031
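Hua LONG (1963- ), female, Hui ethnicity, born in Dali, Yunnan, Ph.D., professor at Kunming University of Science and Technology. Her main research interests include wireless networks and audio signal processing.
Shumeng SU (1996- ), male, born in Baoshan, Yunnan, master's student at Kunming University of Science and Technology. His main research interests include audio signal processing and speech recognition.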
Published online: 2022-06
Print publication date: 2022-06-25
Hua LONG, Shumeng SU. Research on formant estimation algorithm for high order optimal LPC root value screening[J]. Journal on Communications, 2022, 43(6): 235-245. DOI: 10.11959/j.issn.1000-436x.2022113.
Objectives: Existing linear prediction (LP) formant estimation algorithms have difficulty locating formants precisely because of false-root interference and interaction between poles, and the low-order fitting used in LP formant prediction fundamentally limits the accuracy of formant extraction. In addition, false roots and the spectral aliasing caused by pole interaction are difficult to remove when formants are extracted with high-order LP. To reduce the large error of LP formant detection, a formant estimation algorithm based on root-value screening of high-order optimal LP coefficients was proposed. Under different LP orders, the root determination threshold constrained by the speech digital resonance model, the optimal LP root-value distribution, the distribution of formant peaks in the spectral envelope, and the formant estimation error were investigated.
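The abstract does not spell out the LP analysis itself. As a point of reference, the sketch below shows the standard autocorrelation method: compute the autocorrelation of a frame, then solve for the order-p predictor with the Levinson-Durbin recursion, so that the order can be raised simply by changing `order`. The function names and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p are assumptions of this sketch, not the authors' code.

```python
"""Autocorrelation-method LP analysis: a minimal sketch, not the authors' implementation."""


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns (a, err): a[0] == 1.0 and err is the final prediction-error power.
    """
    if len(r) < order + 1:
        raise ValueError("need autocorrelation values up to lag `order`")
    if r[0] <= 0:
        raise ValueError("r[0] must be positive (silent or empty frame)")
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        # Reflection coefficient of stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    # Exact AR(1) autocorrelation r[k] = 0.5**k: a 2nd-order fit gives
    # a1 = -0.5, a2 = 0 and prediction-error power 0.75.
    a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print(a, err)
    # A toy frame; with real speech the order would be raised, e.g. to 15 or 18.
    frame = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
    print(levinson_durbin(autocorrelation(frame, 4), order=4))
```

The roots of the returned polynomial are the formant candidates that the root-screening stage then filters.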
Methods: The LP order was increased to improve how closely the LP system spectrum fits the speech signal. The accuracy of the computed formant frequencies was analyzed at different orders, yielding linear-system roots with higher peak-fitting accuracy. A speech digital resonance model was used to constrain the range of root magnitudes for formants, and the roots of the linear system were screened against an order-matched root magnitude. This reduces the number of false roots, removes false roots that do not correspond to formant frequencies, and eliminates spectral aliasing. Power weighting was then applied to emphasize the main spectral components of the signal. It corrects the amplitude of the speech spectrum, improves the energy matching between the spectral peaks of the speech signal and those of the LPC spectrum, increases the distance between poles, reduces the prediction error caused by harmonic interference, and improves the discrimination of spectral peak frequencies.
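Root screening rests on the standard mapping from a pole z = r·e^{jθ} to a formant candidate: frequency F = θ·fs/(2π) and bandwidth B ≈ -(fs/π)·ln r, so roots close to the unit circle give sharp resonances. The sketch below finds the roots of A(z) with a small Durand-Kerner iteration (used here only to stay dependency-free; `numpy.roots` would normally be used), keeps one root per conjugate pair, and discards roots whose magnitude falls below a fixed threshold `r_min`. The fixed threshold, the frequency limits and all function names are illustrative assumptions; the paper instead derives order-matched thresholds from its digital resonance model, which the abstract does not specify.

```python
import cmath
import math


def poly_roots(coeffs, iters=500, tol=1e-12):
    """Roots of c0 z^n + c1 z^(n-1) + ... + cn via Durand-Kerner iteration."""
    coeffs = [c / coeffs[0] for c in coeffs]  # make the polynomial monic
    n = len(coeffs) - 1
    if n < 1:
        return []

    def p(z):
        acc = 0j
        for c in coeffs:  # Horner evaluation
            acc = acc * z + c
        return acc

    roots = [(0.4 + 0.9j) ** k for k in range(n)]  # customary distinct starting points
    for _ in range(iters):
        new_roots = []
        for i, zi in enumerate(roots):
            denom = 1 + 0j
            for j, zj in enumerate(roots):
                if i != j:
                    denom *= zi - zj
            new_roots.append(zi - p(zi) / denom)
        delta = max(abs(u - v) for u, v in zip(new_roots, roots))
        roots = new_roots
        if delta < tol:
            break
    return roots


def screen_roots(a, fs, r_min=0.9, f_min=90.0, f_max=None):
    """Formant candidates (frequency_hz, bandwidth_hz) from LP coefficients a.

    Keeps upper-half-plane roots with |z| >= r_min and f_min < F < f_max.
    The default thresholds are hypothetical, not the paper's values.
    """
    f_max = fs / 2.0 if f_max is None else f_max
    candidates = []
    for z in poly_roots(a):
        if z.imag <= 1e-9:
            continue  # keep one root of each conjugate pair; skip real roots
        r = abs(z)
        freq = cmath.phase(z) * fs / (2 * math.pi)
        if r < r_min or not (f_min < freq < f_max):
            continue  # screened out as a false root
        bandwidth = -fs / math.pi * math.log(r)
        candidates.append((freq, bandwidth))
    return sorted(candidates)


def poly_from_pole_pairs(pairs, fs):
    """Polynomial whose roots are r*exp(+-j*2*pi*f/fs) for each (r, f) pair."""
    coeffs = [1.0]
    for r, f in pairs:
        theta = 2 * math.pi * f / fs
        quad = [1.0, -2 * r * math.cos(theta), r * r]
        prod = [0.0] * (len(coeffs) + 2)
        for i, c in enumerate(coeffs):
            for j, q in enumerate(quad):
                prod[i + j] += c * q
        coeffs = prod
    return coeffs


if __name__ == "__main__":
    fs = 8000
    # Two strong resonances plus one weak pole pair that the threshold removes.
    a = poly_from_pole_pairs([(0.95, 500.0), (0.97, 1500.0), (0.5, 3000.0)], fs)
    for freq, bw in screen_roots(a, fs, r_min=0.9):
        print(f"F = {freq:7.1f} Hz, B = {bw:6.1f} Hz")
```

Lowering `r_min` lets the weak 3000 Hz pole pair through, which shows why the threshold has to be matched to the LP order: higher orders place more poles near the unit circle.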
Results: The algorithm first preprocesses the speech signal. Pre-emphasis attenuates low-frequency content to reduce interference from the fundamental frequency in formant detection, and boosts high frequencies so that the amplitude of the third formant in the high-frequency spectral lines is easier to distinguish. Endpoint detection then isolates silent segments, and high-order LP analysis constrained by the digital resonance model is performed on the speech frames. Three main techniques improve performance. (1) Within the tolerance of the system, raising the LP order improves formant prediction accuracy. Formants are the peak frequencies of the spectral envelope and correspond to roots of the LP polynomial, that is, poles of the LP model. Linear prediction of order 9 preserves only the basic shape of the LP magnitude response of the speech signal. When the order is raised to 15, LP fits the signal more closely, and its poles become more numerous and lie closer to the unit circle. The 15th-order LP compensates for the formant fitting accuracy sacrificed by 9th-order fitting and improves formant extraction accuracy by 2.5%. (2) Judging the complex roots against a threshold constrained by digital resonance root values effectively filters out the low-frequency false roots produced by fundamental-frequency harmonics and the false roots produced by formant harmonics. Formant peaks correspond to complex roots of the LP polynomial, and the root distributions observed in formant detection show that the high-order LP root threshold constrained by digital resonance root values effectively removes false roots produced by vocal-tract harmonics and accurately locates the roots corresponding to formant peaks relative to the unit circle. (3) Power weighting of the speech spectrum makes the formants predicted from the corrected signal more accurate. After power weighting, the energy of the spectral envelope is more concentrated, and at order 18 the aliasing interference of the 1363 Hz formant peak on the 1359 Hz peak is eliminated. In terms of robustness and overall performance compared with other methods, the proposed algorithm extracts formants robustly from order 9 to order 22 and performs best at order 18.
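The preprocessing stage described above (pre-emphasis, then endpoint detection so that only speech frames reach the LP analysis) can be sketched as follows. The pre-emphasis coefficient 0.97, the frame sizes, and the short-time-energy rule with a relative threshold are common textbook choices assumed here; the abstract does not say which endpoint detector the authors used.

```python
import math


def pre_emphasis(x, alpha=0.97):
    """First-order high-pass y[n] = x[n] - alpha * x[n-1].

    Attenuates low frequencies (including the fundamental) relative to high ones.
    """
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def frame_signal(x, frame_len, hop):
    """Split x into full-length frames starting every `hop` samples."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def speech_frame_indices(x, frame_len, hop, rel_threshold=0.1):
    """Indices of frames whose short-time energy exceeds rel_threshold times
    the largest frame energy (a deliberately simple energy-based detector)."""
    frames = frame_signal(x, frame_len, hop)
    if not frames:
        return []
    energies = [sum(v * v for v in f) for f in frames]
    threshold = rel_threshold * max(energies)
    return [i for i, e in enumerate(energies) if e > threshold]


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
    signal = [0.0] * 400 + tone + [0.0] * 400  # silence, "speech", silence
    emphasized = pre_emphasis(signal)
    print(speech_frame_indices(emphasized, frame_len=100, hop=100))
```

Only the frames returned by the detector would then be passed to the high-order LP analysis; a practical detector would usually add a zero-crossing or spectral criterion.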
Conclusions: The LPC-based formant detection method was improved, and the effect of increasing the linear prediction order on formant extraction was studied. To address the multiple false roots and multi-pole interaction caused by increasing the order, the formant extraction error constrained by the speech digital resonance model was minimized. The relationship between the linear prediction order and the root-magnitude screening threshold was analyzed, and a root-magnitude feedback scheme under the digital resonance constraint was used to obtain a low-error screening threshold matched to high orders for removing false roots. Combined with power weighting, which emphasizes the amplitude of spectral peaks, the method eliminates pole interaction during formant extraction and achieves accurate and effective formant extraction.
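The abstract describes power weighting only qualitatively: the main spectral components are emphasized so that spectral peaks stand out and neighboring poles separate. One simple way to realize that idea, assumed here and not necessarily the authors' weighting, is to raise the power spectrum |X(k)|^2 to an exponent γ before converting it back to an autocorrelation sequence. With γ = 1 this reproduces the ordinary autocorrelation (Wiener-Khinchin); with γ > 1 strong peaks dominate. Because the weighted spectrum is non-negative, the result is still a valid autocorrelation sequence and can be passed to a Levinson-Durbin solver in place of the ordinary one.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) DFT; adequate for short illustrative frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def power_weighted_autocorrelation(x, max_lag, gamma=1.0):
    """Autocorrelation computed from the power spectrum raised to `gamma`.

    gamma == 1 gives the ordinary autocorrelation; gamma > 1 exaggerates strong
    spectral peaks. This weighting rule is a hypothetical stand-in.
    """
    n = len(x)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the frame length")
    nfft = 2 * n  # zero-padding makes circular correlation equal linear correlation
    spectrum = dft(list(x) + [0.0] * (nfft - n))
    weighted = [abs(v) ** (2 * gamma) for v in spectrum]
    r = []
    for lag in range(max_lag + 1):
        # Inverse DFT of a real, even spectrum: the result is real.
        s = sum(w * cmath.exp(2j * math.pi * k * lag / nfft)
                for k, w in enumerate(weighted))
        r.append(s.real / nfft)
    return r


if __name__ == "__main__":
    frame = [math.sin(2 * math.pi * 0.1 * n) + 0.3 * math.sin(2 * math.pi * 0.3 * n)
             for n in range(64)]
    print(power_weighted_autocorrelation(frame, 4, gamma=1.0))
    print(power_weighted_autocorrelation(frame, 4, gamma=1.5))
```

The exponent trades peak emphasis against fidelity to the original envelope; values close to 1 keep the LP fit near the unweighted one.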