A multiple-scale acoustic modeling framework for task-domain-independent keyword spotting is proposed. A large-scale phoneme set was obtained automatically through decision-tree-based phoneme clustering, and a large-scale, acoustic-context-dependent phoneme background model was then trained within the standard HMM training framework, which improved the modeling accuracy for filler speech. Within this framework, an efficient method of constructing the search space using shared HMM states is also described. Experimental results showed an average absolute improvement of 6.9% in keyword recognition accuracy.
Furthermore, an acoustic context neighbor algorithm for measuring acoustic confidence and a method of computing candidate keyword likelihood based on the proposed multiple-scale acoustic model are proposed, and the two are fused using Fisher linear discriminant analysis (FLDA). This significantly improved the effectiveness of the acoustic confidence measure: experimental results showed an absolute reduction of 3.0% in equal error rate.
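
As an illustration of the FLDA fusion step only, the following minimal sketch fits a two-class Fisher discriminant over two confidence scores per candidate keyword and projects them onto a single fused score. It assumes each candidate is described by a pair (acoustic context neighbor confidence, multiple-scale likelihood score). The function names `fit_flda` and `fuse` and the two-feature layout are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]  # (neighbor confidence, likelihood score): assumed layout


def _mean(rows: Sequence[Vec]) -> Vec:
    """Component-wise mean of a set of 2-D score vectors."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def _scatter(rows: Sequence[Vec], mu: Vec) -> List[List[float]]:
    """2x2 scatter matrix of the rows around their class mean."""
    sxx = sxy = syy = 0.0
    for x, y in rows:
        dx, dy = x - mu[0], y - mu[1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    return [[sxx, sxy], [sxy, syy]]


def fit_flda(correct: Sequence[Vec], false_alarm: Sequence[Vec]) -> Vec:
    """Fisher direction w = Sw^-1 (m_correct - m_false_alarm) for 2-D scores."""
    m1, m0 = _mean(correct), _mean(false_alarm)
    s1, s0 = _scatter(correct, m1), _scatter(false_alarm, m0)
    # Within-class scatter Sw = S1 + S0, stored as the symmetric [[a, b], [b, d]].
    a = s1[0][0] + s0[0][0]
    b = s1[0][1] + s0[0][1]
    d = s1[1][1] + s0[1][1]
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Closed-form inverse of a 2x2 symmetric matrix applied to the mean difference.
    return ((d * dx - b * dy) / det, (-b * dx + a * dy) / det)


def fuse(w: Vec, scores: Vec) -> float:
    """Project a pair of confidence scores onto the Fisher direction."""
    return w[0] * scores[0] + w[1] * scores[1]
```

A candidate keyword would then be accepted when its fused score exceeds a threshold tuned on held-out data, for example at the operating point where the false-acceptance and false-rejection rates are equal.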