In this paper, we propose the use of data-driven probabilistic utterance-level decision logic to improve Weighted Finite State Transducer (WFST)-based endpoint detection. In general, endpoint detection is dealt with using two cascaded decision processes. The first process is frame-level speech/non-speech classification based on statistical hypothesis testing, and the second process is a heuristic-knowledge-based utterance-level speech boundary decision. To handle these two processes within a unified framework, we propose a WFST-based approach. However, a WFST-based approach has the same limitations as conventional approaches in that the utterance-level decision is based on heuristic knowledge and the decision parameters are tuned sequentially.
Therefore, to obtain decision knowledge from a speech corpus and optimize the parameters at the same time, we propose the use of data-driven probabilistic utterance-level decision logic. The proposed method reduces the average detection failure rate by about 14% for various noisy-speech corpora collected for an endpoint detection evaluation.
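
For orientation, the conventional two-stage cascade described above can be sketched roughly as follows. This is only an illustrative sketch of the baseline idea, not the WFST-based method or the data-driven logic proposed here. The function names, the log-likelihood-ratio input, and the parameters `threshold`, `min_speech`, and `min_silence` are all assumptions introduced for illustration.

```python
from typing import List, Optional, Tuple


def classify_frames(llr: List[float], threshold: float = 0.0) -> List[bool]:
    """Frame-level decision (hypothetical): a frame is labeled speech when its
    log-likelihood ratio (speech vs. non-speech) exceeds the threshold."""
    return [value > threshold for value in llr]


def detect_endpoints(
    is_speech: List[bool],
    min_speech: int = 3,   # assumed heuristic: frames needed to confirm a start
    min_silence: int = 4,  # assumed heuristic: trailing frames needed to confirm an end
) -> Optional[Tuple[int, int]]:
    """Utterance-level decision (hypothetical heuristic): return the indices of
    the first and last speech frames of the first detected utterance, or None."""
    start: Optional[int] = None
    last_speech: Optional[int] = None
    run_speech = 0
    run_silence = 0
    for i, speech in enumerate(is_speech):
        if start is None:
            # Wait for enough consecutive speech frames before declaring a start.
            run_speech = run_speech + 1 if speech else 0
            if run_speech >= min_speech:
                start = i - min_speech + 1
                last_speech = i
        elif speech:
            run_silence = 0
            last_speech = i
        else:
            # Count consecutive non-speech frames after the start was declared.
            run_silence += 1
            if run_silence >= min_silence:
                return (start, last_speech)
    if start is not None:
        # The signal ended before enough trailing silence was observed.
        return (start, last_speech)
    return None


if __name__ == "__main__":
    llr = [-1, -1, 2, -1, 2, 2, 2, -1, 2, 2, -1, -1, -1, -1, -1]
    print(detect_endpoints(classify_frames(llr)))
```

In a cascade like this, `threshold` is typically fixed first and the utterance-level counts are hand-tuned afterwards, which illustrates the sequential, heuristic tuning that the data-driven decision logic is meant to replace.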