Word embeddings combination and neural networks for robustness in ASR error detection

Abstract

This study focuses on error detection in Automatic Speech Recognition (ASR) output. We propose a confidence classifier based on a neural network architecture, which assigns a label (error or correct) to each word within an ASR hypothesis. This classifier takes word embeddings as inputs, in addition to ASR confidence-based, lexical, and syntactic features. We evaluate the impact of three different kinds of word embeddings on this error detection approach, and we present a solution for combining these three types of word embeddings in order to take advantage of their complementarity. In our experiments, the different approaches are evaluated on automatic transcriptions generated by two different ASR systems applied to the ETAPE corpus (French broadcast news). Experimental results show that, compared with a state-of-the-art CRF approach, the proposed neural architectures achieve a CER reduction of between 4% and 5.8% in error detection, depending on the test dataset.

Publication
European Signal Processing Conference (EUSIPCO 2015)