Which ASR errors are hard to detect?

Abstract

In this paper, we focus on error detection in Automatic Speech Recognition (ASR) outputs. We present a new approach that uses continuous word representations (word embeddings) in a neural network classifier. This classifier assigns a label (error or correct) to each word in an ASR hypothesis. In addition to word embeddings, its inputs include a set of features: ASR confidence scores, lexical features, and syntactic features, including contextual information for each word. Experiments were conducted on automatic transcriptions generated by the LIUM ASR system applied to the ETAPE corpus (French broadcast news). They show that the proposed neural architecture outperforms the state-of-the-art approach based on Conditional Random Fields (CRF). In this study, we are particularly interested in analyzing the classifier outputs in order to identify the errors that are hard to detect. The results of this analysis are presented in this paper and provide useful information for improving the proposed ASR error detection system.
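To make the word-level labelling concrete, here is a minimal sketch of how such a classifier could be wired together. It is an illustration under stated assumptions and not the architecture from the paper: the context window of one word on each side, the two-value feature vector (a confidence score and one lexical flag), the single tanh hidden layer, and the helper names `build_input`, `classify`, and `random_params` are all hypothetical, and the weights are random rather than trained.

```python
import math
import random

LABELS = ("correct", "error")


def build_input(embeddings, features, i, dim):
    """Concatenate the embeddings of words i-1, i, i+1 with the features of word i.

    The +/-1 context window is an assumption made for this sketch.
    """
    pad = [0.0] * dim  # zero vector for positions outside the hypothesis
    vec = []
    for j in (i - 1, i, i + 1):
        vec.extend(embeddings[j] if 0 <= j < len(embeddings) else pad)
    vec.extend(features[i])
    return vec


def dense(x, weights, biases):
    """Affine layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(x, params):
    """Forward pass: tanh hidden layer, then softmax over the two labels."""
    hidden = [math.tanh(h) for h in dense(x, params["w1"], params["b1"])]
    probs = softmax(dense(hidden, params["w2"], params["b2"]))
    return LABELS[probs.index(max(probs))], probs


def random_params(n_in, n_hidden, seed=0):
    """Untrained placeholder weights; a real system would learn these."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w1": mat(n_hidden, n_in),
        "b1": [0.0] * n_hidden,
        "w2": mat(len(LABELS), n_hidden),
        "b2": [0.0] * len(LABELS),
    }


# Toy three-word hypothesis with made-up embeddings and features.
dim = 4
embeddings = [[0.1] * dim, [0.2] * dim, [0.3] * dim]
features = [[0.9, 1.0], [0.4, 0.0], [0.8, 1.0]]  # [confidence, lexical flag], hypothetical

x = build_input(embeddings, features, 0, dim)
params = random_params(len(x), n_hidden=8)
label, probs = classify(x, params)
```

Because the weights are untrained, the predicted label here carries no meaning; the sketch only shows how embeddings, contextual information, and per-word features can be combined into one input vector and mapped to an error/correct decision.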

Publication
Workshop Errors by Humans and Machines in multimedia, multimodal and multilingual data processing (ERRARE 2015)