Combining continuous word representation and prosodic features for ASR error prediction

Abstract

Recent advances in continuous word representation have been successfully applied to several natural language processing tasks. This paper focuses on error prediction in Automatic Speech Recognition (ASR) outputs and investigates the use of continuous word representations (word embeddings) within a neural network architecture. The main contribution of this paper concerns the combination of word embeddings: several combination approaches are proposed in order to take advantage of their complementarity. The use of prosodic features, in addition to classical syntactic ones, is also evaluated. Experiments are carried out on automatic transcriptions generated by the LIUM ASR system on the ETAPE corpus. They show that the proposed neural architecture, using an effective combination of continuous word representations together with prosodic features as additional inputs, significantly outperforms a state-of-the-art approach based on Conditional Random Fields. Finally, the proposed system produces a well-calibrated confidence measure, evaluated in terms of Normalized Cross Entropy.
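Normalized Cross Entropy (NCE) measures how much information a per-word confidence score adds beyond simply knowing the overall word accuracy. The sketch below assumes the usual NIST-style definition (the abstract does not restate it): with N hypothesis words, n of them correct, and p_c = n / N, the baseline entropy is H = -n log2(p_c) - (N - n) log2(1 - p_c), and NCE = (H + sum over correct words of log2(p) + sum over erroneous words of log2(1 - p)) / H. The function name `normalized_cross_entropy` and the clipping constant `eps` are illustrative choices, not taken from the paper.

```python
import math
from typing import Sequence


def normalized_cross_entropy(
    confidences: Sequence[float], correct: Sequence[bool], eps: float = 1e-12
) -> float:
    """Hypothetical helper: NCE of word confidences against correctness labels.

    Assumes the common NIST-style definition. 1.0 means perfect confidence,
    0.0 means no better than always predicting the prior accuracy p_c, and
    negative values mean worse than that prior.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need one non-empty confidence per word label")
    n_total = len(correct)
    n_correct = sum(1 for ok in correct if ok)
    if n_correct in (0, n_total):
        # The baseline entropy H is zero, so NCE is undefined.
        raise ValueError("NCE is undefined when all words are correct or all are errors")

    p_c = n_correct / n_total
    h_base = -n_correct * math.log2(p_c) - (n_total - n_correct) * math.log2(1 - p_c)

    log_likelihood = 0.0
    for p, ok in zip(confidences, correct):
        # Clip to avoid log2(0) for overconfident scores (an assumption of this sketch).
        p = min(max(p, eps), 1.0 - eps)
        log_likelihood += math.log2(p) if ok else math.log2(1.0 - p)

    return (h_base + log_likelihood) / h_base


labels = [True, True, True, False]
print(normalized_cross_entropy([0.75, 0.75, 0.75, 0.75], labels))  # close to 0.0
print(normalized_cross_entropy([1.0, 1.0, 1.0, 0.0], labels))      # close to 1.0
```

A confidence measure that always outputs the prior accuracy scores about 0, which is why NCE is a natural way to check whether a predictor's scores are well calibrated rather than merely ranked correctly.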

Publication
3rd International Conference on Statistical Language and Speech Processing (SLSP 2015)