End-to-end named entity extraction from speech

Abstract

Named entity recognition (NER) is one of the spoken language understanding (SLU) tasks, and it usually extracts semantic information from textual documents. Until now, NER from speech has been performed through a pipeline that first applies automatic speech recognition (ASR) to the audio and then applies NER to the ASR outputs. Such an approach has some disadvantages (error propagation, an ASR tuning metric that is sub-optimal with regard to the final task, a reduced search space at the ASR output level,…), and it is known that more integrated approaches outperform sequential ones when they can be applied. In this paper, we present a first study of an end-to-end approach that directly extracts named entities from speech through a single neural architecture. In this way, ASR and NER can be optimized jointly. Experiments are carried out on easily accessible French data, composed of data distributed in several evaluation campaigns. Experimental results show that this end-to-end approach provides better results (F-measure=0.69 on test data) than a classical pipeline approach for detecting named entity categories (F-measure=0.65).
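
For readers unfamiliar with the metric, the F-measure quoted above is the harmonic mean of precision and recall. The following is a minimal sketch of how such a score could be computed over detected entity categories. It assumes a simplified bag-of-labels comparison; the function name `f_measure` and the category labels are hypothetical, and the scoring protocol actually used in the experiments is not described in this abstract.

```python
from collections import Counter


def f_measure(reference, hypothesis):
    """Return (precision, recall, F-measure) for two lists of category labels.

    Assumption: labels are compared as multisets, ignoring their position
    in the utterance. Real NER scoring usually also checks entity spans.
    """
    ref = Counter(reference)
    hyp = Counter(hypothesis)
    # Multiset intersection: hypothesised labels that also occur in the reference.
    correct = sum((ref & hyp).values())
    precision = correct / sum(hyp.values()) if hyp else 0.0
    recall = correct / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical labels, for illustration only.
reference = ["pers", "loc", "org", "loc"]
hypothesis = ["pers", "loc", "time"]
precision, recall, f = f_measure(reference, hypothesis)
print(f"P={precision:.2f} R={recall:.2f} F={f:.2f}")
```

Here two of the three hypothesised labels are correct and two of the four reference labels are found, so precision is 2/3, recall is 1/2, and the F-measure is 4/7.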

Publication
arXiv preprint arXiv:1805.12045