Sahar Ghannay

Associate Professor at LIMSI, CNRS

Université Paris-Saclay


Sahar Ghannay has been an associate professor at Université Paris-Saclay, in the LIMSI research center (CNRS), since September 2018.

She received a PhD in Computer Science from Le Mans University in September 2017. Her thesis work was part of the ANR VERA (AdVanced ERror Analysis for speech recognition) project. During her PhD, she spent a few months as a visiting researcher at Apple within the Siri Speech team.

As a postdoctoral researcher at LIUM, she worked on neural end-to-end systems for named entity detection and speech understanding, as part of the Chist-Era M2CR (Multimodal Multilingual Continuous Representation for Human Language Understanding) project.

Her main research interests are continuous representation learning and its application to natural language processing and speech recognition tasks. She is also interested in the semantic textual similarity task and its application to dialog systems.


  • Artificial Intelligence
  • Representation learning
  • Natural/spoken language processing
  • Dialog systems


  • PhD in Computer Science, 2017

    Le Mans University

  • MS in Computer Science, 2013

    Le Mans University

  • BS in Computer Science, 2011

    Le Mans University


My research interests are continuous representations and their application to natural language processing and speech recognition tasks, such as ASR error detection and natural/spoken language understanding. I am also interested in the semantic textual similarity task and its application to dialog systems, in addition to end-to-end neural systems for speech understanding.


  • Juan Manuel Coria, PhD student, “Active representation learning”
  • Valentin Carpentier, PhD student, “A Study of transfer learning approaches for entity linking on new domain and generated synthetic data”


  • LIHLITH (Chist-Era), 2018/10/01 - 2020/12/31
  • MeerQat (ANR)
  • GEM (ANR)
  • M2CR (Chist-Era), 2015/06/01 - 2019/06/30
  • VERA (ANR), 2013 - 2015

Recent Publications


Error Analysis Applied to End-to-End Spoken Language Understanding

This paper presents a qualitative study of errors produced by an end-to-end spoken language understanding (SLU) system (speech signal to concepts) that reaches state of the art performance. Different studies are proposed to better understand the weaknesses of such systems: comparison to a classical pipeline SLU system, a study on the cause of concept deletions (the most frequent error), observation of a problem in the capability of the end-to-end SLU system to segment correctly concepts, analysis of the system behavior to process unseen concept/value pairs, analysis of the benefit of the curriculum-based transfer learning approach. Last, we proposed a way to compute embeddings of sub-sequences that seem to contain relevant information for future work.

What is best for spoken language understanding: small but task-dependant embeddings or huge but out-of-domain embeddings?

Word embeddings are shown to be a great asset for several Natural Language and Speech Processing tasks. While they are already evaluated on various NLP tasks, their evaluation on spoken or natural language understanding (SLU) is less studied. The goal of this study is two-fold: firstly, it focuses on semantic evaluation of common word embeddings approaches for SLU task; secondly, it investigates the use of two different data sets to train the embeddings: small and task-dependent corpus or huge and out-of-domain corpus. Experiments are carried out on 5 benchmark corpora (ATIS, SNIPS, SNIPS70, M2M, MEDIA), on which a relevance ranking was proposed in the literature. Interestingly, the performance of the embeddings is independent of the difficulty of the corpora. Moreover, the embeddings trained on huge and out-of-domain corpus yields to better results than the ones trained on small and task-dependent corpus.

A Comparison of Metric Learning Loss Functions for End-To-End Speaker Verification

Despite the growing popularity of metric learning approaches, very little work has attempted to perform a fair comparison of these techniques for speaker verification. We try to fill this gap and compare several metric learning loss functions in a systematic manner on the VoxCeleb dataset. The first family of loss functions is derived from the cross entropy loss (usually used for supervised classification) and includes the congenerous cosine loss, the additive angular margin loss, and the center loss. The second family of loss functions focuses on the similarity between training samples and includes the contrastive loss and the triplet loss. We show that the additive angular margin loss function outperforms all other loss functions in the study, while learning more robust representations. Based on a combination of SincNet trainable features and the x-vector architecture, the network used in this paper brings us a step closer to a really-end-to-end speaker verification system, when combined with the additive angular margin loss, while still being competitive with the x-vector baseline. In the spirit of reproducible research, we also release open source Python code for reproducing our results, and share pretrained PyTorch models on torch.hub that can be used either directly or after fine-tuning.

A study of continuous space word and sentence representations applied to ASR error detection

This paper presents a study of continuous word representations applied to automatic detection of speech recognition errors. A neural network architecture is proposed, which is well suited to handle continuous word representations, like word embeddings. We explore the use of several types of word representations: simple and combined linguistic embeddings, and acoustic ones associated to prosodic features, extracted from the audio signal. To compensate certain phenomena highlighted by the analysis of the error average span, we propose to model the errors at the sentence level through the use of sentence embeddings. An approach to build continuous sentence representations dedicated to ASR error detection is also proposed and compared to the Doc2vec approach. Experiments are performed on automatic transcriptions generated by the LIUM ASR system applied to the French ETAPE corpus. They show that the combination of linguistic embeddings, acoustic embeddings, prosodic features, and sentence embeddings in addition to more classical features yields very competitive results. Particularly, these results show the complementarity of acoustic embeddings and prosodic information, and show that the proposed sentence embeddings dedicated to ASR error detection achieve better results than generic sentence embeddings.

Lifelong learning and task-oriented dialogue system: what does it mean?

The main objective of this paper is to propose a functional definition of lifelong learning systems adapted to the framework of task-oriented dialogue systems. We mainly identified two aspects where a lifelong learning technology could be applied in such systems: to improve the natural language understanding module and to enrich the database used by the system. Given our definition, we present an example of how it could be implemented in an existing task-oriented dialogue system that is developed in the LIHLITH project.


Semantic representations

Master 2 course, Université Paris-Saclay, IUT Orsay, Computer Science Department

  • 2018-2020
  • MSc level (M2)
  • Introduction to distributed representations, recent approaches, evaluation approaches

Server-side web programming

BSc level (L2) course, Université Paris-Saclay, IUT Orsay, Computer Science Department

  • 2018-2020
  • BSc level (L2)
  • PHP, MySQL queries, object-oriented programming, cookies, sessions

Client-side web programming

BSc level (L2) course, Université Paris-Saclay, IUT Orsay, Computer Science Department

  • 2018-2020
  • BSc level (L2)
  • JavaScript, advanced JavaScript concepts, Ajax, jQuery

Database programming and administration

BSc level (L2) course, Université Paris-Saclay, IUT Orsay, Computer Science Department

  • 2018-2020
  • BSc level (L1)
  • SQL queries, functions, procedures, triggers, cursors, packages

Introduction to programming (C/Python)

BSc level (L1) practical sessions (PS), Le Mans University, Computer Science Department

  • 2014-2017
  • BSc level (L1)
  • Algorithmics, variables, loops, functions

Algorithmics and programming

BSc level (L2) PS, Le Mans University, Computer Science Department

  • 2014-2017
  • BSc level (L2)
  • Data structures (linked lists, hash tables, trees), pointers, recursion

Database analysis and design

BSc level (L2, L3 (Engineering school)) PS, Le Mans University, Computer Science Department

  • 2016-2018
  • BSc level (L2, L3 (Engineering school))
  • Design of a relational database; implementation and operation of a database in SQL

Logic programming

BSc level (L2, L3) course and PS, Le Mans University, Computer Science Department

  • 2016-2018
  • BSc level (L2, L3)
  • Basics of Prolog: facts, rules, unification, recursion, lists



Associate Professor

LIMSI, CNRS, Université Paris-Saclay

Sep 2018 – Present Orsay, France


Postdoctoral Researcher

LIUM, Le Mans University

Oct 2017 – Aug 2018 Le Mans

Research Intern

Apple, Siri Speech

Apr 2017 – Sep 2017 Cupertino, California

PhD Student

LIUM, Le Mans University

Oct 2014 – Sep 2017 Le Mans

Research Engineer

LIUM, Le Mans University

Apr 2014 – Sep 2014 Le Mans