Research activities
Current research interests
Computer-Aided Language Learning
Through a valorisation (technology-transfer) project with my colleague Claire Gardent (Gramex project, 2023-2025), I entered the field of Computer-Aided Language Learning. With financial support from CNRS Innovation, we worked on the design and implementation of a learning environment that allows teachers to automatically generate grammar exercises from authentic texts, tailored to learning objectives and to learners' proficiency. This environment builds on the results of the METAL project (2016-2020) on adaptive language learning environments.
Research questions underlying this work include: how can the accessibility of authentic texts be assessed automatically? How can reliable syntactic information be extracted from texts?
Natural Language Generation
Automatic text generation (aka Natural Language Generation, NLG) can be described as the task of automatically building natural language texts that verbalise some given input meaning. Nowadays, NLG systems build on machine learning techniques that extract meaningful statistical representations of language from large datasets. This comes with (at least) two main limitations. First, the performance of these systems depends heavily (among other factors) on the input data. How can low-resource languages be covered in this context? Secondly, while several shared tasks exist in this domain, it is hard to compare these systems, as the main metrics used in the community (e.g. BLEU) are limited (e.g. in the way they deal with paraphrases). How can we design evaluation techniques for NLG systems that better align with human judgement?
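To make the paraphrase problem concrete, here is a minimal, unsmoothed, single-reference version of sentence-level BLEU in Python. It is a simplification: standard toolkits (e.g. sacreBLEU) also handle tokenisation, smoothing and corpus-level aggregation. The example sentences are invented for illustration. A meaning-preserving paraphrase shares no trigram with the reference, so its score collapses to zero, while a verbatim copy scores 1.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams (as tuples) occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """BLEU's modified n-gram precision: each candidate n-gram is credited
    at most as many times as it occurs in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference."""
    precisions = [clipped_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # one empty n-gram order zeroes the geometric mean
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "the museum is located in the city centre".split()
paraphrase = "the museum lies downtown".split()
print(sentence_bleu(reference, reference))   # 1.0
print(sentence_bleu(paraphrase, reference))  # 0.0
```

The paraphrase still gets partial credit at the unigram (0.5) and bigram (1/3) levels, but the geometric mean over all four orders discards it entirely, even though a human judge would accept it.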
Computing Science Teaching
From the early nineties until 2011, there were no Computer Science classes in either primary or secondary education in France. In 2011, CS was reintroduced into high-school curricula as an optional subject. I then started training high-school teachers in this subject. In 2014, I joined the Maison pour la science in Orléans to co-conduct workshops on unplugged computer science. I was struck by the prejudices some colleagues have about Computer Science, which greatly impede their learning.
Within the PIAF EU Erasmus+ Project (2018-2021), I worked with colleagues in Nancy (Marie Duflot-Kremer), Liège (Brigitte Denis), Luxembourg (Robert Reuter) and Saarbrücken (Armin Weinberger) on the definition and implementation of a new competency framework for computational thinking, which could help overcome the above-mentioned prejudices and bridge the gap between using computers and understanding them.
Since 2020, I have been involved in the APIMU (Apprentissage de la Pensée Informatique de la Maternelle à l'Université / Teaching Computational Thinking from Kindergarten to University) special interest group, whose activities include the organization of a series of workshops on computing science teaching.
Past research interests
Grammar engineering
Grammar engineering is the task of designing and implementing linguistically motivated electronic descriptions of natural language (so-called grammars). These grammars are expressed within well-defined theoretical frameworks and offer a fine-grained description of natural language. While grammars were first used to describe syntax, that is to say, the relations between constituents in a sentence, they often go beyond syntax and include e.g. semantic information. Alas, to cover a significant part of (the syntax or semantics of) a given language, one needs to define grammars with several thousand rules. This is a tedious and error-prone task which calls for adequate tools. One way to deal with this issue is to use a high-level description of the target grammar, which is compiled into the actual electronic grammar. Such a high-level description is often called a metagrammar.
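The idea of compiling a compact description into many rules can be sketched in a few lines of Python. This is not XMG2 syntax: XMG2 works on tree descriptions across several dimensions, whereas the sketch below assumes flat feature structures, and the classes `SUBJECT`, `OBJECT` and `VERB` (as well as the `extraction` feature) are hypothetical. Each class lists alternative fragments (a disjunction); compiling their conjunction enumerates every combination and discards those whose features clash.

```python
from itertools import product

# Hypothetical metagrammar classes: each is a list of alternative fragments.
SUBJECT = [
    {"subj": "canonical"},
    {"subj": "relative", "extraction": "subj"},
    {"subj": "wh", "extraction": "subj"},
]
OBJECT = [
    {"obj": "canonical"},
    {"obj": "relative", "extraction": "obj"},
    {"obj": "wh", "extraction": "obj"},
]
VERB = [{"cat": "v"}]


def unify(a, b):
    """Merge two flat feature structures; return None on a value clash."""
    result = dict(a)
    for key, value in b.items():
        if key in result and result[key] != value:
            return None
        result[key] = value
    return result


def conjoin(*classes):
    """Compile a conjunction of classes into every consistent combination
    of their alternatives (one compiled rule per combination)."""
    rules = []
    for choice in product(*classes):
        merged = {}
        for fragment in choice:
            merged = unify(merged, fragment)
            if merged is None:
                break  # clash: this combination yields no rule
        if merged is not None:
            rules.append(merged)
    return rules


rules = conjoin(SUBJECT, OBJECT, VERB)
print(len(rules))  # 5
```

Here the four combinations that would extract both the subject and the object fail to unify, leaving five compiled rules out of nine candidates. With more classes, the number of compiled rules grows multiplicatively, which is why writing them by hand quickly becomes tedious.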
I have worked on the definition and implementation of description languages for grammar engineering (see eXtensible Meta-Grammar 2, XMG2), that is, formal languages that help linguists describe various dimensions of language (syntax, semantics, morphology). We were also particularly interested in applying these description languages to the actual description of natural languages, including under-resourced languages such as Ikota.
Parsing
Parsing (aka syntactic analysis) is the task of computing a representation of the relations between words in a string. Parsing usually relies on an (implicit or explicit) formal description of language (grammar) and produces a tree structure (constituency tree or dependency structure, depending on the framework one is working with).
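As an illustration of grammar-based parsing producing a constituency tree, here is a sketch of the classic CKY chart-parsing algorithm for context-free grammars in Chomsky normal form. The toy grammar (`LEXICON`, `BINARY`) is hypothetical, and CKY is a swapped-in textbook technique: it is not the tree-adjoining or range concatenation grammar parsing used in the systems described below.

```python
# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
LEXICON = {"John": {"NP"}, "Mary": {"NP"}, "sees": {"V"}}
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}


def cky_parse(words, start="S"):
    """Return one constituency tree (as nested tuples) for `words`, or None."""
    n = len(words)
    # chart[i][j] maps each category spanning words[i:j] to a backpointer:
    # the word itself for lexical entries, (split, left, right) otherwise.
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, ()):
            chart[i][i + 1][cat] = word
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        for parent in BINARY.get((left, right), ()):
                            chart[i][j].setdefault(parent, (k, left, right))

    def build(cat, i, j):
        back = chart[i][j][cat]
        if isinstance(back, str):  # lexical entry: a leaf
            return (cat, back)
        k, left, right = back
        return (cat, build(left, i, k), build(right, k, j))

    return build(start, 0, n) if start in chart[0][n] else None


print(cky_parse("John sees Mary".split()))
# ('S', ('NP', 'John'), ('VP', ('V', 'sees'), ('NP', 'Mary')))
print(cky_parse("sees John".split()))  # None
```

The chart stores only one backpointer per category and span, so the sketch returns a single tree even when the grammar is ambiguous; a full parser would keep all analyses (a parse forest).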
- Parsing with rewriting systems
During my PhD (2003-2007) under the supervision of Claire Gardent, I worked on the use of tree-adjoining grammars for semantic parsing (that is, to compute a logical semantic representation from a given input sentence). This led to the development of SemConst, a semantic parser based on the DyALog parsing engine.
During a post-doctoral stay in Laura Kallmeyer's group at the University of Tübingen in 2007-2008, I worked on a parsing architecture for mildly context-sensitive grammars (Linear Context-Free Rewriting Systems) using Range Concatenation Grammar as a pivot formalism. I took part in the design and implementation of the Tuebingen Linguistic Parsing Architecture (TuLiPA).
- Parsing with constraint-based systems
I took part in the development of a parsing prototype for Property Grammars, a constraint-based grammar formalism (PropertyGrammar parser). Property Grammars differ from generative formalisms insofar as they can describe the syntax of ungrammatical or partially grammatical utterances, thus providing a formal framework for grammaticality judgement.
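The key difference with generative formalisms can be sketched as follows: instead of accepting or rejecting an utterance, each property is checked independently, and violated properties are reported rather than causing failure. The property types below (linearity, obligation, uniqueness, requirement, exclusion) follow those commonly described for Property Grammars, but their encoding, the `NP_PROPERTIES` list and the satisfaction ratio are hypothetical simplifications; the actual formalism also deals with features, constituency and graded grammaticality measures.

```python
# Hypothetical, heavily simplified properties for a noun phrase (NP).
NP_PROPERTIES = [
    ("linearity", "Det", "N"),    # a determiner precedes the noun
    ("obligation", "N"),          # the head noun must be present
    ("uniqueness", "Det"),        # at most one determiner
    ("requirement", "Adj", "N"),  # an adjective requires a noun
    ("exclusion", "Pro", "N"),    # a pronoun and a noun cannot co-occur
]


def holds(prop, cats):
    """Check one property against a sequence of category labels."""
    kind, *args = prop
    if kind == "linearity":
        a, b = args
        return all(i < j
                   for i, x in enumerate(cats) if x == a
                   for j, y in enumerate(cats) if y == b)
    if kind == "obligation":
        return args[0] in cats
    if kind == "uniqueness":
        return cats.count(args[0]) <= 1
    if kind == "requirement":
        a, b = args
        return a not in cats or b in cats
    if kind == "exclusion":
        a, b = args
        return not (a in cats and b in cats)
    raise ValueError(f"unknown property type: {kind}")


def evaluate(cats, properties=NP_PROPERTIES):
    """Split properties into satisfied and violated ones instead of
    rejecting the input outright."""
    satisfied = [p for p in properties if holds(p, cats)]
    violated = [p for p in properties if not holds(p, cats)]
    return satisfied, violated


sat, viol = evaluate(["N", "Det"])
print(viol)                           # [('linearity', 'Det', 'N')]
print(len(sat) / len(NP_PROPERTIES))  # 0.8
```

The ill-ordered phrase "N Det" still receives a description: it violates one property and satisfies the four others, which is the kind of information a grammaticality judgement can build on.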
Multiword Expressions
Multi-word expressions (MWEs) are sequences of words with some unpredictable properties, such as to count somebody in (to rely on somebody) or to take a haircut (to suffer from some loss). Processing such expressions is particularly difficult because of their highly heterogeneous behaviour at the lexical, syntactic and semantic levels.
During my participation in the PARSEME COST Action led by Agata Savary, I worked on the representation of these expressions in linguistic resources and on their impact on symbolic parsing.
I also took part in the creation of the Phraseology and Multiword Expressions series at Language Science Press, and was involved in the PARSEME-FR project funded by ANR (2015-2020, PI: Mathieu Constant).
Funded projects
Litterat'IA (2025-2026)
The Litterat'IA project, funded by the MSH Lorraine, aims at evaluating how language learners use AI-based conversational agents. To do so, students learning French as a Foreign Language will be interviewed in order to collect representative data. These data will then be used to create a benchmark for assessing the efficiency of AI-based agents.
Role: co-principal investigator (with Guillaume Nassau).
Budget: 25,000 €
Gramex (2023-2025)
The Gramex project, funded by CNRS Innovation, aims at providing teachers with an environment for the automatic generation of grammar exercises from authentic texts, and learners with an exerciser.
Role: co-principal investigator (with Claire Gardent).
Budget: 102,000 €
PIAF (2018-2021)
The PIAF Erasmus+ project aims at promoting computing science education within fundamental curricula by providing teachers with a competency framework together with actual learning activities and pedagogical scenarios which have been tested on learners.
Role: coordinator for Université de Lorraine.
Budget: 449,664 €
PARSEME-FR (2015-2020)
The PARSEME-FR ANR project is a follow-up of the PARSEME Action. It aims at improving the precision of syntactic parsing of French by enhancing the support of Multi-Word Expressions within existing tools. Among its contributions, one may cite the creation of high-quality annotated corpora.
Role: co-leader (with Eric de la Clergerie) of Work Package 4.
Budget: 732,000 €
PARSEME (2013-2017)
This COST Action aims at increasing and enhancing support for Europe's multilingual heritage. This general aim is addressed by improving the linguistic precision and computational efficiency of Natural Language Processing applications.
Roles: member of the Steering Committee, scientific leader of Working Group 2.
Budget: 663,348 €
Supervised students
PhD students
- Senaid Popovic (2024 - ) - PhD in Computer Science at Université de Lorraine on automatic detection of malicious intent in written messages, co-supervision with Fabien Lauer, Maxime Meyer and Damien Riquet (HornetSecurity), Lille & Nancy.
- William Soto (2021 - 2025) - PhD in Computer Science at Université de Lorraine on data-to-text generation for under-resourced languages, co-supervision with Claire Gardent, Nancy.
- Anastasia Shimorina (2017 - 2021) - PhD in Computer Science at Université de Lorraine, thesis entitled Natural Language Generation: from Data Creation to Evaluation via Modelling, co-supervision with Claire Gardent, Nancy. Now research fellow at Orange Labs, Lannion, France.
- Cherifa Ben Khelil (2015 - 2019) - PhD in Computer Science in cotutelle at Université d'Orléans (France) and ENSI Tunis (Tunisia), thesis entitled Construction semi-automatique d'une grammaire d'arbres adjoints pour l'analyse syntaxico-sémantique de l'arabe / Semi-automatic construction of a Tree-adjoining grammar for syntactic-semantic analysis of Arabic, co-supervision with Denys Duchier and Chiraz Ben Othmane Zribi. Now associate professor at EFREI, France.
- Jakub Waszczuk (2013 - 2017) - PhD in Computer Science at Université de Tours, thesis entitled Leveraging MWEs in practical TAG parsing: towards the best of the two worlds, co-supervision with Agata Savary (Blois). Now software engineer at Scrive, Sweden.
- Simon Petitjean (2010 - 2014) - PhD in Computer Science at Université d'Orléans, thesis entitled Génération Modulaire de Grammaires Formelles / Modular Formal Grammar Generation, co-supervision with Denys Duchier. Now post-doc at Carl von Ossietzky University of Oldenburg, Germany.
Bachelor / Master students
- Duy Van Ngo (2023) - MSc in Natural Language Processing at Université de Lorraine, Nancy. Topic: automatic assessment of text readability.
- Karolin Boczon (2023) - MSc in Natural Language Processing at Université de Lorraine, Nancy. Topic: automatic annotation of discourse markers (financed by the CODIM project). Co-supervision with Jacques Jayez (LORIA & ENS Lyon).
- Valadis Mastoras (2021) - MSc in Natural Language Processing at Université de Lorraine, Nancy. Topic: semi-automatic generation of grammar exercises. Now NLP engineer at Centre for Research and Technology Hellas (CERTH), Greece.
- Mathilde Aguiar (2021) - 2nd year student at Polytech Grenoble Engineering School (~ BSc), Nancy. Topic: Python-flask based development of a User Interface for a language learning environment.
- William Soto (2020) - BSc in Natural Language Processing at Université de Lorraine, Nancy. Topic: language detection and topic modelling from tweets. Co-supervision with Emmanuel Schang (Université d'Orléans) and Claire Gardent (CNRS-LORIA).
- Laurine Jeannot (2018) - BSc in Natural Language Processing at Université de Lorraine, Nancy. Topic: Formal representation of multiword-expressions. Co-supervision with Claire Gardent.
- Simon Petitjean (2010) - MSc in Computer Science at Université d'Orléans. Topic: modular development of formal grammars. Co-supervision with Denys Duchier.
- Kilian Evang (2008) - BA in Computational Linguistics at Universität Tübingen. Topic: development of a plugin for the Eclipse IDE for metagrammar design. Co-supervision with Timm Lichte and Laura Kallmeyer. Now research fellow at University of Düsseldorf, Germany.
- Johannes Dellert (2008) - BA in Computational Linguistics at Universität Tübingen. Topic: implementation of automata-based lexical selection within the TuLiPA parser. Co-supervision with Wolfgang Maier and Laura Kallmeyer. Now lecturer at University of Tübingen, Germany.
Undergraduate students
- Axel Dung (2024) - Undergraduate studies in Computer Science (2nd year, BUT Informatique) at Université de Lorraine, Nancy. Topic: design and implementation of software for managing laboratory offices through an interactive map interface.
- Lucas Maujean (2023) - Undergraduate studies in Multimedia (2nd year, BUT Multimedia et Métiers de l'Internet) at Université de Lorraine, St Dié des Vosges. Topic: design and implementation of an automatically generated static web site.
- Lucas Poirot (2023) - Undergraduate studies in Computer Science (2nd year, BUT Informatique) at Université de Lorraine, St Dié des Vosges. Topic: design and implementation of a back-end (API) for automatic generation of grammar exercises.
- Hugo Collin (2023) - Undergraduate studies in Computer Science (2nd year, BUT Informatique) at Université de Lorraine, Nancy. Topic: design and implementation of a VueJS front-end for automatic generation of grammar exercises.
- Paul Claude (2020) - Undergraduate studies in Computer Science (2nd year, DUT Informatique) at Université de Lorraine, Nancy. Topic: design and implementation of an online multilingual document editing environment with Flask. Now a Master's student at Université de Lorraine (training program for future school teachers).
- Guilherme Razet (2018) - Undergraduate studies in Computer Science for Humanities (3rd year, Licence MIASHS) at Université de Lorraine, Nancy. Topic: creation of a test suite for semantic parsing of Arabic.
- Brice Ambrosiak (2007) - Undergraduate studies in Computer Science (3rd year, Licence Mathématiques-Informatique) at Université Henri Poincaré, Nancy. Topic: development of a metagrammar explorer using Graphviz. Now senior software developer at Pictet Technologies, Luxembourg.