19th Workshop on Building and Using Comparable Corpora
Please note that the program uses Mallorca time, i.e., GMT+2 (CEST).
Program: Monday, 11 May 2026
Room: Calvia, 1st floor
Zoom link: https://zoom.us/j/92983398859?pwd=gErMaK7bfqVgkSIaXq9Zb27bzItfnF.1
9:00
Session 1
Chair: Ayla Rigouts Terryn, Université de Montréal
11:00
Session 2: Comparable corpora for linguistics research
Chair: Philippe Langlais, Université de Montréal
11:00
Computing Semantic Similarity for Aligning Bilingual Semi-parallel Texts: A Case Study
Steffen Frenzel, Maximilian Krupop, Manfred Stede
University of Potsdam
11:24
A Comparative Study in Corpus Linguistics Applied to Automatic Terminology Extraction
Mercè Vàzquez¹, Sergi Alvarez-Vidal², Antoni Oliver¹
¹Universitat Oberta de Catalunya, ²Universitat Autònoma de Barcelona
11:48
Comparable Corpora in Cross-linguistic Research: Nominal Number in English, Czech, and Greek
Konstantinos Diamantopoulos and Magda Ševčíková
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics
12:12
Liebe Kolleg:innen, Querid@s Compañer@s: Presenting the GILDEES Corpus
Marie-Pauline Krielke
Saarland University
12:36
A Diachronic Comparable Corpus of Spanish Digital News (2017–2026) for the Study of Stylistic Convergence in the GenAI Era
Hugo Sanjurjo-González
University of Deusto
14:00
Session 3: Synthetic corpora
Chair: Serge Sharoff, University of Leeds
14:00
Panel discussion: Comparable in the Age of LLMs: Fundamental questions at the intersection of comparable corpora and synthetic data (Chair: Serge Sharoff, University of Leeds)
Panelists:
Cristina España-Bonet (DFKI, Saarbrücken, Germany, and Barcelona Supercomputing Center, Barcelona, Spain)
Nizar Habash (NYU Abu Dhabi, UAE)
Philippe Langlais (Université de Montréal, Montréal, Canada)
Benoît Sagot (Inria, Paris, France)
16:30
Session 4: Building comparable datasets
Chair: Pierre Zweigenbaum, Université Paris-Saclay, CNRS
16:30
Parallel Corpora of Scholarly Documents for English-French Machine Translation
Ziqian Peng¹, Lichao Zhu², Rachel Bawden³, Maud Bénard², Éric de la Clergerie³, Mathilde Huguin⁴, Natalie Kübler², Paul Lerner⁵, Alexandra Mestivier², François Yvon⁵
¹Sorbonne Université, CNRS, ISIR & Inria, Paris, ²Université Paris Cité, ALTAE, ³Inria, ⁴CNRS, ⁵Sorbonne Université, CNRS, ISIR
16:54
Validating a Pipeline to Create a Comparable Corpus of Government-Issued Travel Advisories from the Internet Archives
Laura Braun and Christian Oswald
University of the German Federal Armed Forces
17:18
Leveraging Comparable Toxicity Lexicons in Prompt Instructions for Multilingual Text Detoxification
Yassir El Attar, Esra Dönmez, Nina K. Ohlendorf, Agnieszka Falenska
IMS, University of Stuttgart
Last modified: 11 May 2026, 9:50