Chat Translation Task - EMNLP 2020 Fifth Conference on Machine Translation

Shared Task: Machine Translation for Chats

Latest updates

October 8, 2020

Official manual scores are now available: results.
The test set with the reference translations is now publicly available: reference translations.

August 14, 2020

Official automatic scores are available: results.
NOTE: The rankings here are based on BLEU scores and might change once we conclude the manual evaluation.
Unfortunately, there has been a delay in our manual evaluation, and we cannot release the scores before the paper submission deadline. We are working on it and will release the scores before the camera-ready deadline.

July 9, 2020

System submission deadline has been extended. Please refer to the important dates section.
The conversion script for preparing the submissions has been released. Please refer to the submission format section.

July 4, 2020

Test set released. Please refer to the data section.

June 2, 2020

System submission deadline has been extended. Please refer to the important dates section.
The rules on using additional training data have been updated. Please refer to note 5 in the data section.

Overview

Translating conversational text, in particular customer support chats, is an important and challenging application for machine translation technology. This type of content has so far not been extensively explored in MT research, largely due to the lack of publicly available data sets; prior related work has mostly focused on movie subtitles and European Parliament speeches. The primary goal of this new shared task is to provide a common ground for MT researchers to train and test their solutions, with common training, dev, and test sets as well as a common task definition. The corpora created for this shared task contain customer support chats for the English-German language pair.

In contrast to news stories, software manuals, biomedical text, etc., which are carefully authored and well formatted, chat conversations are less planned, more informal, and often ungrammatical. Such conversations are also usually characterized by shorter and simpler sentences and contain more pronouns. In effect, translating chat conversations can be regarded as a two-in-one task: modelling dialogue and translation at the same time.

Machine translation systems trained for chat conversations are expected to deal with the task's inherent challenges and characteristics, such as (among others):

The importance of using extended context for translating the segments and modelling dialogue, e.g., agreement and anaphora resolution that require inter-sentential modelling:

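I had a flight with AirLiberty for next Saturday from X to Y. Could you please change it to next Monday?

Here, "it" refers to the flight in the previous sentence. Because the German "Flug" is masculine, "it" must be rendered as "ihn", which cannot be determined from the second sentence alone.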

Robustness to noisy input. Chat text is usually noisier, containing misspelled words, incorrect casing, incomplete sentences, etc.

Consistent and coherent translation throughout the entire conversation.

Modelling of all the speakers and language directions involved in the conversation, each of which can be regarded as a different sub-domain (depending on the task).
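One common way to give a sentence-level model access to extended context is to concatenate each utterance with the preceding ones, separated by a reserved token. The sketch below illustrates this; the function name `with_context`, the `<sep>` token, and the window size `k` are illustrative assumptions, not part of the task definition. In this task the previous turns may be in a different language than the current one (German for the customer, English for the agent), so which side of the history to include is itself a design choice.

```python
from typing import List

# Hypothetical separator; any token reserved in the model vocabulary would do.
SEP = "<sep>"


def with_context(sources: List[str], k: int = 2, sep: str = SEP) -> List[str]:
    """Prefix each utterance with up to k previous utterances.

    `sources` holds the utterances of one conversation in order. In every
    returned string, the segment to translate is the one after the last
    separator; everything before it is context only.
    """
    inputs = []
    for i, text in enumerate(sources):
        history = sources[max(0, i - k):i]  # at most k preceding utterances
        inputs.append(f" {sep} ".join(history + [text]))
    return inputs
```

With `k=1`, the second sentence of the AirLiberty example is fed to the model together with the first, giving it the information needed to resolve "it". One variant trains the model to output only the translation of the final segment; another translates the whole concatenation and keeps the last segment.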

Goals

The goals of the chat translation shared task are to provide a common ground for:

Studying the applicability of machine translation systems for translating conversational text
Investigating the impact of context in a conversation's translation
Studying the feasibility of an all-in-one multilingual system

Task Description

A critical challenge faced by international companies today is delivering customer support in several different languages. One solution is to centralize support with English-speaking agents and place a translation layer in the middle that translates from the customer's language into the agent's (English) and vice versa.

Focusing on this scenario, we provide participants with a common ground to train and test their models on a set of conversations between customers and agents. In this first edition of the shared task, we provide a corpus for the English-German language pair. More specifically, the task consists of translating conversations between a customer and an agent who are native speakers of German and English, respectively. Participants are asked to translate the customer's utterances from German into English and the agent's utterances from English into German.

Although this year's task focuses neither on handling noisy input nor on measuring model robustness to such noise, the original English data contains several types of noise, including typos and incorrect capitalization. While not required, participants may want to take this aspect into account when developing their models.

Data

The dataset used for this task is based on the Taskmaster-1 corpus, which includes monolingual (i.e., English) task-based dialogs in six domains: (i) ordering pizza, (ii) creating auto repair appointments, (iii) setting up ride service, (iv) ordering movie tickets, (v) ordering coffee drinks, and (vi) making restaurant reservations. A subset of this dataset was selected and translated into German at Unbabel.

To download the data, click on the links in the table below. Each set is provided as a JSON file.

https://github.com/Unbabel/BConTrasT

Language pair	Train, Development	Test
EN-DE	train, dev	test

Each conversation in the data file has the following structure:

ConversationID: A unique identifier for each conversation.
Utterances: An array of utterances that make up the conversation. Each utterance has the following fields:
- UtteranceID: A 0-based index indicating the order of the utterances in the conversation.
- Speaker: Either customer or agent, indicating which role generated this utterance.
- Source: The utterance in the original source language.
- Target: The utterance in the translated target language.
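A minimal sketch of loading a data file into typed records with basic sanity checks. It assumes the file is a JSON list of conversation objects whose keys are exactly the field names listed above (`ConversationID`, `Utterances`, `UtteranceID`, `Speaker`, `Source`, `Target`); the actual files in the BConTrasT repository may use different key casing or nesting, so adjust the keys after inspecting them. The class and function names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List

SPEAKERS = {"customer", "agent"}


@dataclass
class Utterance:
    utterance_id: int  # 0-based position in the conversation
    speaker: str       # "customer" or "agent"
    source: str        # text in the speaker's own language
    target: str        # translation (empty in the test set)


@dataclass
class Conversation:
    conversation_id: str
    utterances: List[Utterance]


def parse_conversations(raw: str) -> List[Conversation]:
    """Parse a JSON string assumed to hold a list of conversation objects."""
    conversations = []
    for conv in json.loads(raw):
        utts = [
            Utterance(u["UtteranceID"], u["Speaker"], u["Source"], u.get("Target", ""))
            for u in conv["Utterances"]
        ]
        # Sort by the 0-based index so list order matches conversation order.
        utts.sort(key=lambda u: u.utterance_id)
        if [u.utterance_id for u in utts] != list(range(len(utts))):
            raise ValueError(f"{conv['ConversationID']}: UtteranceIDs are not 0..n-1")
        for u in utts:
            if u.speaker not in SPEAKERS:
                raise ValueError(f"{conv['ConversationID']}: unknown speaker {u.speaker!r}")
        conversations.append(Conversation(conv["ConversationID"], utts))
    return conversations
```

Sorting by `UtteranceID` rather than trusting array order is a defensive choice; if the files are already ordered, it changes nothing.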

Note 1: Since we assume that the customer and the agent each write in their own language, the source and target text may be in English or German depending on the speaker's role.
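Because the language of each utterance depends on its speaker, a natural preprocessing step is to split the data into two direction-specific parallel corpora. A minimal sketch, assuming the utterances are available as hypothetical `(speaker, source, target)` triples from whatever loader is used:

```python
from typing import Dict, Iterable, List, Tuple

# Translation direction implied by the speaker role (Note 1):
# customers write German, agents write English.
DIRECTION = {"customer": ("de", "en"), "agent": ("en", "de")}

Pair = Tuple[str, str]


def split_by_direction(
    utterances: Iterable[Tuple[str, str, str]],
) -> Dict[Tuple[str, str], List[Pair]]:
    """Group (speaker, source, target) triples into per-direction parallel data."""
    corpora: Dict[Tuple[str, str], List[Pair]] = {d: [] for d in DIRECTION.values()}
    for speaker, source, target in utterances:
        corpora[DIRECTION[speaker]].append((source, target))
    return corpora
```

The two corpora can train two separate models, or be combined into a single model, in line with the all-in-one multilingual goal stated earlier, for example by prepending a target-language tag to each source sentence; the tag format would be a participant's design decision.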

Note 2: In addition to the provided training data, the participants are allowed to use all the training data provided by the News shared task organizers.

Note 3: The use of pre-trained models such as BERT, Transformer-XL, Reformer, etc., is allowed.

Note 4: The Taskmaster-1 corpus is originally monolingual (English) and was translated into German. Hence, on the agent side participants translate natural English into translated German, whereas on the customer side the direction is translated German into natural English.

Note 5: Due to the high degree of sentence similarity within the Taskmaster-1 monolingual corpus, participants are not allowed to use the additional Taskmaster-1 monolingual data to train their systems.

Evaluation

System performance will be evaluated both automatically, using BLEU, and manually, using direct assessment.

BLEU scores will be computed with SacreBLEU in case-sensitive mode on the detokenized data.

Building on the document-level human evaluation carried out in the News translation shared task in previous years, we will perform document-level direct assessment to rank the participating systems.

Note: Both sides (customer and agent) will be evaluated. However, as mentioned in Note 4, the customer target is unaltered natural English and may therefore contain noise.

Submission Format

For the training and dev sets, both the source and target fields contain the respective sentences. In the test set, however, the target fields are left empty and are to be filled with the participants' MT output. Participants therefore need to download the JSON file (to be released in the evaluation period), translate the source utterances, and submit a version of the file whose target fields contain their MT outputs.

To make things easier for participants, we have also released, in addition to the test set, a conversion script that takes the original JSON file and the raw MT outputs and produces the final JSON file ready for submission. The script can be downloaded here: mt2json.py
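The released mt2json.py should be used for actual submissions; its command-line interface is not reproduced here. The sketch below only illustrates the underlying idea under stated assumptions: the test file is a JSON list of conversations using the field names from the Data section, and MT outputs are grouped by speaker role, one line per utterance in file order. The function name `fill_targets` and this grouping scheme are hypothetical.

```python
import json
from typing import Dict, List


def fill_targets(test_json: str, outputs: Dict[str, List[str]]) -> str:
    """Fill empty Target fields with MT outputs, grouped by speaker role.

    outputs["agent"] (and optionally outputs["customer"]) must list that
    role's translations in file order: conversation by conversation, then by
    UtteranceID. Roles absent from `outputs` keep their empty targets.
    """
    conversations = json.loads(test_json)
    pending = {role: iter(lines) for role, lines in outputs.items()}
    for conv in conversations:
        # sorted() returns the same dict objects, so the edits below persist.
        for utt in sorted(conv["Utterances"], key=lambda u: u["UtteranceID"]):
            role = utt["Speaker"]
            if role not in pending:
                continue
            line = next(pending[role], None)
            if line is None:
                raise ValueError(f"fewer {role} outputs than {role} utterances")
            utt["Target"] = line
    for role, rest in pending.items():
        if next(rest, None) is not None:
            raise ValueError(f"more {role} outputs than {role} utterances")
    return json.dumps(conversations, ensure_ascii=False, indent=2)
```

Leaving "customer" out of `outputs` yields an agent-only file; counting mismatches are reported as errors instead of silently misaligning translations.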

Participants are required to submit translations for at least the agent side, from English into German; translating the customer utterances is optional. Translating all utterances is preferable, but submissions in which the target side of the customer utterances is left empty will also be accepted.
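Before sending a file, a quick self-check that every agent utterance has a non-empty target can catch incomplete submissions. A minimal sketch, assuming a JSON list of conversations with the field names from the Data section; `check_submission` is a hypothetical helper, not an official validator.

```python
import json
from typing import List


def check_submission(submission_json: str) -> List[str]:
    """Return a list of problems that make a submission incomplete.

    Agent (English-to-German) targets are mandatory; customer targets may be
    left empty, so only agent utterances are checked.
    """
    problems = []
    for conv in json.loads(submission_json):
        for utt in conv["Utterances"]:
            # Whitespace-only targets count as empty.
            if utt["Speaker"] == "agent" and not utt.get("Target", "").strip():
                problems.append(f"{conv['ConversationID']}/{utt['UtteranceID']}: empty agent target")
    return problems
```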

Submission Requirements

Each team can submit at most 3 systems: one primary and up to two contrastive. The submitting team must explicitly indicate which of these submissions is its primary submission. If none of the runs is marked as primary, the latest submission received will be used as the primary submission.

Submissions should be sent via email to wmt20-chat-task@unbabel.com.

In addition to submitting the MT outputs, participants are required to submit a short paper (up to 4 pages) describing their system and methodology.

Important Dates

Release of training data for shared task	March 31, 2020
Test data released	~~June 9, 2020~~ July 3, 2020
Translation submission deadline	~~June 16, 2020~~ ~~July 10, 2020~~ July 13, 2020
Start of human evaluation	July 11, 2020
End of human evaluation	August 7, 2020

Note: For the system description paper deadlines, please follow the timeline of the main conference.

Organizers

M. Amin Farajian (Unbabel)
António Lopes (Unbabel)
Sameen Maruf (Monash University)
André Martins (Unbabel)
Gholamreza Haffari (Monash University)

Contact

For any information or questions about the task, please send an email to amin@unbabel.com or antonio.lopes@unbabel.com.

Sponsor

We would like to acknowledge Unbabel for their support and sponsorship in organizing the 2020 Chat Translation shared task.