NOTE: The rankings here are based on BLEU scores and might change once we conclude the manual evaluations.
Translating conversational text, in particular customer support chats, is an important and challenging application task for machine translation technology. This type of content has so far not been extensively explored in prior MT research, largely due to the lack of publicly available data sets. Prior related work has mostly focused on movie subtitles and European Parliament speeches. The primary goal of this new shared task is to develop a common ground for MT researchers to train and test their solutions by providing common training, dev, and test sets, as well as a common shared task definition. The corpora created for this shared task contain customer support chats for the English-German language pair.
In contrast to news stories, software manuals, biomedical text, etc., in which the text is carefully authored and well formatted, chat conversations are less planned, more informal, and often ungrammatical. Further, such conversations are usually characterized by shorter and simpler sentences and contain more pronouns. In effect, the task of translating chat conversations can be regarded as a two-in-one task, modelling both dialogue and translation at the same time.
Machine translation systems trained for chat conversations are expected to deal with the task's inherent challenges and characteristics, such as (among others):
I had a flight with AirLiberty for next Saturday from X to Y. Could you please change it to next Monday?
The goals of the chat translation shared task are to provide a common ground for:
A critical challenge faced by international companies today is delivering customer support in several different languages. One solution to this challenge is centralizing support with English speaking agents and having a translation layer in the middle to translate from the customer's language into the agent's (English) and vice versa.
Focusing on this scenario, we provide participants with a common ground to train and test their models on a set of conversations between customers and agents. In the first edition of this shared task, we provide a corpus for the English-German language pair. More specifically, the task consists of translating the conversations of a customer and an agent who natively speak German and English, respectively. The participants are asked to translate the customer's utterances from German into English and the agent's from English into German.
Although this year's task focuses neither on dealing with noisy input nor on measuring a model's robustness to such noise, the original English data contains different types of noise, including typos and wrong capitalization, among others. While not required, participants might want to take this aspect into consideration when developing their models.
The dataset used for this task is based on the Taskmaster-1 corpus, which includes monolingual (i.e. English) task-based dialogs in six domains: (i) ordering pizza, (ii) creating auto repair appointments, (iii) setting up ride service, (iv) ordering movie tickets, (v) ordering coffee drinks, and (vi) making restaurant reservations. A subset of this dataset was selected and translated into German at Unbabel.
To download the data, click on the links in the table below. For each set there is a json file.
https://github.com/Unbabel/BConTrasT

Language pair | Train, Development | Test |
---|---|---|
EN-DE | train, dev | test |
Each conversation in the data file has the following structure:
Note 1: Since here we assume customer and agent speak in their own language, the source and target text might be in English or German depending on the role.
Note 2: In addition to the provided training data, the participants are allowed to use all the training data provided by the News shared task organizers.
Note 3: The use of pre-trained models such as BERT, Transformer-XL, Reformer, etc. is allowed.
Note 4: The Taskmaster-1 corpus is originally monolingual (English) and was translated into German. For the agent side, participants model natural English into translated German; for the customer side, however, the direction is translated German into natural English.
Note 5: Due to the high degree of sentence similarity within the Taskmaster-1 monolingual corpus, participants are not allowed to use the additional monolingual data to train their systems.
Systems' performance will be evaluated both automatically using BLEU and with human evaluation using direct assessments.
BLEU scores will be computed in the case-sensitive mode on the de-tokenized version of the data using SacreBLEU.
Following the experience in performing document-level human evaluation for the News translation shared task from previous years, we will perform document-level direct assessment to rank the participating systems.
Note: Both sides (customer and agent) will be evaluated. However, as mentioned in Note 4, the customer target is unaltered natural English and thus may contain noise.
While for the training and dev sets, both the source and target fields contain the respective sentences, in the test set, the target fields are left empty and are to be filled by the participants' MT output. So, the participants need to download the json file (to be released in the evaluation period), translate the source utterances, and submit a version of the file with the target fields filled with their submitted MT outputs.
To make it easier for participants, in addition to the test set, we have also released a conversion script that takes the original json file and the raw MT outputs, and produces the final json file ready for submission. The script is available as mt2json.py.
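The official mt2json.py script remains the reference for preparing submissions. Purely to illustrate the idea of filling empty target fields, here is a minimal sketch. It assumes a hypothetical layout in which the json file maps a conversation id to a list of utterances, each with `source` and `target` keys, and that there is one MT output per utterance in file order. The function name `fill_targets` and the sample data are illustrative and not taken from the task files.

```python
import json


def fill_targets(conversations, mt_outputs):
    """Return a copy of `conversations` with each "target" set from `mt_outputs`.

    Assumes one MT output per utterance, listed in file order.
    """
    total = sum(len(utts) for utts in conversations.values())
    if total != len(mt_outputs):
        raise ValueError(f"expected {total} MT outputs, got {len(mt_outputs)}")

    outputs = iter(mt_outputs)
    filled = {}
    for conv_id, utterances in conversations.items():
        # dict(utt, target=...) copies the utterance and overrides "target".
        filled[conv_id] = [dict(utt, target=next(outputs)) for utt in utterances]
    return filled


# Hypothetical test-set excerpt with empty targets (assumed layout).
example = {
    "conv-1": [
        {"source": "Hello, how can I help you?", "target": ""},
        {"source": "Ich möchte einen Tisch reservieren.", "target": ""},
    ]
}
submission = fill_targets(
    example,
    ["Hallo, wie kann ich Ihnen helfen?", "I would like to book a table."],
)
print(json.dumps(submission, ensure_ascii=False, indent=2))
```

The sketch returns a new structure instead of editing the input in place, so the original test file contents stay available for a second system's outputs.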
Although translating the customer-side utterances is optional, the participants are required to submit translations for at least the agent side, from English into German. Translation of all utterances is preferable; however, submissions of json files in which the target side of the customer utterances is left empty will be accepted.
Each team can submit at most 3 systems, one primary and up to two contrastive. The submitting team is required to explicitly indicate which of these submissions represents their primary submission. In the case that none of the runs is marked as primary, the latest submission received will be used as the primary submission.
Submissions should be sent via email to wmt20-chat-task@unbabel.com.
In addition to submitting the MT outputs, the participants are required to submit a short paper (up to 4 pages) describing their system and their methodology.
Release of training data for shared task | March 31, 2020 |
Test data released | |
Translation submission deadline | |
Start of human evaluation | July 11, 2020 |
End of human evaluation | August 7, 2020 |
Note: For the deadlines regarding the system description papers, please follow the timeline of the main conference.
For any information or question on the task, please send an email to: amin@unbabel.com or antonio.lopes@unbabel.com.
We would like to acknowledge Unbabel for their support and sponsorship in organizing the 2020 Chat Translation shared task.