This shared task focuses on automatic methods for translation suggestion (TS), which automatically provide alternatives for the incorrect spans of MT sentences. Translation suggestion is an important tool for computer-aided translation and has proven its ability to improve the efficiency of post-editing (PE). There are two main pitfalls in conventional work in this area:
Our specific goals are:
For all tasks, the datasets and the NMT models that generated the translations are publicly available.
Participants are also allowed to use publicly available pre-trained models and to explore any corpus (monolingual or bilingual) provided by the WMT22 general translation task, but these resources should be disclosed in their system descriptions.
| Event | Date |
|---|---|
| Release of training and dev data | April 25th, 2022 |
| Release of test data | June 29th, 2022 |
| Submission deadline | July 8th, 2022 |
| System descriptions deadline | September 1st, 2022 |
| Paper notification | October 6th, 2022 |
| Camera-ready deadline | October 15th, 2022 |
Note: The system description papers should follow the paper submission policy of WMT; please see the paper submission information section on the WMT homepage for more details. All deadlines are 11:59 PM UTC+8.
This task offers human-labeled gold data for four translation directions: Chinese-English (Zh-En), English-Chinese (En-Zh), English-German (En-De) and German-English (De-En). The datasets were collected by translating sampled source sentences with a SOTA Transformer NMT model; the outputs were then annotated by professional translators. A detailed description of the data collection can be found in the WeTs setup. Each sample includes the source sentence, the MT sentence, the incorrect span of the MT sentence, and the top-1 suggestion.
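For illustration, a single sample might look as follows (a minimal sketch; the field names and the offset-based span encoding are hypothetical, so consult the released files for the actual format):

```python
# Hypothetical Zh-En sample: the incorrect span of the MT sentence is
# represented here as character offsets into the MT string.
sample = {
    "source": "机器翻译的质量越来越高。",
    "mt": "The quality of machine translation is more and more higher.",
    "incorrect_span": (38, 58),   # covers "more and more higher"
    "suggestion": "getting higher and higher",  # top-1 human suggestion
}
```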
Training and dev data: Download the training and development data.
Test data: Participants are expected to submit their results on the test set. You can download the test data here. The table below gives the corpus statistics:
| | Train | Dev | Test |
|---|---|---|---|
| En-De | 12000 | 2000 | 1000 |
| De-En | 10000 | 2000 | 1000 |
| En-Zh | 15000 | 2700 | 1000 |
| Zh-En | 15000 | 2700 | 1000 |
Baselines: The baseline system is a conventional Transformer model implemented with the fairseq toolkit. For the baseline, the input to the Transformer encoder is the concatenation of the source and MT sentences, where the incorrect span of the MT sentence is replaced with a special placeholder token.
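As an illustration, the sketch below shows one way such an encoder input could be assembled before tokenization (the `<mask>` placeholder and `</s>` separator are assumptions, not necessarily the tokens used by the official baseline):

```python
def build_baseline_input(source: str, mt: str, span_start: int, span_end: int,
                         placeholder: str = "<mask>", sep: str = "</s>") -> str:
    """Concatenate the source and MT sentences, replacing the annotated
    incorrect span of the MT sentence with a placeholder token."""
    masked_mt = mt[:span_start] + placeholder + mt[span_end:]
    return f"{source} {sep} {masked_mt}"

# Hypothetical En-De example where "Haus" is the annotated incorrect span.
print(build_baseline_input("The building is tall.", "Das Haus ist hoch.", 4, 8))
# -> The building is tall. </s> Das <mask> ist hoch.
```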
Evaluation: Each submission will be evaluated in terms of the document-level BLEU score of the top-1 suggestion against the reference sentences. We use the official evaluation tool sacrebleu. For Chinese, the BLEU score is calculated on characters with the default tokenizer for Chinese; for English and German, the BLEU score is calculated on case-sensitive words with the default tokenizer 13a.
sacrebleu ref.txt -i hyp.detok.txt -l en-de
sacrebleu ref.txt -i hyp.detok.txt -l de-en
sacrebleu ref.txt -i hyp.detok.txt -l zh-en
sacrebleu ref.txt -i hyp.detok.txt -l en-zh
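The same scores can also be computed with sacrebleu's Python API; the sketch below assumes plain-text files with one detokenized segment per line:

```python
import sacrebleu

with open("hyp.detok.txt", encoding="utf-8") as f:
    hyps = [line.rstrip("\n") for line in f]
with open("ref.txt", encoding="utf-8") as f:
    refs = [line.rstrip("\n") for line in f]

# 13a is the default tokenizer for English/German targets;
# pass tokenize="zh" when the target language is Chinese.
bleu = sacrebleu.corpus_bleu(hyps, [refs], tokenize="13a")
print(bleu.score)
```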
Compared with Task 1, the difference is that we also provide the model with hints, which help the model produce more accurate suggestions. For this task, each sample includes the source sentence, the MT sentence, the incorrect span of the MT sentence, hints for the top-1 suggestion, and the top-1 suggestion itself. The hints are generated automatically following the WeTs setup. Note: The hints used here differ somewhat from those used in WeTs: we take only the first k characters of the suggestion as the hint, where k is randomly sampled.
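A minimal sketch of this prefix-hint scheme is shown below (the sampling range for k is an assumption, as the exact distribution is not specified here):

```python
import random

def make_hint(suggestion: str) -> str:
    """Return the first k characters of the gold suggestion as the hint,
    with k sampled uniformly at random from 1..len(suggestion)."""
    k = random.randint(1, max(1, len(suggestion)))
    return suggestion[:k]

# Hypothetical example: a sampled prefix of the gold suggestion.
print(make_hint("getting higher and higher"))  # e.g. "getting hi"
```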
Training and dev data: Download the training and development data.
Test data: Participants are expected to submit their results on the test set. You can download the test data here.
Baselines: The baseline system is a conventional Transformer model implemented with the fairseq toolkit. For the baseline, the input to the Transformer encoder is the concatenation of the source sentence, the MT sentence and the hint, where the incorrect span of the MT sentence is replaced with a special placeholder token.
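Extending the earlier input-construction sketch, the hint is appended as a third segment (the separator and placeholder tokens remain assumptions):

```python
def build_hinted_input(source: str, mt: str, span_start: int, span_end: int,
                       hint: str, placeholder: str = "<mask>",
                       sep: str = "</s>") -> str:
    """Concatenate the source sentence, the masked MT sentence and the hint."""
    masked_mt = mt[:span_start] + placeholder + mt[span_end:]
    return f"{source} {sep} {masked_mt} {sep} {hint}"
```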
Evaluation: Each submission will be evaluated in terms of the document-level BLEU score of the top-1 suggestion against the reference sentences. We use the official evaluation tool sacrebleu. For this task, we only provide corpora for the English-Chinese (En-Zh) and Chinese-English (Zh-En) translation directions. For Chinese, the BLEU score is calculated on characters with the default tokenizer for Chinese; for English, the BLEU score is calculated on case-sensitive words with the default tokenizer 13a.
sacrebleu ref.txt -i hyp.detok.txt -l zh-en
sacrebleu ref.txt -i hyp.detok.txt -l en-zh
Attention: All training, dev and test sets are restricted to the corpora provided on this website. If it helps, you can download the NMT models that were used to generate the MT sentences of our corpus.
Each participating team can submit at most 15 systems for each translation direction of each subtask. Participants can submit their results and view their scores on the website. Before submitting, participants are required to sign up by sending an email to the organizers that includes the following information: user name, password, team name, organization, and email.
Feel free to contact us with any questions by dropping an email to Zhen Yang.