The fourth round of the APE shared task follows the success of the previous three rounds organised in 2015, 2016 and 2017. The aim is to examine automatic methods for correcting errors produced by an unknown machine translation (MT) system. This has to be done by exploiting knowledge acquired from human post-edits, which are provided as training material.
The aim of this task is to improve MT output in black-box scenarios, in which the MT system is used "as is" and cannot be modified. From the application point of view, APE components would make it possible to:
As in the last round, this year the task focuses on the Information Technology domain for the English-German language direction. One novelty, however, is the addition of one MT system: this year, the task covers MT output generated by a phrase-based system (PBSMT) and by a neural MT system (NMT). In both cases, the source sentences have been translated into the target language by an MT system unknown to the participants (in terms of system configuration) and then manually post-edited by professional translators.
At the training stage, the collected human post-edits have to be used to learn correction rules for the APE systems. At the test stage, they will be used for system evaluation with automatic metrics (TER and BLEU).
Compared to the third round, the main differences are:
Training, development and test data consist of English-German triplets (source, target, and post-edit) belonging to the IT domain, and are already tokenized. All data is provided by the EU project QT21 (http://www.qt21.eu/).
For the PBSMT system, the development set released in 2016 can be used to tune the systems.
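The triplet structure described above can be sketched as follows. This is only an illustration: the `Triplet` class and the `build_triplets` function are hypothetical names, and the sketch assumes that the three inputs are parallel lists of lines (one segment per line) that have already been read from the source, MT output and post-edit files. Since the data is already tokenized, tokens are obtained by splitting on whitespace.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Triplet:
    """One training instance: source, MT output (target) and human post-edit."""
    source: List[str]
    target: List[str]
    post_edit: List[str]


def build_triplets(
    source_lines: Sequence[str],
    target_lines: Sequence[str],
    post_edit_lines: Sequence[str],
) -> List[Triplet]:
    """Align three parallel lists of tokenized lines into triplets."""
    if not (len(source_lines) == len(target_lines) == len(post_edit_lines)):
        raise ValueError("source, target and post-edit must have the same number of lines")
    return [
        # The data is pre-tokenized, so a whitespace split recovers the tokens.
        Triplet(src.split(), tgt.split(), pe.split())
        for src, tgt, pe in zip(source_lines, target_lines, post_edit_lines)
    ]
```

Checking the line counts up front catches misaligned files early, before any pair of segments is silently mismatched.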
To download the data click on the links in the table below:
Language pair | Domain | MT system | 2016 | 2017 | 2018 | Additional Resource |
---|---|---|---|---|---|---|
EN-DE | IT | PBSMT | train, dev, test | train, test | test, test.{pe,ref} | artificial training data+, eSCAPE Corpus* |
EN-DE | IT | NMT | - | - | train, dev, test, test.pe, test.ref | artificial training data+, eSCAPE Corpus* |
+: This training data was created and used in "Log-linear Combinations of Monolingual and Bilingual Neural Machine Translation Models for Automatic Post-Editing"
*: This corpus was created and used in "eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing". It contains data generated by both PBSMT and NMT systems.
NOTE:
Any use of additional data for training your system is allowed (e.g. parallel corpora, post-edited corpora).
@InProceedings{chatterjee-EtAl:2018:WMT,
  author    = {Chatterjee, Rajen and Negri, Matteo and Rubino, Raphael and Turchi, Marco},
  title     = {Findings of the WMT 2018 Shared Task on Automatic Post-Editing},
  booktitle = {Proceedings of the Third Conference on Machine Translation},
  month     = {October},
  year      = {2018},
  address   = {Belgium, Brussels},
  publisher = {Association for Computational Linguistics},
}
Systems' performance will be evaluated with respect to their capability to reduce the distance that separates an automatic translation from its human-revised version.
Such distance will be measured in terms of TER, which will be computed between automatic and human post-edits in case-sensitive mode.
BLEU will also be taken into consideration as a secondary evaluation metric. To gain further insights on final output quality, a subset of the outputs of the submitted systems will also be manually evaluated, as in the previous round.
The submitted runs will be ranked based on the average HTER calculated on the test set by using the tercom software.
The HTER calculated between the raw MT output and human post-editions in the test set will be used as baseline (i.e. the baseline is a system that leaves all the test instances unmodified).
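To make the idea of the "do-nothing" baseline concrete, here is a minimal sketch. It is not the official evaluation script and it does not reproduce tercom: it computes a case-sensitive word-level edit rate (insertions, deletions and substitutions only), which is a simplification of TER because it ignores block shifts. The function names are hypothetical, and aggregating as total edits over total reference words is an assumption about how the average is taken.

```python
from typing import Sequence


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Case-sensitive word-level Levenshtein distance (no block shifts)."""
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(prev[j] + 1,          # delete hyp_word
                            curr[j - 1] + 1,      # insert ref_word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edit_rate(hypotheses: Sequence[str], references: Sequence[str]) -> float:
    """Total edits divided by total reference words (assumed aggregation)."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must be parallel")
    edits = 0
    ref_words = 0
    for hyp, ref in zip(hypotheses, references):
        ref_tokens = ref.split()
        edits += word_edit_distance(hyp.split(), ref_tokens)
        ref_words += len(ref_tokens)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return edits / ref_words


# Toy example (invented data): the baseline leaves the MT output unmodified.
mt_output = ["click the button save"]
post_edits = ["click the Save button"]
baseline_score = edit_rate(mt_output, post_edits)
```

An APE system improves on the baseline when the edit rate of its output against the human post-edits is lower than `baseline_score`.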
The evaluation script can be downloaded here
Your system should produce automatic post-editions of the target sentences in the test set in the following format:
<METHOD NAME> <SEGMENT NUMBER> <APE SEGMENT>

Where:
METHOD NAME is the name of your automatic post-editing method.
SEGMENT NUMBER is the line number of the plain text target file you are post-editing.
APE SEGMENT is the automatic post-edition for the particular segment.

Each participating team can submit at most 3 systems, but they have to explicitly indicate which of them represents their primary submission. In the case that none of the runs is marked as primary, the latest submission received will be used as the primary submission.
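The output format described above could be produced along these lines. This is a sketch under two assumptions that the task description does not state explicitly: the three fields are separated by a tab character, and segment numbers start at 1. The function name `format_submission` is hypothetical.

```python
from typing import List, Sequence


def format_submission(
    method_name: str,
    ape_segments: Sequence[str],
    separator: str = "\t",  # assumption: fields are tab-separated
) -> List[str]:
    """Build one '<METHOD NAME> <SEGMENT NUMBER> <APE SEGMENT>' line per segment."""
    lines = []
    # Assumption: line numbers of the target file are counted from 1.
    for segment_number, segment in enumerate(ape_segments, start=1):
        fields = [method_name, str(segment_number), segment.strip()]
        lines.append(separator.join(fields))
    return lines


# Toy example with invented segments.
example_lines = format_submission(
    "pt_1_pruned",
    ["Klicken Sie auf Speichern .", "Datei öffnen"],
)
```

Writing `"\n".join(example_lines)` to a UTF-8 text file then gives one post-edited segment per line, in the same order as the target file.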
Submissions should be sent via email to wmt-ape-submission@fbk.eu. Please use the following pattern to name your files:
INSTITUTION-NAME_METHOD-NAME_SUBTYPE

where:
INSTITUTION-NAME is an acronym/short name for your institution, e.g. "UniXY".
METHOD-NAME is an identifier for your method, e.g. "pt_1_pruned".
SUBTYPE indicates whether the submission is primary or contrastive, with the two alternative values PRIMARY and CONTRASTIVE.
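As an illustration of the naming pattern above, the following sketch assembles a file name and rejects an invalid SUBTYPE. The helper `submission_filename` is a hypothetical name, and the checks for empty or whitespace-containing parts are extra assumptions rather than stated task rules.

```python
VALID_SUBTYPES = ("PRIMARY", "CONTRASTIVE")


def submission_filename(institution: str, method: str, subtype: str) -> str:
    """Return INSTITUTION-NAME_METHOD-NAME_SUBTYPE for a submission file."""
    if subtype not in VALID_SUBTYPES:
        raise ValueError(f"SUBTYPE must be one of {VALID_SUBTYPES}, got {subtype!r}")
    for label, value in (("INSTITUTION-NAME", institution), ("METHOD-NAME", method)):
        # Assumption: names must be non-empty and free of whitespace.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{label} must be non-empty and contain no whitespace")
    return f"{institution}_{method}_{subtype}"


# Using the example values given in the task description.
example_name = submission_filename("UniXY", "pt_1_pruned", "PRIMARY")
```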
You are also invited to submit a short paper (4 to 6 pages) to WMT describing your APE method(s). You are not required to submit a paper if you do not want to. In that case, we ask you to give an appropriate reference describing your method(s) that we can cite in the WMT overview paper.
Release of training data | February 16, 2018 |
Release of test data | May 4, 2018 |
Submission deadline | June 4, 2018 |
Paper submission deadline | July 27, 2018 (TBC) |
Manual evaluation | TBD |
Notification of acceptance | August 18, 2018 (TBC) |
Camera-ready deadline | August 31, 2018 (TBC) |
For any information or questions about the task, please send an email to: wmt-ape@fbk.eu.
To stay up to date on this year's edition of the APE task, you can also join the wmt-ape group.
Supported by the European Commission under the QT21 project (grant number 645452)