EMNLP 2015 TENTH WORKSHOP
ON STATISTICAL MACHINE TRANSLATION

Shared Task: Automatic Post-Editing

17-18 September 2015
Lisbon, Portugal


This shared task will examine automatic methods for correcting errors produced by an unknown machine translation (MT) system. Since the system itself is a "black box", automatic post-editing methods have to operate downstream of it (that is, after MT decoding), exploiting knowledge acquired from previous human post-editions and provided as training material.

Goals

Automatic Post-editing (APE) aims at improving MT output in black-box scenarios, in which the MT system is used "as is" and cannot be modified. From the application point of view, APE components would make it possible to:

- Cope with systematic errors of an MT system whose decoding process is not accessible;
- Provide professional translators with improved MT output quality to reduce (human) post-editing effort.

Task Description

In this pilot run of the shared task we will provide you with training (source, target, human post-edition) triples and you will return automatic post-editions for unseen (source, target) test pairs.

Data

Training and development data (the same used for the Sentence-level Quality Estimation task) consist of 11,272 and 1,000 English-Spanish triples, respectively. In each triple, the source is a tokenized English sentence, the target is its Spanish translation produced by an unknown MT system, and the human post-edition is a manually revised version of the target.

Sources, targets and human post-editions are provided in separate files.
Download training and development data.

Test data consist of 1,817 tokenized (source, target) pairs with the same characteristics as the source and target sentences provided for training.
Download the test data.

Any use of additional data for training your system is allowed (e.g. parallel corpora, post-edited corpora).
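The following sketch shows one way the line-aligned data files could be read into (source, target, post-edition) triples. It is only an illustration: the file names (train.src, train.mt, train.pe) are assumptions, not the actual names in the released archive.

    # Minimal sketch: load the line-aligned training files into
    # (source, target, human post-edition) triples.
    # NOTE: the file names used below are assumptions, not the official ones.
    def load_triples(src_path, tgt_path, pe_path):
        with open(src_path, encoding="utf-8") as src, \
             open(tgt_path, encoding="utf-8") as tgt, \
             open(pe_path, encoding="utf-8") as pe:
            return [(s.strip(), t.strip(), p.strip())
                    for s, t, p in zip(src, tgt, pe)]

    train = load_triples("train.src", "train.mt", "train.pe")
    assert len(train) == 11272  # training-set size stated above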

Evaluation

Systems' performance will be evaluated with respect to their ability to reduce the distance between an automatic translation and its human-revised version. This distance will be measured in terms of human-targeted TER (HTER).

While HTER is normally calculated as the minimum edit distance (in [0,1]) between a machine translation and its manually post-edited version, in the APE task it will be used to measure the edit distance between automatic and manual post-editions.

The submitted runs will be ranked based on the average HTER calculated on the test set by using the tercom software.
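Official scores will be produced with tercom; the sketch below only illustrates the ranking logic. It approximates HTER with a plain word-level edit distance (no shift operations, unlike true TER), so its numbers will not match the official ones. The case_sensitive flag mirrors the two evaluation modes described below.

    # Approximate, illustrative HTER: word-level Levenshtein distance between an
    # automatic post-edition and the human post-edition, normalised by the
    # length of the human post-edition, then averaged over the test set.
    def hter(hypothesis, reference, case_sensitive=True):
        if not case_sensitive:
            hypothesis, reference = hypothesis.lower(), reference.lower()
        hyp, ref = hypothesis.split(), reference.split()
        prev = list(range(len(ref) + 1))
        for i, h in enumerate(hyp, 1):
            curr = [i]
            for j, r in enumerate(ref, 1):
                curr.append(min(prev[j] + 1,              # deletion
                                curr[j - 1] + 1,          # insertion
                                prev[j - 1] + (h != r)))  # substitution / match
            prev = curr
        return prev[-1] / max(len(ref), 1)

    def average_hter(hypotheses, references, case_sensitive=True):
        scores = [hter(h, r, case_sensitive) for h, r in zip(hypotheses, references)]
        return sum(scores) / len(scores)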

IMPORTANT NEWS:
Each run will be evaluated in two modes, namely: i) case insensitive and ii) case sensitive.

If the participants specify a preference at submission stage (see Submission Requirements), final results for a given run will be released for only one of the two modes.
If no preference is specified, final results will be released for both modes, that is, with two separate scores.

In both cases, lower average HTER will correspond to a higher rank.

The evaluation script available for download allows participants to compute HTER scores in both modes.
Download the evaluation script.

The HTER calculated between the raw MT output and human post-editions in the test set will be used as baseline (i.e. the baseline is a system that leaves all the test instances unmodified).

Submission Format

Your system's output should contain the automatic post-editions of the target sentences in the test set, in the following format:

<METHOD NAME>   <SEGMENT NUMBER>   <APE SEGMENT>

Each field should be delimited by a single tab character: METHOD NAME identifies your system, SEGMENT NUMBER is the position of the segment in the test set, and APE SEGMENT is the automatically post-edited target sentence.
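As an illustration, the sketch below writes a submission file in this tab-delimited format. Here the "post-editions" are simply the unmodified MT outputs, i.e. the do-nothing baseline described in the Evaluation section; the test file name and the segment numbering (starting from 1) are assumptions.

    # Sketch: write a submission file in the required tab-delimited format.
    # The MT outputs are used unchanged here (the do-nothing baseline);
    # replace them with your system's automatic post-editions.
    def write_submission(path, method_name, ape_segments):
        with open(path, "w", encoding="utf-8") as out:
            for i, segment in enumerate(ape_segments, 1):  # numbering scheme is an assumption
                out.write(f"{method_name}\t{i}\t{segment}\n")

    with open("test.mt", encoding="utf-8") as f:  # assumed file name
        mt_outputs = [line.strip() for line in f]
    write_submission("UniXY_pt_1_pruned_PRIMARY_BOTH", "pt_1_pruned", mt_outputs)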

Submission Requirements

Each participating team can submit at most 3 systems and must explicitly indicate which of them represents their primary submission. If no run is marked as primary, the latest submission received will be used as the primary one.

Submissions should be sent via email to wmt-ape-submission@fbk.eu. Please use the following pattern to name your files:

INSTITUTION-NAME_METHOD-NAME_SUBTYPE_EVALTYPE, where:

INSTITUTION-NAME is an acronym/short name for your institution, e.g. "UniXY"

METHOD-NAME is an identifier for your method, e.g. "pt_1_pruned"

SUBTYPE indicates whether the submission is primary or contrastive with the two alternative values: PRIMARY, CONTRASTIVE.

EVALTYPE indicates whether the submission should be evaluated in only one of the two modes or in both, with the three alternative values: INSENSITIVE, SENSITIVE, BOTH.

For instance, the name "UniXY_pt_1_pruned_PRIMARY_BOTH" could be used to indicate the primary submission from team UniXY, based on method "pt_1_pruned", to be evaluated in both case-insensitive and case-sensitive modes.

You are also invited to submit a short paper (4 to 6 pages) to WMT describing your APE method(s). Submitting a paper is not required; if you choose not to, we ask you to provide an appropriate reference describing your method(s) that we can cite in the WMT overview paper.

Important dates

Release of training data: January 31, 2015
Test set distributed: April 27, 2015
Submission deadline: May 15, 2015
Paper submission deadline: June 28, 2015
Notification of acceptance: July 21, 2015
Camera-ready deadline: August 11, 2015

Organisers

Rajen Chatterjee (Fondazione Bruno Kessler)
Matteo Negri (Fondazione Bruno Kessler)
Marco Turchi (Fondazione Bruno Kessler)

Acknowledgements

All the APE task data are kindly provided by Unbabel.

Contact

Please send your questions, comments, etc. to wmt-ape@fbk.eu.
To stay up to date on this year's edition of the APE pilot task, you can also join the wmt-ape group.

Supported by the European Commission under the QT21
project (grant number 645452)