EMNLP 2015 Tenth Workshop on Statistical Machine Translation

Shared Task: Tuning of MT Systems

17-18 September 2015
Lisbon, Portugal

Tuning Task Important Dates

Release of the MT system to tune	February 9, 2015
Registration for complimentary manual evaluation	February 22, 2015
Submission deadline for tuning task	April 20, 2015
Start of manual evaluation period	May 4, 2015
End of manual evaluation	June 1, 2015
Paper submission deadline	June TBD, 2015

Tuning Task Overview

The WMT15 tuning task is similar to the WMT11 tunable metrics task. We provide participants with a complete hierarchical model for English-to-Czech and Czech-to-English translation (i.e. one moses.ini file with all the model files for each translation direction) and a devset. A designated Moses GitHub revision will be used to run this model.

Participants are expected to incorporate their evaluation metric into the Moses scorer, apply whichever Moses optimizer they like, or use any other tuning tricks to come up with their weight settings.

A submission to the tuning task consists of an updated version of the moses.ini file and an optional weights file for sparse features. You may also include your outputs on the official test set as an optional sanity check.

We will run the designated Moses revision with your moses.ini file to obtain your MT outputs. (The evaluation metric or tricks you used in tuning are not needed and will not be used for this run.) The outputs will be manually ranked using the same scheme as the main translation task.

Other Requirements

For each run submitted to this evaluation, the team promises to join the WMT manual evaluation and annotate at least 100 HITs (i.e. 300 5-way comparisons). This contribution to the manual evaluation can be made in whichever language pair you can evaluate and where it is needed most.

No registration is needed to participate in the tuning task unless you would like to make use of our manual judgements of Czech; see Complimentary Manual Evaluation below.

You are invited to submit a short paper (4 to 6 pages) describing your tuning technique. Submitting a paper is not required. If you do not submit one, we ask that you provide an appropriate description (a few paragraphs) or an appropriate reference describing your method, which we can include or cite in the overview paper.

The System to Tune

This section contains the complete package of models to download.

English->Czech	Czech->English
moses.ini preview	moses.ini preview
en2cs_model.tgz (1.2GB)	cs2en_model.tgz (1.2GB)
Devset (newstest2014 from translation task)
Original Corpora, Alignments (optional)
en2cs_original_data.tgz	cs2en_original_data.tgz

The models are prepared for lowercase input tokenized with the standard Moses tokenizer (moses/scripts/tokenizer/tokenizer.perl).

For completeness and training of some of the standard sparse features, we also provide the full corpora and alignments.

When evaluating your submission, we will use Moses Release 3.0, i.e. the GitHub commit 5244a7b607. It is also available as pre-compiled binaries on the Moses Releases page.

Note that we plan to ignore any subsequent commits to the RELEASE-3.0 branch (unless prohibitive bugs are spotted). So to obtain the right sources, use:

git clone https://github.com/moses-smt/mosesdecoder.git moses
cd moses
git checkout 5244a7b607 -b tuning-task-2015
## and *NOT*: git checkout RELEASE-3.0, which could be a newer version

Prior to manual evaluation, we will run only the standard Moses detokenizer (moses/scripts/tokenizer/detokenizer.perl), upcasing sentence beginnings. As a result, names will not be capitalized, but there will be fewer random effects than a recaser would introduce. Talk to us if you think this is a bad decision.

Tuning Task Tracks

There are two tracks of the tuning task:

Constrained: You may use only the official WMT15 dev set, i.e. the WMT14 test set, to tune the system.
Unconstrained: You may include any other data for the tuning, for instance older WMT test sets, additional reference translations, etc.

When submitting your moses.ini, please indicate whether your submission is constrained or unconstrained.

You are allowed to modify the moses.ini in any way. You may delete or add features (but you cannot supply additional model files). You may also change the search algorithm or increase any limits, provided that we can reasonably be expected to run the translation with these settings on our machines.

Based on the changes you make in the moses.ini, we will mark your submission with these flags (within both tracks):

Basic: No sparse features added, no custom settings or limits.
Sparse: Some sparse features added, no custom settings or limits.
Customized Basic: Other changes to the configuration made but no sparse features added.
Customized Sparse: Other changes to the configuration made, including some sparse features.
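
The four flags above are simply the combinations of two yes/no properties of a submission. A minimal Python sketch of that mapping follows; the function name and its two parameters are hypothetical and only illustrate the scheme, they are not part of any tool we use.

```python
def submission_flag(has_sparse_features: bool, has_custom_settings: bool) -> str:
    """Return the flag name for a submission, following the four cases listed above.

    Both arguments are illustrative: `has_sparse_features` means some sparse
    feature was added to moses.ini, `has_custom_settings` means any other
    change to the configuration (settings, limits, search algorithm).
    """
    if has_custom_settings:
        return "Customized Sparse" if has_sparse_features else "Customized Basic"
    return "Sparse" if has_sparse_features else "Basic"
```

For instance, a moses.ini that only adds sparse features and changes nothing else maps to "Sparse".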

How to add Sparse Features

Please follow the Moses documentation for instructions on adding sparse features to your moses.ini. If you add sparse features, you will probably have to use kbmira or PRO to tune their weights.

For example, you can add sparse features for source word deletion by adding the following line to the [feature] section of your moses.ini:

[feature]
SourceWordDeletionFeature factor=0
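
If you prefer to script this edit, the following Python sketch appends a line to the [feature] section of a moses.ini given as text. It assumes the file consists of "[section]" header lines followed by their entries; the function name add_feature_line is hypothetical, and the sketch only edits text without checking that Moses accepts the result.

```python
def add_feature_line(ini_text: str, feature_line: str) -> str:
    """Return ini_text with feature_line appended to the [feature] section.

    Assumption: sections start with a line of the form "[name]" and run
    until the next such line or the end of the file.
    """
    lines = ini_text.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.strip() == "[feature]"), None
    )
    if start is None:
        raise ValueError("no [feature] section found")

    # Find where the [feature] section ends.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        stripped = lines[i].strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            end = i
            break

    # Nothing to do if the line is already in the section.
    if feature_line in lines[start + 1:end]:
        return ini_text

    # Insert before any blank lines that separate this section from the next.
    insert_at = end
    while insert_at > start + 1 and lines[insert_at - 1].strip() == "":
        insert_at -= 1
    lines.insert(insert_at, feature_line)
    return "\n".join(lines) + "\n"
```

A call such as add_feature_line(text, "SourceWordDeletionFeature factor=0") leaves all other sections untouched and returns the text unchanged if the line is already present.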

When you use sparse features, the weights are not stored in moses.ini but in an additional weights file. Make sure to include this weights file with your submission of moses.ini.

Complimentary Manual Evaluation of Translations into Czech

To allow broader participation in the English-to-Czech direction, each registered participant of the tuning task will be given a 'credit' of manual pairwise sentence comparisons by our Czech native speakers. The exact number of judgments we can provide will be determined from the number of registered participants, but we expect no fewer than a few hundred sentence pair comparisons. Manual judging takes time and demand may peak as the submission deadline approaches, so remember to get in touch early.

To register for this English-to-Czech complimentary manual pre-evaluation, please send an e-mail to Ondřej Bojar.

Submitting Sentence Pairs for Czech Manual Evaluation

To make use of some of your 'credit', simply send the following plain text files to Ondřej Bojar:

The source English sentences.
The reference translations (if available).
System A outputs.
System B outputs.

Each sentence should be on a separate line, so all three or four files must have exactly the same number of lines.
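
Before sending the files, you may want to verify the line counts yourself. The following is a minimal Python sketch of such a check, assuming UTF-8 plain text files; the function names and the file names in the usage note are hypothetical.

```python
from typing import Sequence


def count_lines(path: str) -> int:
    """Count the lines of a plain text file (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_same_length(paths: Sequence[str]) -> int:
    """Return the shared line count, or raise ValueError if the files differ."""
    counts = {path: count_lines(path) for path in paths}
    if len(set(counts.values())) != 1:
        raise ValueError(f"line counts differ: {counts}")
    return next(iter(counts.values()))
```

For example, check_same_length(["source.txt", "reference.txt", "system_a.txt", "system_b.txt"]) returns the number of sentence pairs when all four files line up, and raises an error listing the per-file counts otherwise.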

Our annotators will see the source, optionally the reference, and the two outputs. The outputs will be shuffled so that the system cannot be determined from the order of the hypotheses. The order of the sentences will not be shuffled, so shuffle them yourself if you want a random order.
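
If you do shuffle the sentences, the same permutation has to be applied to every file, otherwise the lines no longer correspond. A minimal Python sketch, assuming each file has already been read into a list of lines; the function name shuffle_parallel and the seed parameter are hypothetical.

```python
import random
from typing import List, Sequence


def shuffle_parallel(files: Sequence[Sequence[str]], seed: int = 0) -> List[List[str]]:
    """Apply one random permutation to all files so that line i still matches.

    `files` holds the lines of the source, optional reference, and the two
    system outputs. A fixed seed makes the order reproducible.
    """
    if not files:
        return []
    n = len(files[0])
    if any(len(lines) != n for lines in files):
        raise ValueError("all files must have the same number of lines")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [[lines[i] for i in order] for lines in files]
```

Keeping the seed lets you map the judged sentences back to their original positions later.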

For each sentence, the annotator will mark one of the following:

Exactly one candidate translation as being the better one.
Both candidates as being equally good, acceptable translations.
Both candidates as being equally bad, unacceptable translations.

Tuning Task Organizers

Ondřej Bojar (Charles University in Prague)
Miloš Stanojević (University of Amsterdam, ILLC)
Amir Kamran (University of Amsterdam, ILLC)

Supported by the European Commission
under the
project (grant number 288487)
