ACL 2008 THIRD WORKSHOP
ON STATISTICAL MACHINE TRANSLATION
Shared Task: Machine Translation for European Languages
June 19, in conjunction with ACL 2008 in Columbus, Ohio
The results of the shared task are summarized in the paper:
Further Meta-Evaluation of Machine Translation
Chris Callison-Burch, Cameron Fordyce, Philipp Koehn, Christof Monz and Josh Schroeder
[pdf] [ps] [bib]
Available for download here:
- All system submissions and src-ref files for all tasks/systems, in both plain text and XML (48MB)
- System submissions and src-ref files for all human-evaluated tasks/systems, in plain text only (21MB)
- Human judgments, comma delimited (199KB)
- Constituent information and tokenized submissions and src-ref files (23MB)
- System-level human and automatic metric scores for many of the submitted systems (16KB)
Format of the CSV judgment data columns:
- Task, e.g. WMT08 Czech-English News Commentary
- Type (RANK, CONSTITUENT, or CONSTITUENT_ACCEPT)
- Item ID (sentence number or constituent number; see below for information on constituent numbers)
- Annotator ID (numerical ID, since annotators were anonymized)
- Time spent on annotation (in seconds)
- System judgments (up to 5), each consisting of:
  - System name, e.g. uedin
  - Score, e.g. 2 (see below for score definitions for each test type)
  - Additional score (blank this year; was fluency for the NIST test type in WMT 07)
Score column information:
- RANK: 1 to 5, where 1 is 'Best' and 5 is 'Worst'
- CONSTITUENT: 1 to 5, where 1 is 'Best' and 5 is 'Worst'
- CONSTITUENT_ACCEPT: 1 to 3, where 1 is 'Yes', 2 is 'No', and 3 is 'Not Sure'
Item ID column information:
- RANK: Item ID corresponds to the line in the test set, starting with 0.
- CONSTITUENT and CONSTITUENT_ACCEPT: Item ID corresponds to the line in the constituent file for that task.
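The CSV layout above can be read with a short parser. This is a minimal sketch, assuming each of the up-to-5 system judgments occupies three consecutive columns (name, score, additional score); the sample row and the system names in it are illustrative, not taken from the real data files.

```python
import csv
import io

# Hypothetical sample row following the column layout described above;
# the values and system names are made up for illustration.
sample = ('WMT08 Czech-English News Commentary,RANK,12,3,45,'
          'uedin,2,,systemB,4,')

def parse_judgment(line):
    """Split one CSV judgment row into its fixed fields plus the
    variable-length list of (system, score, additional-score) triples."""
    fields = next(csv.reader(io.StringIO(line)))
    task, jtype, item_id, annotator_id, seconds = fields[:5]
    systems = []
    rest = fields[5:]
    # Assumed packing: three columns per system judgment.
    for i in range(0, len(rest), 3):
        name, score, extra = rest[i:i + 3]
        if name:  # skip empty trailing slots
            systems.append((name, int(score), extra))
    return {
        'task': task,
        'type': jtype,
        'item_id': int(item_id),
        'annotator_id': int(annotator_id),
        'seconds': int(seconds),
        'systems': systems,
    }

row = parse_judgment(sample)
```

For a RANK row like this one, each system's score is its rank from 1 (best) to 5 (worst), per the score-column definitions above.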
Constituent files are delimited with ' ||| ' and contain the following fields:
- Sentence number (identical to RANK's Item ID)
- POS information
- src: X-Y (word positions in the src .tok file, starting with 0)
- ref: X-Y (word positions in the ref .tok file)
- system: X-Y (same as for src and ref above, repeated for each system)
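A ' ||| '-delimited constituent record can be split into these fields as follows. This is a sketch under assumptions: the sample line is invented, and whether the labels ("src:", "ref:", a system name) appear literally before each span is assumed from the description above.

```python
# Hypothetical constituent-file line in the ' ||| ' layout described
# above; all values are illustrative.
line = "7 ||| NP ||| src: 3-5 ||| ref: 2-4 ||| uedin: 3-6"

def parse_constituent(line):
    """Split a ' ||| '-delimited constituent record into sentence
    number, POS information, and labeled word-position spans."""
    fields = [f.strip() for f in line.split('|||')]
    sent_no, pos = int(fields[0]), fields[1]
    spans = {}
    for field in fields[2:]:
        name, positions = field.split(':')
        start, end = positions.strip().split('-')
        # Positions are 0-based word indices into the .tok files.
        spans[name.strip()] = (int(start), int(end))
    return sent_no, pos, spans

sent_no, pos, spans = parse_constituent(line)
```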
System-Level Human and Automatic Rankings are whitespace-delimited with the following columns:
- Metric: Rank, Const, or Yes/No for the human metrics; automatic metrics are named.
- Language pair, e.g. fr-en
- Test set: either test2008 or newstest2008
- System name
- Score: larger numbers are always better (MTER has been reversed), but may be larger than 1.0.
- System type: either smt, rbmt, or syscomb.
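The six whitespace-delimited columns above are straightforward to scan; for example, to find the top-scoring system for one metric, language pair, and test set. A minimal sketch — the metric names follow the description above, but the sample lines and scores are invented for illustration.

```python
# Hypothetical lines in the six-column layout described above
# (metric, language pair, test set, system, score, system type);
# the scores and systems are illustrative.
data = """\
BLEU fr-en test2008 uedin 0.32 smt
Rank fr-en test2008 uedin 0.58 smt
BLEU fr-en test2008 systran 0.29 rbmt
"""

def best_system(lines, metric, pair, test_set):
    """Return (system, score, type) with the highest score for the
    given metric/pair/test set; larger scores are always better."""
    best = None
    for line in lines.splitlines():
        m, lp, ts, system, score, stype = line.split()
        if (m, lp, ts) == (metric, pair, test_set):
            if best is None or float(score) > best[1]:
                best = (system, float(score), stype)
    return best

top = best_system(data, 'BLEU', 'fr-en', 'test2008')
```

Because all scores are oriented so that larger is better (including the reversed MTER), a single max comparison works uniformly across metrics.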
Supported by the EuroMatrix project (P6-IST-5-034291-STP), funded by the European Commission under Framework Programme 6.