Results
A detailed analysis and summary of the official entries to the shared task can be found in the proceedings. Please cite the following paper for any use of this data:
@InProceedings{koehn-monz:2006:WMT,
  author    = {Koehn, Philipp and Monz, Christof},
  title     = {Manual and Automatic Evaluation of Machine Translation between European Languages},
  booktitle = {Proceedings on the Workshop on Statistical Machine Translation},
  month     = {June},
  year      = {2006},
  address   = {New York City},
  publisher = {Association for Computational Linguistics},
  pages     = {102--121},
}
This page contains automatic scores for the extended submissions and a link to the judgment data produced by the human annotators.
Extended submissions
ID | Participant |
cmu | Carnegie Mellon University, USA (report) |
lcc | Language Computer Corporation, USA (report) |
ms | Microsoft, USA (report) |
nrc | National Research Council, Canada (report) |
ntt | Nippon Telegraph and Telephone, Japan (report) |
rali | RALI, University of Montreal, Canada (report) |
systran | Systran, France |
uedin-birch | University of Edinburgh, UK --- Alexandra Birch (report) |
uedin-phi | University of Edinburgh, UK --- Philipp Koehn (report) |
upc-jg | University of Catalonia, Spain --- Jesús Giménez (report) |
upc-jmc | University of Catalonia, Spain --- Josep Maria Crego (report) |
upc-mr | University of Catalonia, Spain --- Marta Ruiz Costa-jussà (report) |
upv | University of Valencia, Spain (report) |
utd | University of Texas at Dallas, USA (report) |
In the tables below, unofficial submissions are listed without ranks and manual judgments. All submissions are available for download.
French-English
Submission | In-domain Adequacy (rank) | In-domain Fluency (rank) | In-domain BLEU (rank) | Out-of-domain Adequacy (rank) | Out-of-domain Fluency (rank) | Out-of-domain BLEU (rank) |
upc-jmc | +0.19±0.08 (1-7) | +0.09±0.08 (1-8) | 30.42±0.86 (1-6) | +0.23±0.09 (1-5) | +0.13±0.11 (1-8) | 21.79±0.92 (1-4) |
lcc | +0.14±0.07 (1-6) | +0.13±0.06 (1-7) | 30.81±0.85 (1-4) | +0.13±0.12 (1-9) | +0.11±0.11 (1-9) | 21.77±0.88 (1-5) |
utd | +0.13±0.08 (1-7) | +0.14±0.07 (1-6) | 30.53±0.87 (2-7) | +0.04±0.10 (1-9) | +0.01±0.10 (1-8) | 21.39±0.94 (3-7) |
upc-mr | +0.13±0.08 (1-8) | +0.13±0.07 (1-6) | 30.33±0.88 (1-7) | +0.12±0.12 (2-8) | +0.11±0.10 (1-7) | 21.95±0.94 (1-3) |
nrc | +0.12±0.10 (1-7) | +0.06±0.11 (2-6) | 29.62±0.84 (8) | -0.03±0.14 (3-8) | +0.00±0.11 (3-9) | 21.15±0.86 (3-7) |
nrc2 | - | - | 29.91±0.85 | - | - | 20.25±0.83 |
nrc3 | - | - | 30.21±0.88 | - | - | 20.76±0.82 |
ntt | +0.11±0.08 (1-8) | +0.14±0.08 (2-8) | 30.72±0.87 (1-7) | -0.02±0.12 (3-9) | +0.08±0.11 (1-9) | 21.34±0.85 (3-7) |
ntt2 | - | - | 30.03±0.85 | - | - | 20.55±0.86 |
cmu | +0.10±0.08 (3-7) | +0.05±0.07 (4-8) | 30.18±0.80 (2-7) | +0.22±0.11 (1-8) | +0.13±0.09 (1-9) | 21.15±0.86 (4-7) |
rali | -0.02±0.08 (5-8) | +0.00±0.08 (3-9) | 30.39±0.91 (3-7) | -0.09±0.12 (4-9) | -0.10±0.11 (5-9) | 20.17±0.85 (8) |
systran | -0.08±0.09 (9) | -0.17±0.09 (8-9) | 21.44±0.65 (10) | +0.19±0.15 (1-8) | +0.15±0.14 (1-7) | 19.42±0.82 (9) |
upv | -0.76±0.09 (10) | -0.52±0.09 (10) | 24.10±0.89 (9) | -0.76±0.16 (10) | -0.58±0.14 (10) | 15.55±0.79 (10) |
uedin-phi | - | - | 31.94±0.86 | - | - | 22.50±0.92 |
Spanish-English
Submission | In-domain Adequacy (rank) | In-domain Fluency (rank) | In-domain BLEU (rank) | Out-of-domain Adequacy (rank) | Out-of-domain Fluency (rank) | Out-of-domain BLEU (rank) |
upc-jmc | +0.15±0.08 (1-7) | +0.18±0.08 (1-6) | 31.01±0.97 (1-5) | +0.28±0.10 (1-2) | +0.17±0.10 (1-6) | 27.92±0.94 (1-3) |
ntt | +0.10±0.08 (1-7) | +0.10±0.08 (1-8) | 31.29±0.88 (1-5) | +0.11±0.10 (2-7) | +0.17±0.10 (2-6) | 26.85±0.89 (3-4) |
ntt2 | - | - | 30.33±0.83 | - | - | 25.07±0.99 |
lcc | +0.08±0.07 (1-8) | +0.04±0.06 (2-8) | 31.46±0.87 (1-4) | +0.04±0.10 (4-9) | +0.07±0.11 (3-7) | 27.18±0.92 (1-4) |
utd | +0.08±0.06 (1-8) | +0.08±0.07 (2-7) | 31.10±0.89 (1-5) | +0.03±0.11 (2-9) | +0.03±0.10 (2-8) | 27.41±0.96 (1-3) |
nrc | +0.06±0.10 (2-8) | +0.08±0.07 (1-9) | 30.04±0.79 (6) | +0.18±0.16 (2-8) | +0.09±0.09 (1-8) | 25.40±0.94 (5-7) |
nrc3 | - | - | 30.39±0.84 | - | - | 25.89±0.85 |
upc-mr | +0.06±0.07 (1-8) | +0.08±0.07 (1-6) | 29.43±0.83 (7) | +0.08±0.11 (2-8) | +0.10±0.10 (1-7) | 25.62±0.87 (5-8) |
upc-mr2 | - | - | 30.62±0.86 | - | - | 28.25±0.93 |
uedin-birch | +0.03±0.11 (1-8) | -0.07±0.15 (2-10) | 29.01±0.81 (8) | +0.25±0.16 (1-7) | +0.18±0.19 (1-6) | 25.20±0.91 (5-8) |
rali | +0.00±0.07 (3-9) | -0.02±0.07 (3-9) | 30.80±0.87 (2-5) | -0.09±0.11 (4-9) | -0.15±0.11 (6-9) | 25.03±0.91 (6-8) |
upc-jg | -0.10±0.07 (7-9) | -0.11±0.07 (6-9) | 28.03±0.83 (9) | -0.09±0.11 (4-9) | -0.09±0.09 (7-9) | 23.42±0.87 (9) |
upv | -0.45±0.10 (10) | -0.41±0.10 (9-10) | 23.91±0.83 (10) | -0.63±0.14 (10) | -0.47±0.11 (10) | 19.17±0.78 (10) |
uedin-phi | - | - | 32.37±0.88 | - | - | 28.35±0.93 |
German-English
Submission | In-domain Adequacy (rank) | In-domain Fluency (rank) | In-domain BLEU (rank) | Out-of-domain Adequacy (rank) | Out-of-domain Fluency (rank) | Out-of-domain BLEU (rank) |
uedin-phi | +0.30±0.09 (1-2) | +0.33±0.08 (1) | 27.30±0.86 (1) | +0.22±0.09 (1-6) | +0.21±0.10 (1-7) | 18.87±0.84 (1) |
lcc | +0.15±0.07 (2-7) | +0.12±0.07 (2-7) | 25.97±0.81 (2) | +0.18±0.10 (1-6) | +0.20±0.10 (1-7) | 17.96±0.79 (2-3) |
nrc | +0.12±0.07 (2-7) | +0.14±0.07 (2-6) | 24.54±0.80 (5-7) | +0.04±0.10 (3-8) | +0.04±0.09 (2-8) | 15.93±0.76 (7-8) |
nrc3 | - | - | 24.41±0.77 | - | - | 16.28±0.74 |
utd | +0.08±0.07 (3-7) | +0.01±0.08 (2-8) | 25.44±0.85 (3-4) | +0.08±0.09 (2-7) | +0.07±0.08 (2-6) | 16.97±0.76 (4-6) |
ntt | +0.07±0.08 (2-9) | +0.06±0.09 (2-8) | 25.64±0.83 (3-4) | +0.07±0.12 (1-9) | +0.21±0.13 (1-7) | 17.37±0.76 (3-5) |
ntt2 | - | - | 25.01±0.79 | - | - | 17.25±0.76 |
upc-mr | +0.00±0.09 (3-9) | -0.21±0.09 (6-9) | 23.68±0.79 (8) | +0.02±0.10 (4-8) | -0.11±0.09 (6-8) | 16.89±0.79 (4-6) |
rali | -0.01±0.06 (4-9) | +0.00±0.07 (3-9) | 24.60±0.80 (5-7) | -0.14±0.08 (8-9) | -0.14±0.08 (8-9) | 15.22±0.69 (8-9) |
upc-jmc | -0.02±0.09 (2-9) | -0.04±0.09 (3-9) | 24.43±0.86 (5-7) | -0.01±0.10 (4-8) | -0.04±0.11 (3-9) | 17.57±0.80 (2-5) |
systran | -0.05±0.10 (3-9) | -0.05±0.09 (3-9) | 15.86±0.59 (10) | +0.30±0.12 (1-4) | +0.21±0.12 (1-4) | 15.56±0.71 (7-9) |
upv | -0.55±0.09 (10) | -0.38±0.08 (10) | 18.08±0.77 (9) | -0.64±0.11 (10) | -0.54±0.09 (10) | 11.78±0.71 (10) |
English-French
Submission | In-domain Adequacy (rank) | In-domain Fluency (rank) | In-domain BLEU (rank) | Out-of-domain Adequacy (rank) | Out-of-domain Fluency (rank) | Out-of-domain BLEU (rank) |
nrc | +0.08±0.09 (1-5) | +0.09±0.09 (1-5) | 31.75±0.83 (1-6) | -0.13±0.13 (4-7) | -0.16±0.10 (4-7) | 23.66±0.91 (2-5) |
nrc2 | - | - | 30.81±0.82 | - | - | 23.27±0.82 |
nrc3 | - | - | 31.03±0.85 | - | - | 23.34±0.86 |
upc-mr | +0.08±0.08 (1-4) | +0.04±0.07 (1-5) | 31.50±0.76 (1-6) | +0.09±0.11 (2-4) | +0.04±0.09 (2-4) | 23.21±0.75 (2-6) |
upc-jmc | +0.03±0.09 (1-6) | +0.02±0.08 (1-6) | 31.75±0.78 (1-5) | +0.09±0.11 (2-5) | +0.09±0.11 (2-4) | 23.30±0.75 (2-6) |
systran | -0.01±0.12 (2-7) | +0.06±0.12 (1-6) | 25.07±0.71 (7) | +0.50±0.20 (1) | +0.41±0.18 (1) | 25.31±0.88 (1) |
utd | -0.03±0.07 (3-7) | -0.05±0.07 (3-7) | 31.42±0.85 (3-6) | -0.02±0.11 (2-6) | -0.05±0.09 (2-6) | 22.79±0.86 (7) |
rali | -0.08±0.09 (1-7) | -0.09±0.09 (2-7) | 31.79±0.85 (1-6) | -0.12±0.12 (4-7) | -0.17±0.12 (5-7) | 23.34±0.89 (2-6) |
ntt | -0.09±0.09 (4-7) | -0.06±0.08 (4-7) | 31.92±0.84 (1-5) | -0.23±0.12 (4-7) | -0.06±0.10 (4-7) | 22.99±0.96 (3-6) |
ntt2 | - | - | 30.79±0.78 | - | - | 21.44±0.90 |
uedin-phi | - | - | 33.66±0.81 | - | - | 25.26±0.91 |
English-Spanish
Submission | In-domain Adequacy (rank) | In-domain Fluency (rank) | In-domain BLEU (rank) | Out-of-domain Adequacy (rank) | Out-of-domain Fluency (rank) | Out-of-domain BLEU (rank) |
ms | +0.23±0.09 (1-5) | +0.13±0.09 (1-7) | 29.76±0.82 (7-8) | +0.33±0.16 (1-7) | +0.15±0.13 (1-8) | 26.15±0.88 (6-7) |
upc-mr | +0.20±0.09 (1-4) | +0.17±0.09 (1-5) | 31.06±0.86 (1-4) | +0.35±0.11 (1-3) | +0.19±0.10 (1-6) | 26.62±0.92 (1-2) |
utd | +0.18±0.08 (1-5) | +0.15±0.08 (1-6) | 30.73±0.90 (1-4) | +0.21±0.13 (2-6) | +0.13±0.11 (1-7) | 25.26±0.78 (3-5) |
nrc | +0.12±0.09 (2-7) | +0.17±0.08 (1-6) | 29.97±0.86 (5-6) | +0.18±0.12 (1-6) | +0.07±0.11 (2-7) | 25.58±0.85 (3-5) |
nrc3 | - | - | 30.13±0.87 | - | - | 26.00±0.83 |
ntt | +0.10±0.09 (3-7) | +0.14±0.08 (1-6) | 30.93±0.85 (1-4) | +0.12±0.13 (2-7) | +0.12±0.13 (1-7) | 26.52±0.90 (1-2) |
ntt2 | - | - | 28.37±0.75 | - | - | 22.59±0.84 |
upc-jmc | +0.04±0.10 (2-7) | +0.01±0.08 (2-7) | 30.44±0.86 (1-4) | +0.17±0.15 (2-7) | +0.24±0.12 (1-6) | 25.59±0.95 (3-5) |
rali | -0.05±0.08 (5-8) | -0.03±0.08 (6-8) | 29.38±0.85 (5-6) | -0.17±0.16 (6-8) | -0.05±0.13 (4-8) | 24.03±0.83 (6-8) |
uedin-birch | -0.18±0.14 (6-9) | -0.17±0.13 (6-10) | 28.49±0.87 (7-8) | -0.36±0.24 (6-10) | -0.16±0.16 (5-9) | 23.18±0.88 (7-8) |
upc-jg | -0.32±0.11 (9) | -0.37±0.09 (8-10) | 27.46±0.78 (9) | -0.45±0.13 (8-9) | -0.42±0.10 (9-10) | 22.04±0.84 (9) |
upv | -0.83±0.15 (9-10) | -0.59±0.15 (8-10) | 23.17±0.73 (10) | -1.09±0.21 (9) | -0.64±0.19 (8-9) | 16.83±0.72 (10) |
uedin-phi | - | - | 31.85±0.85 | - | - | 27.76±0.88 |
English-German
Submission | In-domain Adequacy (rank) | In-domain Fluency (rank) | In-domain BLEU (rank) | Out-of-domain Adequacy (rank) | Out-of-domain Fluency (rank) | Out-of-domain BLEU (rank) |
upc-mr | +0.28±0.08 (1-3) | +0.14±0.08 (1-5) | 17.24±0.81 (3-5) | +0.31±0.13 (2-3) | +0.21±0.11 (1-3) | 10.96±0.70 (1-5) |
ntt | +0.19±0.08 (1-5) | +0.09±0.06 (2-6) | 18.15±0.89 (1-3) | -0.03±0.12 (4-6) | +0.08±0.11 (3-5) | 10.51±0.64 (1-6) |
ntt2 | - | - | 18.13±0.81 | - | - | 11.01±0.64 |
upc-jmc | +0.17±0.08 (1-5) | +0.13±0.08 (1-4) | 17.73±0.81 (1-3) | +0.22±0.14 (2-3) | +0.01±0.10 (3-6) | 10.64±0.66 (1-6) |
nrc | +0.17±0.08 (2-4) | +0.11±0.08 (1-5) | 17.52±0.78 (4-5) | +0.00±0.11 (4-6) | +0.05±0.09 (2-6) | 10.64±0.65 (2-6) |
nrc3 | - | - | 17.44±0.83 | - | - | 10.82±0.57 |
rali | +0.08±0.10 (3-6) | +0.03±0.09 (2-6) | 17.93±0.85 (1-4) | +0.13±0.12 (4-6) | -0.06±0.10 (4-6) | 10.57±0.65 (1-6) |
systran | -0.08±0.11 (5-6) | +0.00±0.10 (3-6) | 9.84±0.52 (7) | +0.47±0.15 (1) | +0.39±0.15 (1-2) | 10.78±0.69 (1-6) |
upv | -0.84±0.12 (7) | -0.51±0.10 (7) | 13.37±0.78 (6) | -0.94±0.13 (7) | -0.57±0.10 (7) | 6.55±0.53 (7) |
uedin-phi | - | - | 18.85±0.83 | - | - | 11.82±0.65 |
Judgment data
The judgments were collected with an online tool that presented, for each randomly selected sentence, five translations from the submitted systems in random order. You can download the judgment data.
Each line of the file holds one judgment, with triple-bar-separated (|||) fields in the following order (a short parsing sketch follows the sample lines below):
- Task
- Judge
- Sentence number
- System
- Judgment type (ADEQUACY or FLUENCY)
- Score (1-5, 5 best)
- Time to score (in seconds)
Here are the first 10 lines:
WMT06 German-English ||| 1 ||| 614 ||| utd ||| ADEQUACY ||| 1 ||| 62
WMT06 German-English ||| 1 ||| 614 ||| utd ||| FLUENCY ||| 1 ||| 62
WMT06 German-English ||| 1 ||| 614 ||| nrc ||| ADEQUACY ||| 2 ||| 62
WMT06 German-English ||| 1 ||| 614 ||| nrc ||| FLUENCY ||| 4 ||| 62
WMT06 German-English ||| 1 ||| 614 ||| rali ||| ADEQUACY ||| 2 ||| 62
WMT06 German-English ||| 1 ||| 614 ||| rali ||| FLUENCY ||| 4 ||| 62
WMT06 German-English ||| 1 ||| 614 ||| upv ||| ADEQUACY ||| 2 ||| 62
WMT06 German-English ||| 1 ||| 614 ||| upv ||| FLUENCY ||| 5 ||| 62
WMT06 German-English ||| 1 ||| 614 ||| lcc ||| ADEQUACY ||| 2 ||| 62
WMT06 German-English ||| 1 ||| 614 ||| lcc ||| FLUENCY ||| 5 ||| 62
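For convenience, here is a minimal Python sketch of how the judgment file could be read, assuming it has been saved locally as judgments.txt; the file name, the parse_judgments helper, and the average_adequacy example are illustrative and not part of the official release.

# Minimal sketch for reading the triple-bar-separated judgment data described above.
# Assumes the file has been saved locally as "judgments.txt" (name is illustrative).

from collections import defaultdict

def parse_judgments(path):
    """Yield one dict per judgment line of the triple-bar-separated file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            task, judge, sentence, system, kind, score, seconds = (
                field.strip() for field in line.split("|||")
            )
            yield {
                "task": task,               # e.g. "WMT06 German-English"
                "judge": int(judge),        # annotator id
                "sentence": int(sentence),  # sentence number
                "system": system,           # submission id, e.g. "utd"
                "type": kind,               # "ADEQUACY" or "FLUENCY"
                "score": int(score),        # 1-5, 5 best
                "seconds": int(seconds),    # time taken to score
            }

def average_adequacy(path, task="WMT06 German-English"):
    """Example use: raw average adequacy score per system for one task."""
    totals = defaultdict(lambda: [0, 0])
    for j in parse_judgments(path):
        if j["task"] == task and j["type"] == "ADEQUACY":
            totals[j["system"]][0] += j["score"]
            totals[j["system"]][1] += 1
    return {system: s / n for system, (s, n) in totals.items()}

The averaging example only shows how the fields combine; it does not reproduce the adequacy and fluency scores reported in the tables above, which were computed as described in the proceedings paper.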