EMNLP 2021
SIXTH CONFERENCE ON MACHINE TRANSLATION (WMT21)

November 10-11, 2021
Punta Cana (Dominican Republic) and Online

Shared Task: Multilingual Low-Resource Translation for Indo-European Languages



HUMAN EVALUATION

Metric. We perform sentence-level evaluation with document context. Each sentence is rated on a Likert-like scale from 1 to 5 in response to the direct assessment (DA) question. For the Romance family, source-based DA also allows the evaluation of selected terms: we select 60 terms (mostly named entities, dates and locations) and label each one as well translated, mistranslated or not translated by majority vote among the annotators.
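Majority voting over per-annotator term labels can be sketched as follows. This is a minimal illustration, not the organisers' code: the label strings `well`, `mis` and `no` mirror the term table, and the strict-majority rule (a label needs more than half of the votes, otherwise the term stays unresolved) is an assumption, since the tie-breaking policy is not stated. `majority_label` is a hypothetical name.

```python
from collections import Counter
from typing import Optional, Sequence

# Hypothetical label strings: well translated, mistranslated, not translated.
LABELS = ("well", "mis", "no")


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label given by a strict majority of annotators, else None.

    Assumption: a term without a strict majority (e.g. three annotators
    choosing three different labels) is left unresolved.
    """
    unknown = [v for v in votes if v not in LABELS]
    if unknown:
        raise ValueError(f"unknown labels: {unknown}")
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if 2 * count > len(votes) else None


print(majority_label(["well", "well", "mis"]))  # well
print(majority_label(["well", "mis", "no"]))    # None
```

With three annotators a strict majority and a plurality coincide except when all three disagree; with more annotators the two rules can differ, which is why the rule is spelled out here.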

ROMANCE FAMILY (Wikipedia)

Language pairs: ca2it = Catalan to Italian, ca2oc = Catalan to Occitan. Raw scores are on the 1-5 scale.

                              ca2it               ca2oc
System                        z-score   raw       z-score   raw
HUMAN                          0.8±0.4  4.8±0.6    0.8±0.7  4.0±1.0
CUNI-Primary                   0.5±0.7  4.4±0.9    0.5±0.8  3.6±1.1
M2M-100 (baseline)             0.4±0.7  4.2±1.0   -0.7±0.8  2.0±1.0
TenTrans-Primary               0.0±0.8  3.8±1.1    0.3±0.8  3.4±1.2
BSC-Primary                   -0.1±0.8  3.7±1.1    0.3±0.9  3.4±1.2
UBCNLP-Primary                -0.5±1.0  3.1±1.3    0.0±0.9  3.0±1.2
mT5-devFinetuned (baseline)   -1.2±0.9  2.3±1.2   -1.0±0.7  1.7±0.9
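The table reports both raw scores and z-scores. A common way to obtain z-scores in DA-style evaluations is to standardize each annotator's raw ratings by that annotator's own mean and standard deviation, which evens out how strictly individual annotators use the 1-5 scale, and then to average per system. The sketch below assumes that procedure and reads `±` as the population standard deviation over a system's scores; neither detail is stated on this page, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_per_annotator(ratings):
    """Turn (annotator, system, raw_score) triples into (system, z_score) pairs.

    Each raw score is standardized with the mean and (population) standard
    deviation of all scores given by the same annotator.
    """
    by_annotator = defaultdict(list)
    for annotator, _system, score in ratings:
        by_annotator[annotator].append(score)
    stats = {a: (mean(s), pstdev(s)) for a, s in by_annotator.items()}

    pairs = []
    for annotator, system, score in ratings:
        mu, sigma = stats[annotator]
        # An annotator who always gives the same score carries no signal: map to 0.
        z = (score - mu) / sigma if sigma > 0 else 0.0
        pairs.append((system, z))
    return pairs


def summarize(pairs):
    """Mean and population standard deviation of the values for each system."""
    grouped = defaultdict(list)
    for system, value in pairs:
        grouped[system].append(value)
    return {system: (mean(v), pstdev(v)) for system, v in grouped.items()}


# Toy data (not from the evaluation): two annotators, two systems.
ratings = [("A", "X", 5), ("A", "Y", 3), ("B", "X", 4), ("B", "Y", 2)]
print(summarize(standardize_per_annotator(ratings)))
# {'X': (1.0, 0.0), 'Y': (-1.0, 0.0)}
print(summarize([(system, raw) for _, system, raw in ratings]))
# {'X': (4.5, 0.5), 'Y': (2.5, 0.5)}
```

In the toy data annotator B is one point stricter than A, so the raw scores vary within each system while the z-scores do not; that annotator effect is what standardization removes.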


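Term translation (number of selected terms per majority label; well = well translated, mis = mistranslated, no = not translated, Σ = well + mis + no):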
                              ca2it                 ca2oc
System                        well  mis  no   Σ     well  mis  no   Σ
HUMAN (reference)               53    0   3  56       40    0   2  42
CUNI-Primary                    39    3   5  47       30    7   1  38
M2M-100 (baseline)              33    2   6  41       26    9   0  35
TenTrans-Primary                37    0   9  46       32    4   1  37
BSC-Primary                     27    7   5  39       33    4   0  37
UBCNLP-Primary                  29   16   1  46       19    1   0  20
mT5-devFinetuned (baseline)     20   17  10  47       25   11   4  40
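Once each term has a majority label, a row of the term table is a count per label, and Σ is the sum of the three counts. The sketch below, with hypothetical helper names, tallies labels into a `(well, mis, no, Σ)` tuple and checks the published rows (copied from the table above) for that invariant. Σ varies across systems (on ca2oc, 20 for UBCNLP-Primary against 42 for the reference), so a share such as well / Σ should be read together with Σ itself.

```python
from collections import Counter


def term_row(labels):
    """Count majority labels for one system and language pair -> (well, mis, no, Σ).

    Assumption: terms without a majority label were dropped beforehand.
    """
    counts = Counter(labels)
    well, mis, no = counts["well"], counts["mis"], counts["no"]
    return well, mis, no, well + mis + no


# (well, mis, no, Σ) per system, copied from the term translation table.
TERM_COUNTS = {
    "ca2it": {
        "HUMAN (reference)": (53, 0, 3, 56),
        "CUNI-Primary": (39, 3, 5, 47),
        "M2M-100 (baseline)": (33, 2, 6, 41),
        "TenTrans-Primary": (37, 0, 9, 46),
        "BSC-Primary": (27, 7, 5, 39),
        "UBCNLP-Primary": (29, 16, 1, 46),
        "mT5-devFinetuned (baseline)": (20, 17, 10, 47),
    },
    "ca2oc": {
        "HUMAN (reference)": (40, 0, 2, 42),
        "CUNI-Primary": (30, 7, 1, 38),
        "M2M-100 (baseline)": (26, 9, 0, 35),
        "TenTrans-Primary": (32, 4, 1, 37),
        "BSC-Primary": (33, 4, 0, 37),
        "UBCNLP-Primary": (19, 1, 0, 20),
        "mT5-devFinetuned (baseline)": (25, 11, 4, 40),
    },
}


def check_totals(table):
    """Return the systems whose Σ differs from well + mis + no."""
    return [name for name, (w, m, n, total) in table.items() if w + m + n != total]


def well_share(row):
    """Fraction of the counted terms (Σ) that were labelled well translated."""
    well, _, _, total = row
    return well / total if total else 0.0


print(check_totals(TERM_COUNTS["ca2it"]), check_totals(TERM_COUNTS["ca2oc"]))  # [] []
print(well_share(TERM_COUNTS["ca2oc"]["UBCNLP-Primary"]))  # 0.95
```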

NORTH-GERMANIC FAMILY (Europeana)

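Language pairs: nb2sv = Norwegian Bokmål to Swedish, is2sv = Icelandic to Swedish. Raw scores are on the 1-5 scale.

                              nb2sv               is2sv
System                        z-score   raw       z-score   raw
M2M-100 (baseline)             0.7±0.6  4.2±0.8    0.1±1.0  2.0±1.1
EdinSaar-Primary               0.2±0.7  3.6±1.1   -0.1±0.8  1.9±1.0
UBCNLP-Primary                 0.2±0.8  3.5±1.2   -0.4±1.0  1.6±1.1
mT5-devFinetuned (baseline)   -1.2±0.7  1.5±1.1    0.4±1.1  2.4±1.2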
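Ordering the systems by mean z-score makes the contrast between the two North-Germanic pairs explicit: the mT5-devFinetuned baseline is last for nb2sv but first for is2sv. The sketch below only sorts the means copied from the table; it is not a significance test, and the ± spreads (0.6 to 1.1 in z-score) are large relative to the gaps between the is2sv means. `rank_by_z` is a hypothetical name.

```python
# Mean z-scores copied from the North-Germanic table.
Z_SCORES = {
    "nb2sv": {
        "M2M-100 (baseline)": 0.7,
        "EdinSaar-Primary": 0.2,
        "UBCNLP-Primary": 0.2,
        "mT5-devFinetuned (baseline)": -1.2,
    },
    "is2sv": {
        "M2M-100 (baseline)": 0.1,
        "EdinSaar-Primary": -0.1,
        "UBCNLP-Primary": -0.4,
        "mT5-devFinetuned (baseline)": 0.4,
    },
}


def rank_by_z(scores):
    """System names ordered by mean z-score, highest first.

    sorted() stays stable with reverse=True, so tied systems keep table order.
    """
    return sorted(scores, key=scores.get, reverse=True)


for pair, scores in Z_SCORES.items():
    print(pair, rank_by_z(scores))
```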