Shared Task: Triangular MT: Using English to improve Russian-to-Chinese machine translation
Task Description
Given a low-resource language pair (X/Y), the bulk of previous MT work has pursued one of two strategies.
- Direct: Collect parallel X/Y data from the web, and train an X-to-Y translator, OR
- Pivot: Collect parallel X/English and Y/English data (often much larger than X/Y data), train two translators (X-to-English + English-to-Y), and pipeline them to form an X-to-Y translator
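The pivot strategy above can be sketched as a simple pipeline of two translators. The sketch below assumes two hypothetical callables, `ru_to_en` and `en_to_zh`, standing in for any trained MT systems (e.g. two separately trained checkpoints); they are not part of the shared-task release.

```python
# A minimal sketch of the pivot strategy: translate X -> Y by chaining
# X-to-English and English-to-Y systems. `ru_to_en` and `en_to_zh` are
# hypothetical stand-ins for trained translation models.

def pivot_translate(russian_segments, ru_to_en, en_to_zh):
    """Translate Russian segments to Chinese by pivoting through English."""
    english = [ru_to_en(seg) for seg in russian_segments]
    return [en_to_zh(seg) for seg in english]
```

One known weakness of this approach is error compounding: mistakes made by the X-to-English system are passed on to, and often amplified by, the English-to-Y system.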
However, there are many other possible strategies for combining such resources. These may involve, for example, ensemble methods, multi-source training methods, multi-target training methods, or novel data augmentation methods.
The goal of this shared task is to promote:
- translation between non-English languages,
- optimally mixing direct and indirect parallel resources, and
- exploiting noisy parallel web corpora
Task: Russian-to-Chinese machine translation
We provide three parallel corpora:
- Chinese/Russian: crawled from the web and aligned at the segment level, and combined with different public resources
- Chinese/English: combining several public resources
- Russian/English: combining several public resources
We evaluate system translations on a (secret) mixed-genre test set, drawn from the web and curated for high-quality segment pairs. After receiving the test data, participants have one week to submit translations. After all submissions are received, we will post a populated leaderboard that will continue to accept post-evaluation submissions.
The evaluation metric for the shared task is 4-gram character Bleu.
The script to be used for Bleu computation is here (almost identical to the one in Moses, with a few minor differences). Instructions for running the script are in the baseline code that we released for the shared task. (link)
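As a rough illustration of the metric (not the official scoring script, whose exact smoothing and tokenization may differ), a minimal character-level 4-gram Bleu for a single segment pair can be computed as follows. Segments are treated as raw character sequences, which is the usual choice for Chinese output.

```python
# Minimal sketch of 4-gram character Bleu for one hypothesis/reference pair.
# This is an illustration only; the official script may smooth and tokenize
# differently.
import math
from collections import Counter

def char_ngrams(text, n):
    """Multiset of character n-grams in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def char_bleu(hypothesis, reference, max_n=4):
    """Character-level Bleu up to `max_n`-grams, with brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = char_ngrams(hypothesis, n)
        ref_counts = char_ngrams(reference, n)
        overlap = sum((hyp_counts & ref_counts).values())  # clipped matches
        total = max(sum(hyp_counts.values()), 1)
        precision = max(overlap, 1e-9) / total  # floor avoids log(0)
        log_precisions.append(math.log(precision))
    # Brevity penalty: punish hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(hypothesis), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match scores 1.0; partial overlaps score between 0 and 1.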
Participate
To participate, please register for the shared task on Codalab.
Important Dates
- Apr 5, 2021: Release of training and development resources
- Apr 5, 2021: Release of the baseline system
- Jul 12, 2021: Release of test data
- Jul 22, 2021: Official submissions due by web upload
- Jul 26, 2021: Release of the official results
- Aug 5, 2021: System description paper due
- Sep 5, 2021: Review feedback
- Sep 15, 2021: Camera-ready papers due
- Nov 10-11, 2021: Workshop
Contacts
Chair: Ajay Nagesh (DiDi Labs, USA)
Email: ajaynagesh@didiglobal.com
Organizers
- Arkady Arkhangorodsky (DiDi Labs, USA)
- Ajay Nagesh, Chair (DiDi Labs, USA)
- Kevin Knight (DiDi Labs, USA)
Acknowledgments:
Thanks to Didi Chuxing for providing data and research time to support this shared task.