Shared Task: Triangular MT: Using English to improve Russian-to-Chinese machine translation
Task Description
Given a low-resource language pair (X/Y), the bulk of previous MT work has pursued one of two strategies.
- Direct: Collect parallel X/Y data from the web, and train an X-to-Y translator, OR
- Pivot: Collect parallel X/English and Y/English data (often much larger than X/Y data), train two translators (X-to-English + English-to-Y), and pipeline them to form an X-to-Y translator
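The pivot strategy above can be sketched as a simple pipeline of two translators. The sketch below assumes two hypothetical callables, `ru_to_en` and `en_to_zh`, standing in for any trained MT systems (e.g. two separately trained checkpoints); they are not part of the shared-task release.

```python
# A minimal sketch of the pivot strategy: translate X -> Y by chaining
# X-to-English and English-to-Y systems. `ru_to_en` and `en_to_zh` are
# hypothetical stand-ins for trained translation models.

def pivot_translate(russian_segments, ru_to_en, en_to_zh):
    """Translate Russian segments to Chinese by pivoting through English."""
    english = [ru_to_en(seg) for seg in russian_segments]
    return [en_to_zh(seg) for seg in english]
```

One known weakness of this approach is error compounding: mistakes made by the X-to-English system are passed on to, and often amplified by, the English-to-Y system.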
However, there are many other possible strategies for combining such resources. These may involve, for example, ensemble methods, multi-source training methods, multi-target training methods, or novel data augmentation methods.
The goal of this shared task is to promote:
- translation between non-English languages,
- optimally mixing direct and indirect parallel resources, and
- exploiting noisy parallel web corpora
Task: Russian-to-Chinese machine translation
We provide three parallel corpora:
- Chinese/Russian: crawled from the web and aligned at the segment level, and combined with different public resources
- Chinese/English: combining several public resources
- Russian/English: combining several public resources
We evaluate system translations on a (secret) mixed-genre test set, drawn from the web and curated for high-quality segment pairs. After receiving the test data, participants have one week to submit translations. After all submissions are received, we will post a populated leaderboard that will continue to accept post-evaluation submissions.
The evaluation metric for the shared task is 4-gram character Bleu.
The script to be used for Bleu computation is here (almost identical to the one in Moses, with a few minor differences). Instructions for running the script are in the baseline code that we released for the shared task. (link)
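As a rough illustration of the metric (not the official scoring script, whose exact smoothing and tokenization may differ), a minimal character-level 4-gram Bleu for a single segment pair can be computed as follows. Segments are treated as raw character sequences, which is the usual choice for Chinese output.

```python
# Minimal sketch of 4-gram character Bleu for one hypothesis/reference pair.
# This is an illustration only; the official script may smooth and tokenize
# differently.
import math
from collections import Counter

def char_ngrams(text, n):
    """Multiset of character n-grams in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def char_bleu(hypothesis, reference, max_n=4):
    """Character-level Bleu up to `max_n`-grams, with brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = char_ngrams(hypothesis, n)
        ref_counts = char_ngrams(reference, n)
        overlap = sum((hyp_counts & ref_counts).values())  # clipped matches
        total = max(sum(hyp_counts.values()), 1)
        precision = max(overlap, 1e-9) / total  # floor avoids log(0)
        log_precisions.append(math.log(precision))
    # Brevity penalty: punish hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(hypothesis), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match scores 1.0; partial overlaps score between 0 and 1.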
Participate
To participate, please register for the shared task on Codalab.
Important Dates
- Apr 5, 2021: Release of training and development resources
- Apr 5, 2021: Release of the baseline system
- Jul 12, 2021: Release of test data
- Jul 22, 2021: Official submissions due by web upload
- Jul 26, 2021: Release of the official results
- Aug 5, 2021: System description paper due
- Sep 5, 2021: Review feedback
- Sep 15, 2021: Camera-ready papers due
- Nov 10-11, 2021: Workshop
Contacts
Chair: Ajay Nagesh (DiDi Labs, USA)
Email: ajaynagesh@didiglobal.com
Organizers
- Arkady Arkhangorodsky (DiDi Labs, USA)
- Ajay Nagesh, Chair (DiDi Labs, USA)
- Kevin Knight (DiDi Labs, USA)
Acknowledgments:
Thanks to Didi Chuxing for providing data and research time to support this shared task.