August 8, 2020 - All results available.
July 24, 2020 - Hindi - Marathi, Marathi - Hindi results available.
July 9, 2020 - Submission instructions available. Please see below.
June 3, 2020 - The task dates have been updated. Please check the new dates below.
Within the MT and NLP communities, English is by far the most resource-rich language. MT systems are most often trained to translate texts from and to English, or they use English as a pivot language to translate between resource-poorer languages. The interest in English is reflected, for example, in the WMT translation tasks (e.g. News, Biomedical), which have always included language pairs in which texts are translated to and/or from English. With the widespread use of MT technology, there is growing interest in training systems to translate between languages other than English. One sign of this is the need to translate directly between pairs of similar languages. The main challenge is to exploit the similarity between the languages to compensate for the small amount of available parallel data and still produce accurate output.
Given the community's interest in this topic, we are organizing, for the second time at WMT, the shared task on "Similar Language Translation" to evaluate the performance of state-of-the-art translation systems on translating between pairs of languages from the same language family. This year we provide participants with training and testing data in five language pairs from three language families, listed below. Evaluation will be carried out using both automatic evaluation metrics and human evaluation.
In the previous edition of this task at WMT 2019, we included three language pairs: Spanish - Portuguese, Czech - Polish, and Hindi - Nepali. Check the 2019 task website and the WMT 2019 report for more information.
The training data is available here. Last updated July 7, 2020.
To participate and receive the password, please fill out the registration form.
The test data is available at the same repository as the training data (here) and it can be accessed using the same password sent via e-mail. You are allowed to submit 1 PRIMARY and up to 2 CONTRASTIVE systems for each language pair/translation direction.
You should submit your results by July 15, 2020 (Anywhere on Earth) in a zip file to wmt.similarlanguagetranslation(at)gmail.com. Your zip file should contain your submission files and a brief description of your approach(es), as follows:
1) A txt file for each of your submissions, with one instance per line IN THE SAME ORDER as the test set. You should name your file(s) as follows:
TEAMNAME_SOURCELANGUAGECODE_TARGETLANGUAGECODE_PRIMARYORCONTRASTIVE.txt
If, for example, Team X participates in Spanish - Catalan and submits a primary AND a contrastive submission, Team X will send us the following files:
TEAMX_ES_CA_PRIMARY.txt
TEAMX_ES_CA_CONTRASTIVE.txt
2) A single txt file containing one or two paragraphs describing your system(s). Please make it as complete as possible, since we will use this information in the shared task report, but keep it concise and focused on the most important aspects of your approach (max. 250 words).
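The packaging steps above can be checked before sending. The following is a minimal sketch, not an official tool: the helper names (`submission_filename`, `same_line_count`, `description_ok`, `build_zip`) are hypothetical, dropping spaces from the team name is an assumption based on the "Team X" -> "TEAMX" example, and counting words by whitespace is an assumption about how the 250-word limit is measured. The page does not say how two contrastive runs for the same direction should be told apart, so the sketch does not guess.

```python
import zipfile
from pathlib import Path
from typing import Iterable

# Hypothetical helpers for preparing a submission; the task page specifies only
# the naming pattern, the one-line-per-instance format, and the 250-word limit.


def submission_filename(team: str, src: str, tgt: str, kind: str) -> str:
    """Build TEAMNAME_SOURCELANGUAGECODE_TARGETLANGUAGECODE_PRIMARYORCONTRASTIVE.txt."""
    kind = kind.upper()
    if kind not in ("PRIMARY", "CONTRASTIVE"):
        raise ValueError(f"kind must be PRIMARY or CONTRASTIVE, got {kind!r}")
    # Assumption: spaces are dropped from the team name, as in "Team X" -> "TEAMX".
    team_code = team.replace(" ", "").upper()
    return f"{team_code}_{src.upper()}_{tgt.upper()}_{kind}.txt"


def same_line_count(output_path: Path, test_path: Path) -> bool:
    """True if the output has exactly one line per test instance."""
    def count(p: Path) -> int:
        return len(p.read_text(encoding="utf-8").splitlines())
    return count(output_path) == count(test_path)


def description_ok(text: str, max_words: int = 250) -> bool:
    """Assumption: words are counted as whitespace-separated tokens."""
    return len(text.split()) <= max_words


def build_zip(zip_path: Path, outputs: Iterable[Path], description: Path) -> None:
    """Bundle the output files and the description file into one zip archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in outputs:
            zf.write(path, arcname=path.name)  # store without directory prefixes
        zf.write(description, arcname=description.name)
```

For example, `submission_filename("Team X", "es", "ca", "primary")` returns `"TEAMX_ES_CA_PRIMARY.txt"`, matching the example file names. Note that `same_line_count` only checks the number of lines; it cannot verify that the lines are in the same order as the test set.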
The automatic evaluation will be carried out using BLEU (Papineni et al., 2002), TER (Snover et al., 2006), and RIBES (Isozaki et al., 2010).
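To illustrate what BLEU measures, here is a minimal corpus-level BLEU sketch. It is not the organizers' scoring script: it assumes whitespace tokenization, a single reference per segment, uniform weights over 1- to 4-grams, and no smoothing, so its numbers will generally differ from those of standard scoring tools that apply their own tokenization.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[str], refs: List[str], max_n: int = 4) -> float:
    """Corpus BLEU in the 0-100 range; one reference per line, no smoothing."""
    if len(hyps) != len(refs):
        raise ValueError("hypotheses and references must be aligned line by line")
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()  # assumption: whitespace tokenization
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: a hypothesis n-gram is credited at most as many
            # times as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Without smoothing, any zero n-gram precision makes BLEU zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

For example, a hypothesis consisting of the first four words of a six-word reference has all n-gram precisions equal to 1 but receives a brevity penalty of exp(1 - 6/4), about 0.61, so its BLEU is about 60.7.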
Here you can find the results of the Hindi - Marathi and Marathi - Hindi tracks ranked by BLEU score.
Here you can find the results of the Catalan - Spanish, Spanish - Catalan, Spanish - Portuguese, and Portuguese - Spanish tracks ranked by BLEU score.
Here you can find the results of the Croatian - Slovene, Slovene - Croatian, Slovene - Serbian, and Serbian - Slovene tracks ranked by BLEU score.
Your system paper should be prepared according to the WMT instructions and uploaded to START before August 24, 2020.
Release of training/dev data | April 15, 2020 |
Test data released | |
Submission deadline | |
System description paper deadline | |
Notifications | |
Camera-ready | |
Conference | |