Shared Task: Biomedical Translation Task
Task description
This task aims to evaluate systems on the translation of documents from the biomedical domain.
The training and test data will consist of documents retrieved from various databases.
This year, the biomedical translation task will address the following language pairs:
- English-Czech NEW
- English-French and French-English
- English-German NEW
- English-Hungarian NEW
- English-Polish NEW
- English-Portuguese and Portuguese-English
- English-Romanian NEW
- English-Spanish and Spanish-English
- English-Swedish NEW
Data
Our training and development data comes from various sources, as listed below:
- The Biomedical Translation repository includes
scientific publications (FR/EN, PT/EN, ES/PT) and clinical trials (PT/EN).
- UFAL Medical Corpus (formerly HimLCorpus) includes medical text
from various sources for the following language pairs: CS/EN, DE/EN, PL/EN, RO/EN (and more).
HimL test sets should be used as the development sets for these language pairs.
- The Khresmoi development data
for FR/EN, HU/EN, ES/EN and SV/EN.
Evaluation
Evaluation will be carried out both automatically and manually.
Automatic evaluation will make use of standard machine translation metrics, such as BLEU.
Native speakers of each of the languages will manually check the quality of the translation for a small sample of the submissions.
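As an illustration of the automatic scoring, the following is a minimal sketch of corpus-level BLEU. It assumes a single reference per segment, whitespace tokenisation and no smoothing; the tokenisation and scoring scripts actually used by the organisers are not specified here and may differ. The function names `ngram_counts` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU (0-100) for parallel lists of strings.

    Assumptions: one reference per hypothesis, whitespace tokenisation.
    """
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```

A system output that reproduces the reference exactly scores 100, and one that shares no 4-gram with the reference scores 0 under this unsmoothed variant.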
We plan to release test sets for the following language pairs and sources:
- FR/EN (both directions): scientific publications
- PT/EN (both directions): scientific publications
- ES/EN (both directions): scientific publications
- EN to CS: health information texts
- EN to DE: health information texts
- EN to FR: health information texts
- EN to HU: health information texts
- EN to PL: health information texts
- EN to RO: health information texts
- EN to ES: health information texts
- EN to SV: health information texts
Submission format
The datasets come in different formats. Please check the details on the websites of the respective repositories and collections, and contact us if you have any questions.
Submission Requirements
Please register your team using this form. You will receive an email confirming your registration.
The submission link is included in that email.
The test files are available in the WMT'17 biomedical task Google Drive folder. There are three folders:
- Scielo datasets: ES/EN and PT/EN (both directions)
- EDP dataset: FR/EN (both directions)
- HimL-KConnect datasets: EN to CS, EN to DE, EN to FR, EN to HU, EN to PL, EN to RO, EN to ES, EN to SV
We have moved our files to Zenodo; please check the links in the
WMT Biomedical Translation Task repository.
The name of each submission file should include the original test file name preceded by the team identifier
(as registered in the form above) and the run number, as in these examples:
- The submission file for run 1 of the "HPI" team for the Scielo dataset for English to Spanish should be called
"HPI_run1_scielo_en2es_es.xml".
- The submission file for run 1 of the "HPI" team for the UFAL Cochrane dataset for English to German should be called
"HPI_run1_cochrane_de.sgm".
Each team is allowed to submit up to 3 runs per test file.
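The naming convention above can be sketched as a small helper. This is only an illustration: the function name `submission_filename` is hypothetical, the original test file names (`scielo_en2es_es.xml`, `cochrane_de.sgm`) are inferred from the two examples, and numbering runs from 1 to 3 is an assumption based on the three-run limit.

```python
def submission_filename(team_id: str, run: int, test_file: str) -> str:
    """Build '<team>_run<N>_<original test file name>'."""
    # Assumption: runs are numbered 1..3, matching the limit of 3 runs per test file.
    if not 1 <= run <= 3:
        raise ValueError("each team may submit at most 3 runs per test file")
    return f"{team_id}_run{run}_{test_file}"


print(submission_filename("HPI", 1, "scielo_en2es_es.xml"))
print(submission_filename("HPI", 1, "cochrane_de.sgm"))
```

The two calls reproduce the example file names given above for the "HPI" team.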
Please check the submission format of each dataset. The UFAL datasets (Cochrane and NHS) follow the format of
the corresponding UFAL corpus while the Scielo and the EDP
datasets follow the BioC format of the Scielo dataset.
Results
Results for the biomedical task are available.
The gold standard files are available in the
WMT'17 biomedical task Google Drive folder.
We have moved our files to Zenodo; please check the links in the
WMT Biomedical Translation Task repository.
Important dates
Release of training data | end of January 2017 |
Release of test data | May 2, 2017 |
Results submission deadline | May 12, 2017 (originally May 8, 2017) |
Organisers
Ondrej Bojar (Charles University in Prague, Czech Republic)
Antonio Jimeno Yepes (IBM Research Australia)
Aurélie Névéol (LIMSI, CNRS, France)
Mariana Neves (Federal Institute for Risk Assessment / Hasso Plattner Institute, Germany)
Pavel Pecina (Charles University in Prague, Czech Republic)
Karin Verspoor (University of Melbourne, Australia)
Please contact us by email at wmtbiomedical@gmail.com.
Please also join our discussion forum.