This is a new shared task on generating image descriptions in a target language, given an image and one or more descriptions in a different (source) language. The task can be addressed from two different perspectives:
The code for the main baseline system for both tasks is available here; it follows the approach described in Elliott et al. (2015), in particular the MLM→LM model (released in response to several requests). A secondary baseline for both tasks will be a Moses phrase-based statistical machine translation system trained using only the textual training data provided, following the pipeline described here.
Task 1: training, validation and test sentences, and the data splits.
Task 2: training, validation and test sentences, and the data splits.
Image features will be provided to participants, but their use is not mandatory. In particular, we will release features extracted from the FC7 (relu7) and CONV5_4 layers of the VGG-19 CNN (Simonyan and Zisserman, 2015), using Caffe RC2.
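If you work with the released features, loading them is straightforward. The sketch below assumes the features are distributed as NumPy arrays with the file names shown; both the names and the array layouts are assumptions about the release format, not documented facts, so adjust to whatever the archives actually contain.

```python
import numpy as np

# Hypothetical file names; check the actual release for the real ones.
# FC7 (relu7) gives one 4096-dimensional vector per image.
fc7 = np.load("train-vgg19-fc7.npy")

# CONV5_4 gives spatial feature maps (for VGG-19 at 224x224 input this layer
# has 512 channels on a 14x14 grid); the exact array layout is an assumption.
conv5_4 = np.load("train-vgg19-conv5_4.npy")

print(fc7.shape)      # expected: (num_images, 4096)
print(conv5_4.shape)  # e.g. (num_images, 512, 14, 14), depending on the export
```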
If you use the dataset created for this shared task, please cite the following paper: Multi30K: Multilingual English-German Image Descriptions.
@inproceedings{elliott-EtAl:2016:VL16,
  author    = {Elliott, D. and Frank, S. and Sima'an, K. and Specia, L.},
  title     = {Multi30K: Multilingual English-German Image Descriptions},
  booktitle = {Proceedings of the 5th Workshop on Vision and Language},
  year      = {2016},
  pages     = {70--74}
}
The results for both tasks are reported in the following paper: A Shared Task on Multimodal Machine Translation and Crosslingual Image Description.
Stella Frank gave a presentation about the shared task submissions and results at the conference.
You can also download the submissions to the shared task.
This task consists of translating English sentences that describe an image into German, given the English sentence itself and the image that it describes (or features extracted from this image, if participants choose to use them). For this task, the Flickr30K Entities dataset was extended in the following way: for each image, one of the English descriptions was selected and manually translated into German by a professional translator. We will provide most of the resulting parallel data and corresponding images for training, while smaller portions will be used for development and test.
As training and development data, we provide 29,000 and 1,014 triples respectively, each containing an English source sentence, its German human translation and the corresponding image.
As test data, we provide a new set of 1,000 tuples containing an English description and its corresponding image.
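A minimal sketch for reading the Task 1 data, assuming the release uses line-aligned plain-text files with one sentence per line plus an image list; all three file names here are hypothetical:

```python
# File names and the line-aligned, one-sentence-per-line layout are
# assumptions about the release format, not documented facts.
with open("train.en", encoding="utf-8") as f_en, \
     open("train.de", encoding="utf-8") as f_de, \
     open("train_images.txt", encoding="utf-8") as f_img:
    triples = [(img.strip(), en.strip(), de.strip())
               for img, en, de in zip(f_img, f_en, f_de)]

print(len(triples))  # expected: 29,000 for the training split
print(triples[0])    # (image id, English source, German human translation)
```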
Evaluation will be performed against the German human translations on the test set using standard MT evaluation metrics, with METEOR as the primary metric. Scoring uses lowercased text with punctuation kept, in both detokenised (primary) and tokenised versions. We will normalise punctuation in both reference translations and system submissions using this script. (Here are some additional notes on how we did the evaluation.) We may also include manual evaluation.
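As a rough sketch of the preprocessing step, the snippet below lowercases the system output and references before scoring. It only approximates the official setup: the organisers' normalisation script is what counts for the official scores, and the Meteor invocation in the trailing comment is an assumption to be checked against the Meteor documentation.

```python
import unicodedata

def normalise(line: str) -> str:
    # Lowercase and collapse whitespace; punctuation is kept for Task 1.
    # This only approximates the organisers' normalisation script.
    return " ".join(unicodedata.normalize("NFC", line).lower().split())

for src, dst in [("system.out", "system.norm"),
                 ("reference.de", "reference.norm")]:
    with open(src, encoding="utf-8") as f_in, \
         open(dst, "w", encoding="utf-8") as f_out:
        for line in f_in:
            f_out.write(normalise(line) + "\n")

# Scoring with the Meteor 1.5 jar might then look like this (the exact flags
# are an assumption; check the Meteor README):
#   java -Xmx2G -jar meteor-1.5.jar system.norm reference.norm -l de -norm
```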
This task consists of generating a German sentence that describes an image, given the image itself and one or more descriptions in English. For this task, the Flickr30K Entities dataset was extended in the following way: for each image, five German descriptions were crowdsourced independently of the English versions, and independently of each other. Any English-German pair of descriptions for a given image can be considered a comparable translation pair. We will provide most of the images and associated descriptions for training, while smaller portions will be used for development and test.
As training and development data, we provide 29,000 and 1,014 images respectively, each with 5 descriptions in English and 5 descriptions in German, i.e., 30,014 tuples in total, each containing an image and 10 descriptions, 5 in each language.
As test data, we provide a new set of approximately 1,000 tuples containing an image and 5 English descriptions.
Evaluation will be performed against the five German descriptions collected as references for the test set, using METEOR on lowercased text with punctuation removed. We may also include manual evaluation.
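For Task 2, the preprocessing differs from Task 1 in that punctuation is stripped, and each hypothesis is scored against five references. The sketch below shows one way to prepare the references; the per-annotator file names are hypothetical, and the punctuation rule only approximates whatever the official scripts do:

```python
import re

def preprocess(line: str) -> str:
    # Task 2 scoring: lowercased, punctuation removed (an approximation of
    # the official preprocessing, not the organisers' actual script).
    line = re.sub(r"[^\w\s]", " ", line.lower(), flags=re.UNICODE)
    return " ".join(line.split())

# Hypothetical layout: one file per annotator, line-aligned with the images.
refs = []
for i in range(1, 6):
    with open(f"test.{i}.de", encoding="utf-8") as f:
        refs.append([preprocess(line) for line in f])

# refs_per_image[j] holds the 5 German references for the j-th test image,
# ready to be passed to a multi-reference METEOR run.
refs_per_image = list(zip(*refs))
```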
We suggest the following interesting resources that can be used as additional training data for either or both tasks:
For a given task, the output of your system should contain one target-language description per image, formatted in the following way (a small formatting sketch follows the field definitions):
<METHOD NAME> <IMAGE ID> <DESCRIPTION> <TASK> <TYPE>

Where:

METHOD NAME is the name of your method.
IMAGE ID is the identifier of the test image.
DESCRIPTION is the output generated by your system (either a translation or an independently generated description).
TASK is one of the following flags: 1 (translation task), 2 (image description task), 3 (both). The choice here indicates how your descriptions will be evaluated; option 3 means they will be evaluated both as a translation task and as an image description task.
TYPE is either C or U, where C indicates "constrained", i.e. using only the resources provided by the task organisers, and U indicates "unconstrained".
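The helper below formats such lines and writes them to a file named per the convention described next. The tab separator is an assumption on our part (descriptions contain spaces, so a plain space delimiter would be ambiguous); confirm the expected delimiter with the organisers.

```python
def submission_line(method: str, image_id: str, description: str,
                    task: int, constrained: bool) -> str:
    # Fields: METHOD NAME, IMAGE ID, DESCRIPTION, TASK, TYPE.
    # Tab separation is an assumption, not part of the official spec above.
    type_flag = "C" if constrained else "U"
    return "\t".join([method, image_id, description, str(task), type_flag])

# Hypothetical example: a constrained Task 1 translation from a "Moses" run.
line = submission_line("Moses", "12345.jpg", "zwei junge Männer ...", 1, True)

# Using the file-naming convention below for team SHEF:
with open("SHEF_1_Moses_C", "w", encoding="utf-8") as f:
    f.write(line + "\n")
```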
Submission files should be named INSTITUTION-NAME_TASK-NAME_METHOD-NAME_TYPE, where:

INSTITUTION-NAME is an acronym/short name for your institution, e.g. SHEF.
TASK-NAME is one of the following: 1 (translation), 2 (description), 3 (both).
METHOD-NAME is an identifier for your method, in case you have multiple methods for the same task, e.g. 2_NeuralTranslation, 2_Moses.
TYPE is either C or U, where C indicates "constrained", i.e. using only the resources provided by the task organisers, and U indicates "unconstrained".
For instance, a constrained submission from team SHEF for task 2 using method "Moses" could be named SHEF_2_Moses_C.
You are invited to submit a short paper (4 to 6 pages) to WMT describing your method(s). Submitting a paper is not required; if you choose not to, we ask you to provide a summary and/or an appropriate reference describing your method(s) that we can cite in the WMT overview paper.
Release of training data    | January 30, 2016
Release of test data        | April 10, 2016
Results submission deadline | May 4, 2016
Paper submission deadline   | May 15, 2016
Notification of acceptance  | June 5, 2016
Camera-ready deadline       | June 22, 2016
For questions or comments, email Lucia Specia lspecia@gmail.com.