ACL 2013
EIGHTH WORKSHOP ON
STATISTICAL MACHINE TRANSLATION
8-9 August 2013
Sofia, Bulgaria
This workshop builds on seven previous workshops on statistical machine
translation, one of the most prestigious venues for research in
computational linguistics.
IMPORTANT DATES
Release of training data | Early February 2013 |
Test set distributed for translation task | April 29, 2013 |
Submission deadline for translation task | May 3 2013 |
System outputs distributed for metrics task | May 10, 2013 |
Submission deadline for metrics task | May 31, 2013 |
Start of manual evaluation period | May 17, 2013 |
End of manual evaluation | June 7, 2013 |
Paper submission deadline | June 7, 2013 |
Notification of acceptance | June 24, 2013 |
Camera-ready deadline | July 5, 2013 (extended from July 1) |
OVERVIEW
This year's workshop will feature three shared tasks:
- a translation task,
- a quality estimation task, and
- a task to test automatic evaluation metrics.
In addition to the shared tasks, the workshop will also feature scientific papers on topics related to MT.
Topics of interest include, but are not limited to:
- word-based, phrase-based, syntax-based, semantics-based SMT
- using comparable corpora for SMT
- incorporating linguistic information into SMT
- decoding
- system combination
- error analysis
- manual and automatic methods for evaluating MT
- scaling MT to very large data sets
We encourage authors to evaluate their approaches to the above topics
using the common data sets created for the shared tasks.
TRANSLATION TASK
The first shared task will examine translation between the
following language pairs:
- English-German and German-English
- English-French and French-English
- English-Spanish and Spanish-English
- English-Czech and Czech-English
- English-Russian and Russian-English (new this year)
Participants may submit translations for any or all of the language
directions. In addition to the common test sets the workshop organizers
will provide optional training resources, including a newly expanded
release of the Europarl corpora and out-of-domain corpora.
All participants who submit entries will have their translations
evaluated. We will evaluate translation performance by human judgment. To
facilitate the human evaluation we will require participants in the
shared tasks to manually judge some of the submitted translations.
We also provide baseline machine translation systems, with performance
comparable to the best systems from last year's shared task.
QUALITY ESTIMATION TASK
A topic of increasing interest in MT is that of estimating the quality of translated texts. Different from MT evaluation, quality estimation (QE) systems do not rely on reference translations, but rather predict the quality of an unseen translated text (document, sentence, phrase) at system run-time. This topic is particularly relevant from a user perspective: among other applications, it can (i) help decide whether a given translation is good enough for publishing as is (Soricut and Echihabi, 2010); (ii) filter out sentences that are not good enough for post-editing (Specia, 2011); (iii) select the best translation among options from multiple MT and/or translation memory systems (He et al., 2010); and (iv) inform readers of the target language of whether or not they can rely on a translation (Specia et al., 2011).
Although still a very recent area, research on this topic has shown promising results in the last couple of years. However, efforts are scattered across several groups and, as a consequence, comparing different systems is difficult, as there are neither well-established baselines nor standard evaluation metrics. In the quality estimation track of the WMT workshop and shared task, we will provide training and test sets, along with evaluation metrics and a baseline system. By providing common ground for development and comparison, we expect to foster research on the topic and to attract new people interested in the subject, who can build and evaluate new solutions using the provided resources.
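Though the task resources will include the official baseline system and feature set, the following minimal sketch (in Python) illustrates the kind of system the task targets: predicting a quality score for each translated sentence from a few surface features of the source sentence and its machine translation, with no reference translation involved. The features, the toy data, and the use of scikit-learn's SVR are illustrative assumptions only, not the shared-task baseline.

    # Minimal sentence-level quality estimation (QE) sketch: predict a quality
    # score for a machine-translated sentence from simple surface features,
    # without access to any reference translation.
    # Features, data, and model choice are illustrative assumptions only.
    from sklearn.svm import SVR

    def features(source, translation):
        """Toy surface features of a (source, MT output) pair."""
        src = source.split()
        tgt = translation.split()
        return [
            len(src),                                     # source length
            len(tgt),                                     # target length
            len(tgt) / max(len(src), 1),                  # length ratio
            sum(len(t) for t in tgt) / max(len(tgt), 1),  # avg target token length
        ]

    # Hypothetical training data: (source, MT output) pairs with human quality scores.
    train_pairs = [("das Haus ist klein", "the house is small"),
                   ("er hat recht", "he has right")]
    train_scores = [4.8, 2.5]   # e.g. 1 (worst) to 5 (best)

    model = SVR()
    model.fit([features(s, t) for s, t in train_pairs], train_scores)

    # At test time the model predicts quality for unseen translations.
    print(model.predict([features("sie ist müde", "she is tired")]))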
EVALUATION TASK
The evaluation task will assess automatic evaluation metrics' ability to:
- Rank systems on their overall performance on the test set
- Rank systems on a sentence by sentence level
Participants in the shared evaluation task will use their automatic evaluation metrics to score the system outputs from the translation task. They will be provided with these outputs along with the reference translations. We will measure the correlation of the automatic evaluation metrics with the human judgments.
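As an illustration of the system-level part of this evaluation, the sketch below (in Python) correlates a hypothetical metric's per-system scores with scores derived from human judgments, using Spearman rank correlation; the system names and all numbers are invented, and the exact correlation statistics used in the task may differ.

    # Illustrative system-level meta-evaluation: how well does an automatic
    # metric's ranking of systems agree with a ranking from human judgments?
    # System names and scores are invented for illustration.
    from scipy.stats import spearmanr

    human_scores  = {"sysA": 0.61, "sysB": 0.55, "sysC": 0.48, "sysD": 0.40}
    metric_scores = {"sysA": 27.3, "sysB": 28.1, "sysC": 24.9, "sysD": 22.0}

    systems = sorted(human_scores)
    rho, _ = spearmanr([human_scores[s] for s in systems],
                       [metric_scores[s] for s in systems])
    print("system-level Spearman correlation: %.3f" % rho)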
PAPER SUBMISSION INFORMATION
Submissions will consist of regular full papers of 6-10 pages, plus
additional pages for references, formatted following the
ACL 2013
guidelines. In addition, shared task participants will be invited to
submit short papers (4-6 pages) describing their systems or their
evaluation metrics. Both submission and review processes will be handled
electronically.
Note that regular papers must be anonymized, while system descriptions
do not need to be.
We encourage individuals who are submitting research papers to evaluate
their approaches using the training resources provided by this workshop
and past workshops, so that their experiments can be repeated by others
using these publicly available corpora.
ANNOUNCEMENTS
Subscribe to the announcement list for WMT13. This list will be used to announce when the test sets are released, to indicate any corrections to the training sets, and to amend the deadlines as needed.
You can read past announcements on the Google Groups page for WMT13. These also include an archive of announcements from earlier workshops.
INVITED TALK
Andreas Eisele, European Commission
ORGANIZERS
Ondrej Bojar (Charles University)
Christian Buck (University of Edinburgh)
Chris Callison-Burch (Johns Hopkins University)
Christian Federmann (DFKI)
Barry Haddow (University of Edinburgh)
Philipp Koehn (University of Edinburgh)
Christof Monz (University of Amsterdam)
Matt Post (Johns Hopkins University)
Herve Saint-Amand (University of Edinburgh)
Radu Soricut (SDL Language Weaver)
Lucia Specia (University of Sheffield)
PROGRAM COMMITTEE
- Lars Ahrenberg (Linköping University)
- Eleftherios Avramidis (German Research Center for Artificial Intelligence (DFKI))
- Daniel Beck (University of Sheffield)
- Nicola Bertoldi (FBK)
- Arianna Bisazza (Fondazione Bruno Kessler)
- Graeme Blackwood (IBM Research)
- Phil Blunsom (University of Oxford)
- Chris Brockett (Microsoft Research)
- Bill Byrne (University of Cambridge)
- Nicola Cancedda (Xerox Research Centre Europe)
- Hailong Cao (Harbin Institute of Technology)
- Marine Carpuat (National Research Council)
- Francisco Casacuberta (Universitat Politècnica de València)
- Daniel Cer (Stanford University)
- Boxing Chen (NRC)
- Colin Cherry (NRC)
- David Chiang (USC/ISI)
- Steve DeNeefe (SDL Language Weaver)
- John DeNero (Google)
- Michael Denkowski (Carnegie Mellon University)
- Markus Dreyer (SDL Language Weaver)
- Kevin Duh (Nara Institute of Science and Technology)
- Chris Dyer (Carnegie Mellon University)
- Marc Dymetman (Xerox Research Centre Europe)
- Stefano Faralli (Sapienza University of Rome)
- Yang Feng (University of Sheffield)
- Andrew Finch (NICT)
- José A. R. Fonollosa (Universitat Politècnica de Catalunya)
- Mikel Forcada (Universitat d'Alacant)
- George Foster (NRC)
- Alexander Fraser (Universität Stuttgart)
- Katya Garmash (University of Amsterdam)
- Niyu Ge (IBM Research)
- Ulrich Germann (University of Edinburgh)
- Daniel Gildea (University of Rochester)
- Cyril Goutte (National Research Council Canada)
- Nizar Habash (Columbia University)
- Jan Hajic (Charles University in Prague)
- Keith Hall (Google Research)
- Greg Hanneman (Carnegie Mellon University)
- Christian Hardmeier (Uppsala universitet)
- Xiaodong He (Microsoft Research)
- Yifan He (New York University)
- Kenneth Heafield (Carnegie Mellon University, University of Edinburgh)
- John Henderson (MITRE)
- Silja Hildebrand (CMU)
- Hieu Hoang (University of Edinburgh)
- Young-Sook Hwang (SKPlanet)
- Gonzalo Iglesias (University of Cambridge)
- Abe Ittycheriah (IBM)
- Doug Jones (MIT Lincoln Laboratory)
- Maxim Khalilov (TAUS Labs)
- Roland Kuhn (National Research Council of Canada)
- Shankar Kumar (Google)
- Mathias Lambert (Amazon.com)
- Qun Liu (Dublin City University)
- Wolfgang Macherey (Google)
- Daniel Marcu (SDL)
- José B. Mariño (Polytechnic University of Catalonia)
- Cettolo Mauro (FBK)
- Arne Mauser (Google, Inc)
- Shachar Mirkin (Xerox Research Centre Europe)
- Dragos Munteanu (SDL Language Technologies)
- Markos Mylonakis (Xerox Research Centre Europe)
- Preslav Nakov (Qatar Computing Research Institute, Qatar Foundation)
- Kemal Oflazer (Carnegie Mellon University - Qatar)
- Sergio Penkale (Lingo24)
- Chris Quirk (Microsoft Research)
- Stefan Riezler (Heidelberg University)
- Johann Roturier (Symantec)
- Anoop Sarkar (Simon Fraser University)
- Holger Schwenk (University of Le Mans)
- Jean Senellart (SYSTRAN)
- Hendra Setiawan (IBM T.J. Watson Research Center)
- Kashif Shah (University of Sheffield)
- Wade Shen (MIT)
- Linfeng Song (ICT/CAS)
- Felipe Sánchez-Martínez (Universitat d'Alacant)
- Jörg Tiedemann (Uppsala University)
- Christoph Tillmann (IBM Research)
- Dan Tufiş (Research Institute for Artificial Intelligence, Romanian Academy)
- Masao Utiyama (NICT)
- David Vilar (DFKI)
- Haifeng Wang (Baidu)
- Taro Watanabe (NICT)
- Hua Wu (Baidu)
- Peng Xu (Google Inc.)
- François Yvon (LIMSI/CNRS)
- Richard Zens (Google)
- Hao Zhang (Google)
- Joy Ying Zhang (Carnegie Mellon University)
- Liu Zhanyi (Baidu)
- Josef van Genabith (Dublin City University)
CONTACT
For questions, comments, etc. please send email
to pkoehn@inf.ed.ac.uk.
Supported by the European Commission under the project (grant number 288487)