Critical error categories and examples
©Lucia Specia - Imperial College London
v6
These definitions and examples come from the guidelines that were given to professional translators to collect the annotations. While translators were asked to categorise the errors, in this task we are interested in detecting the presence of ANY of these errors in a sentence.
In the examples, for ease of illustration, we show the source (SRC) and a back-translated version of the MT (Back-translated MT) instead of the machine translation (MT) itself. Annotators, however, saw the SRC and the MT in different languages.
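Since the task only asks whether ANY critical error is present, the per-category annotations can be collapsed into a single binary label. The following is a minimal sketch of that reduction, not the organisers' actual pipeline. The category codes (TOX, SAF, NAM, SEN, NUM) are the ones defined in the sections below; the record layout (`src`, `categories`) and the function name `has_critical_error` are assumptions made for illustration.

```python
# Category codes come from the guidelines; everything else here is hypothetical.
CRITICAL_CATEGORIES = {"TOX", "SAF", "NAM", "SEN", "NUM"}


def has_critical_error(annotated_categories):
    """Return True if at least one annotated category is a critical one."""
    return any(cat in CRITICAL_CATEGORIES for cat in annotated_categories)


# Hypothetical annotation records: one list of category codes per sentence.
examples = [
    {"src": "The meeting is at 8am", "categories": ["NUM"]},
    {"src": "I never said this", "categories": []},
]

# Collapse the category annotations into one binary label per sentence.
labels = [has_critical_error(ex["categories"]) for ex in examples]
```

A sentence annotated with several categories still yields a single positive label, which matches the "ANY of these errors" framing above.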
(a) (TOX) Deviation in toxicity (hate, violence or profanity)[1], which may be directed against an individual or a group (a religion, race, gender, etc.). This error can happen because toxicity is introduced in the translation when it is not in the source, deleted in the translation when it was in the source, mistranslated into different (toxic or not) words, or not translated at all (i.e. the toxicity remains in the source language or is transliterated). Important: if the source sentence contains such toxic words and the translation preserves them, this is NOT considered an error. For example:
SRC: “Stop wasting valuable Internet resources (you’re hurting the planet)”
Back-translated MT: “Stop wasting valuable Internet resources (may the planet kill you)”
Reason: critical mistranslation into violent content.
SRC: “You’re a band of brothers”
Back-translated MT: “You’re a band of bastards”
Reason: critical mistranslation into hateful content.
SRC: “I love you black momma”
Back-translated MT: “I love you nigga momma”
Reason: critical mistranslation into abusive content.
SRC: “What is the problem?”
Back-translated MT: “What is the problem? Not enough sex?”
Reason: critical hallucination introducing profane content.
SRC: “Stop editing my page! leave it fucking alone!”
Back-translated MT: “Stop editing my page!”
Reason: critical deletion removing toxic content.
SRC: “Stop editing my page! leave it fucking alone!”
Back-translated MT: “Stop editing my page! unk unk unk”
Reason: critical deletion removing toxic content, mistranslating it as gibberish.
Counterexample (a1):
SRC: “Stop editing my page! leave it fucking alone!”
Back-translated MT: “Stop editing my page! leave it the fuck alone!”
Reason: there are no critical issues; while the MT has toxic content, it was simply transferred from the source (albeit with different wording). The MT system has thus preserved the meaning of the source.
(b) (SAF) Deviation in health or safety risks, i.e. the translation contains errors that may pose a risk to the reader. This issue can happen because content is introduced in the translation when it is not in the source, deleted in the translation when it was in the source, mistranslated into different words, or not translated at all (i.e. it remains in the source language). For example:
SRC: “Wash your hands, or you will catch the coronavirus”
Back-translated MT: “Shake hands, or you will catch the coronavirus”
Reason: critical mistranslation, adding a safety risk.
SRC: “Wash your hands, or you will catch the coronavirus”
Back-translated MT: “You will catch the coronavirus”
Reason: critical deletion of content, adding a safety risk.
Counterexample (b1):
SRC: “It is safe to wander around the streets of Mexico City alone at late hours”
Back-translated MT: “It is ok to walk around the streets of Mexico City alone at late hours”
Reason: there are no critical issues; while the MT may be considered to bring safety risks if given as travel advice, that risk is also present in the source sentence and is simply transferred.
(c) (NAM) Deviation in named entities. A named entity (person, organization, location, etc.) is deleted, mistranslated as another (incorrect) named entity, a common word or gibberish, left untranslated when it should be translated, transliterated where the transliteration makes no sense in the target language (i.e. the reader cannot recover the actual named entity from it), or introduced when it was not in the source text. If the named entity is translated partially correctly but one can still understand that it refers to the same entity, it should not be considered an error. You may encounter many usernames in this data. If you can tell it’s a username, consider it a named entity. If you are not sure, do not annotate it as a NAM error. Some examples:
SRC: “I’m sorry, I was just erasing what I thought was wrong... yours, Craic”
Back-translated MT: “I’m sorry, I was just erasing what I thought was wrong... you mad”
Reason: critical mistranslation as the named entity Craic was translated as ‘mad’.
SRC: “darthveder edited this wrong!”
Back-translated MT: “darth father edited this wrong!”
Reason: critical mistranslation as the named entity (username) was split in two parts and one part translated incorrectly as the literal translation from Dutch (veder = father).
SRC: “LG releases new smartphone”
Back-translated MT: “Samsung releases new smartphone”
Reason: critical mistranslation as the named entity was translated incorrectly.
SRC: “Sundar Pichai visited India on Friday”
Back-translated MT: “Beautiful Pichai visited India on Friday”
Reason: critical mistranslation as the named entity was translated incorrectly.
SRC: “They are the worst band!”
Back-translated MT: “Cold Play are the worst band!”
Reason: critical introduction of named entity.
SRC: “The fact is “Irish” is the commonly used term in Ireland and Wiki seeks to reflect what IS rather than what might be correct.”
Back-translated MT: “The fact is “Irish” is the commonly used term in Ireland and Wiki seeks to reflect what Islamic State rather than what might be correct.”
Reason: critical mistranslation of a named entity.
Counterexample (c1):
SRC: “LG releases new smartphone”
Back-translated MT: “LG released new phone”
Reason: there are no critical issues; while the MT got some of the content wrong, the named entity was preserved.
(d) (SEN) Deviation in sentiment polarity or negation. The MT either introduces or removes a negation (with or without an explicit negation word), or reverses the sentiment of the sentence (e.g. a negative sentence becomes positive or vice versa). Note that SEN errors do not always involve a full negation: for example, replacing “possibly” with “with certainty” constitutes a SEN error. Other examples:
SRC: “I never wrote this article, I just edited it”
Back-translated MT: “I never wrote this article, I never edited it”
Reason: critical mistranslation, the second clause was negated in the translation.
SRC: “Don’t buy this product!”
Back-translated MT: “Do buy this product!”
Reason: critical deletion, the negation was removed in the translation.
SRC: “This is unbelievable”
Back-translated MT: “This is believable”
Reason: critical mistranslation inverting the meaning of the word.
SRC: “Just noticed this. Gutted.”
Back-translated MT: “Just noticed this. I love you so much!”
Reason: critical mistranslation where a negative sentiment becomes positive.
SRC: “I would strongly recommend throwing this product in your garbage.”
Back-translated MT: “I would strongly recommend throwing this product in your basket.”
Reason: critical mistranslation where a negative sentiment (do not buy the product) becomes positive (buy the product).
SRC: “Shut up asshole Shut up you asshole I fucking hate you, Kids CBC IS stupid so why keep reverting my edits?”
Back-translated MT: “Shut up asshole Shut up you asshole I fucking hate you, Kids CBC IS stupid so why keep bringing back my edits?”
Reason: critical mistranslation where there is an inversion of meaning (reverting my edits → bringing back my edits).
Counterexample (d1):
SRC: “I never said this”
Back-translated MT: “I did not say this”
Reason: there are no critical errors; while the MT made the negation less strong, the main meaning of the source is preserved.
Counterexample (d2):
SRC: “I still like you a lot.”
Back-translated MT: “I still like you”
Reason: there are no critical errors; while the MT deleted some of the content, the sentiment remains the same.
(e) (NUM) Deviation in units/time/date/numbers. The MT translated a number/date/time or unit incorrectly (or translated it as gibberish), or removed it, which could lead someone to miss an appointment, get lost, etc. For example:
SRC: “From that point, turn right and drive 20 kilometers”
Back-translated MT: “From that point, turn right and drive 20 miles”
Reason: critical mistranslation as it leads to incorrect directions.
SRC: “The meeting is at 8am”
Back-translated MT: “The meeting is at 8pm”
Reason: critical mistranslation as it leads to incorrect time.
Counterexample (e1):
SRC: “Today it will be 0 Celsius!”
Back-translated MT: “Today it will be 32 Fahrenheit!”
Reason: there are no critical errors; while the MT replaced the unit of temperature, it also converted the numeric value correctly.
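The difference between the NUM examples and counterexample (e1) comes down to whether the number was converted together with the unit. The sketch below illustrates that arithmetic under stated assumptions: the function names and the 0.5 tolerance are hypothetical, and it is not a method prescribed by these guidelines.

```python
import math

KM_PER_MILE = 1.609344  # exact definition of the international mile


def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def km_to_miles(km):
    """Convert a distance from kilometers to miles."""
    return km / KM_PER_MILE


def values_match(expected, observed, tolerance=0.5):
    """Check that two values agree within an (assumed) absolute tolerance."""
    return math.isclose(expected, observed, abs_tol=tolerance)


# Counterexample (e1): 0 Celsius rendered as 32 Fahrenheit is a correct conversion.
temperature_ok = values_match(celsius_to_fahrenheit(0), 32)

# NUM example: 20 kilometers rendered as 20 miles keeps the number but changes
# the unit, so the distance no longer matches (20 km is about 12.4 miles).
distance_ok = values_match(km_to_miles(20), 20)
```

Here `temperature_ok` is true and `distance_ok` is false, mirroring the non-critical and critical verdicts given above.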
Examples of errors that should not be detected as they are NOT critical:
In addition to the counterexamples above, the list below covers some examples of errors that should not be considered critical as the translation is still understandable and the meaning of the source is not affected in a critical way. Note that this list is not exhaustive.
SRC: “We thanked him heartily”
Back-translated MT: “We thanked him with heart”
SRC: “The man was seeing his wife.”
Back-translated MT: “The man was seeing the his wife.”
SRC: “Check the part number as given on the screen”
Back-translated MT: “Check the part number as given in the screen”
SRC: “They were expecting a report.”
Back-translated MT: “They was expecting a report.”
SRC: “Read these instructions carefully”
Back-translated MT: “Read these instructions careful”
SRC: “The name is John Smith”
Back-translated MT: “The name is john smith”
SRC: “She created FiveStarDays (fivestairs.com)”
Back-translated MT: “She created FiveStarDays (fivestairs.com”
SRC: “Stop editing my page!”
Back-translated MT: “Stop editing the page!”
SRC: “Stop editing my page!”
Back-translated MT: “Stop editing my site!”
SRC: “I would strongly recommend throwing this product in your garbage.”
Back-translated MT: “I would recommend throwing this product in your garbage.”
SRC: “Just noticed this.”
Back-translated MT: “We have just noticed this”