Imagine translating a story about a dog. You’re working in a translation management system on this source sentence: “The black dog was almost hit while running across the street.” Now suppose your translation memory (TM) has a stored translation for: “The brown dog was hit while running across the street.” This is what’s called a fuzzy match. It’s close enough: you just have to change “brown” to “black” and add the (very important) “almost.”
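How close is “close enough”? CAT tools each compute match scores their own way, often with undisclosed formulas, so the following is only a minimal sketch. It assumes a word-level Levenshtein (edit) distance divided by the length of the longer segment, and the function names are illustrative. Under that assumption, the brown/black dog pair scores about 82%:

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance counted in whole words (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def fuzzy_score(source: str, tm_source: str) -> float:
    """Assumed scoring rule: 1 minus word edit distance over the longer segment."""
    a, b = tokenize(source), tokenize(tm_source)
    longest = max(len(a), len(b)) or 1
    return 1 - word_edit_distance(a, b) / longest


new_source = "The black dog was almost hit while running across the street."
tm_source = "The brown dog was hit while running across the street."
print(f"{fuzzy_score(new_source, tm_source):.0%}")  # 82%
```

Two word edits (swap “brown” for “black”, add “almost”) against 11 source words give 9/11, roughly 82%.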
But what if a machine translation (MT) engine offers you a correct translation of the whole sentence, save for slightly awkward phrasing: “While running across the street, the black dog was almost struck”?
Do you use the translation from the TM, which is human-generated but requires editing, or the translation that’s almost perfect but came from a machine?
What seems like six of one, half a dozen of the other isn’t a simple choice. Once upon a time, the choice between TM and MT was easy: take the path requiring the least human intervention, which was generally TM because MT quality was so poor. But the decision is getting tougher for translators. With advances in machine learning, we’ve finally reached a point where MT output can compete with low fuzzy matches.
So, does that make TMs obsolete?
Not at all. Not yet, at least. But it is time for us to rethink the fuzzy-matching norms the industry has followed for more than two decades.
The translator’s choice: fuzzy match or MT result?
Traditionally, the low-end threshold for a fuzzy match was 70-75%. Until recently, we had little reason to question it: before MT made the leap to neural machine translation (NMT), a 70-75% fuzzy match was the clear winner. And even now that MT output has generally improved, we’ve yet to see an academic paper or a large body of research across major languages proving that MT has surpassed TM.
But we do have anecdotal evidence that this could be true. Earlier this year, TAUS released a paper showing (based on their own data) that in Romance languages, at least, anything below an 85% match is potentially better handled by machine translation than by translation memory.
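In a CAT tool, acting on a finding like this would amount to moving a routing threshold. The sketch below assumes a hypothetical rule (pre-fill a segment from TM at or above the threshold, otherwise from MT) and six invented best-match scores; it only shows how many segments change hands when the cutoff moves from 75% to 85%:

```python
from collections import Counter


def pretranslation_source(fuzzy: float, threshold: float) -> str:
    """Hypothetical rule: pre-fill from TM at or above the threshold, otherwise from MT."""
    return "TM" if fuzzy >= threshold else "MT"


def route_all(scores: list[float], threshold: float) -> Counter:
    """Count how many segments each source would pre-fill at a given threshold."""
    return Counter(pretranslation_source(s, threshold) for s in scores)


# Invented best-match scores for six segments (illustration only, not TAUS data).
best_matches = [0.98, 0.91, 0.84, 0.78, 0.72, 0.60]
for threshold in (0.75, 0.85):
    counts = route_all(best_matches, threshold)
    print(f"{threshold:.0%}: TM={counts['TM']} MT={counts['MT']}")
# 75%: TM=4 MT=2
# 85%: TM=2 MT=4
```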
You might say it makes sense, then, to raise the bar to at least 85%. But there are a few problems with this:
• Language is so flexible, and the combinations of language and content type are so numerous, that nobody can say for certain that an 85% threshold would win in every use case. Legal content in French will call for different standards than technical content in Russian, and so on.
• On top of that, there are more variables: different MT engines suit different use cases, and the algorithms behind TM matching vary (there is no single “standard” algorithm), so the same pair of sentences can receive different match scores. Again, the possibilities are practically endless.
• Even if we did find that 85% is a better fuzzy-match threshold across the board, nobody is going to come out and say that anything below 85% is always best translated by MT. A catch-all rule is too risky.
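To see how much the choice of matching algorithm can matter, here is a minimal sketch that scores the opening example’s sentence pair two ways. Both metrics are textbook similarity measures chosen for illustration, not the formulas of any particular TM product: an order-sensitive word-level Levenshtein score and an order-insensitive Dice coefficient over word counts. The same pair lands on opposite sides of an 85% cutoff:

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def levenshtein_score(a: list[str], b: list[str]) -> float:
    """1 - word-level edit distance / length of the longer segment (order-sensitive)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            # delete x, insert y, or substitute (cost 0 when the words match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return 1 - prev[-1] / max(len(a), len(b), 1)


def dice_score(a: list[str], b: list[str]) -> float:
    """Dice coefficient over word multisets (ignores word order)."""
    shared = sum((Counter(a) & Counter(b)).values())
    return 2 * shared / (len(a) + len(b)) if a or b else 1.0


source = tokenize("The black dog was almost hit while running across the street.")
tm_match = tokenize("The brown dog was hit while running across the street.")
for name, scorer in (("Levenshtein", levenshtein_score), ("Dice", dice_score)):
    score = scorer(source, tm_match)
    verdict = "TM" if score >= 0.85 else "MT"
    print(f"{name}: {score:.0%} -> {verdict} at an 85% threshold")
# Levenshtein: 82% -> MT at an 85% threshold
# Dice: 86% -> TM at an 85% threshold
```

Neither number is “right”; they simply measure different things, which is why a single industry-wide threshold is hard to pin down.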
And so, we remain at an interesting point of experimentation. When fuzzies drop to low-ball matches of around 70%, translators have to use their best judgment: accept the fuzzy match (and edit it to reflect the full source meaning), or accept an MT-generated translation of the full source meaning (which may need edits of a different kind, for accuracy and/or fluency).
What’s the best use of their time? There is no right answer.
The fact that we even have to ask which is better marks a new crossroads for the translation industry. It’s not unlike the industrial age. The Wright brothers, for example, crashed dozens of prototypes before they finally got one off the ground. Similarly, because we can’t account for every permutation or variable, we try things and learn from our mistakes, and as our experience (and MT) evolves, we’ll figure it out.
The tipping point
But if we can’t account for every possibility, how will we know the point at which machine translation overtakes TM?
The shift will be gradual, but for now it comes down to how “quality” is defined.
Today, whether machine translation (with or without post-editing) is of better or worse quality than an 85% fuzzy match (with or without revisions by the translator) is entirely in the eye of the beholder. But one day, the technology itself will be able to guide us.
Here we get into something called quality estimation (QE), where neural MT begins to assess the quality of its own output. Instead of just handing you a machine-generated translation on a take-it-or-leave-it basis, the machine will become intelligent enough to tell you: no, this isn’t a perfect translation, but I can point out where the potential errors are and offer options to fix them. It will not only self-assess but self-diagnose. Over time, with enough experience, it will even fix some errors for you based on your previous choices. We already see this to a degree with adaptive MT.
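As a rough illustration of the word-level side of QE, here is a minimal sketch that takes per-token quality estimates, flags the tokens likely to need attention along with their positions, and averages them into a crude sentence-level score. The scores are invented for the opening example’s MT output; a real QE model would produce them, and many real QE systems predict sentence-level scores directly rather than averaging tokens. `TokenEstimate` and the 0.5 cutoff are illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class TokenEstimate:
    token: str
    quality: float  # hypothetical QE output: estimated probability the token is fine


def flag_potential_errors(estimates: list[TokenEstimate],
                          cutoff: float = 0.5) -> list[tuple[int, str]]:
    """Return (position, token) for every token whose estimate falls below the cutoff."""
    return [(i, e.token) for i, e in enumerate(estimates) if e.quality < cutoff]


def sentence_quality(estimates: list[TokenEstimate]) -> float:
    """Crude sentence-level estimate: the mean of the token-level estimates."""
    return sum(e.quality for e in estimates) / len(estimates) if estimates else 0.0


# Invented scores for the MT output in the opening example.
mt_output = [
    TokenEstimate("While", 0.62), TokenEstimate("running", 0.95),
    TokenEstimate("across", 0.97), TokenEstimate("the", 0.99),
    TokenEstimate("street,", 0.93), TokenEstimate("the", 0.98),
    TokenEstimate("black", 0.96), TokenEstimate("dog", 0.99),
    TokenEstimate("was", 0.97), TokenEstimate("almost", 0.91),
    TokenEstimate("struck", 0.44),
]
print(flag_potential_errors(mt_output))  # [(10, 'struck')]
print(f"{sentence_quality(mt_output):.2f}")  # 0.88
```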
Once MT becomes fully aware of its mistakes and can offer the human editor viable ways to fix them, that will be the point at which it overtakes TM in certain use cases. And we’re not looking far into the future, either. With so much investment from big tech, MT could reach that point sooner than we think.
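If MT can attach a trustworthy quality estimate to each suggestion, the choice need not be a fixed threshold at all; it could be made segment by segment. A minimal sketch, assuming (hypothetically) that a QE score and a fuzzy match score can be compared on the same 0-1 scale, an assumption that would need validating per language pair, content type, and engine:

```python
def choose_pre_translation(fuzzy_score: float, mt_quality_estimate: float,
                           margin: float = 0.0) -> str:
    """Return "MT" or "TM", whichever candidate promises less editing.

    Hypothetical rule: treat (1 - fuzzy score) as the editing effort for the TM
    match and (1 - QE score) as the effort for the MT output. `margin` makes MT
    win only when its estimate beats the fuzzy score by more than that amount.
    """
    tm_effort = 1.0 - fuzzy_score
    mt_effort = 1.0 - mt_quality_estimate
    return "MT" if mt_effort + margin < tm_effort else "TM"


# Invented numbers: an 82% fuzzy match vs. MT output that QE scores at 0.88.
print(choose_pre_translation(0.82, 0.88))               # MT
print(choose_pre_translation(0.82, 0.88, margin=0.10))  # TM
```

The `margin` parameter is one way a team might encode caution: MT takes over only when it is clearly ahead, not merely tied.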
But don’t get us wrong—we can still say with a high degree of certainty that humans won’t be replaced by machines. It’s just that we’ll move from what we used to call computer-assisted translation to human-assisted translation. The machine will take the first pass—at least for content with lower emotional weight, but maybe eventually for higher-weighted content as well—before the human cleans it up and the MT engine’s performance is judged.
Then, of course, we have to think about what this shift in dynamics means for translators. If the machine takes the first pass, then the human becomes more editor than translator. Does this make translators worse off financially because less work is required? What about the hyper-specialized translators who translate highly branded marketing content, to whom using MT or TM is just a hassle that takes away from the creative process of translation? Can we force MT on those guys?
It all leads back to the reason we use fuzzy matching in the first place: it’s still human. When MT can get to the deeper meaning and nuance behind text, when it can understand sentence flow and figure out different styles of writing across languages…who knows where it will take us?
(Compiled by 沙龙君)