A New Age for Word Alignment in Machine Translation

2020-07-10 03:40 Lilt

Here at Lilt, we have a team full of exceptionally smart and talented individuals who are working hard to solve the translation and localization industries' toughest challenges. We're always researching new ways to improve the day-to-day lives of localization leaders and translators alike. That's why we're excited to share that Lilt's own Thomas Zenkel, Joern Wuebker, and John DeNero have published a paper entitled "End-to-End Neural Word Alignment Outperforms GIZA++", in which they describe a purely neural word alignment system that delivers a 13.8% relative improvement in Alignment Error Rate. Here's a quick recap of the paper and why it matters for the machine translation community.

Using AI to translate text from a source language to a target language can be extremely challenging, given all of the contextual information involved. While machine translation models the patterns needed to translate between source and target sentences, word alignment models the relationship between the words of the source sentence and the words of the target sentence. In the paper, the team compares their new, purely neural word alignment system to the current industry-standard statistical system, GIZA++.

When translating digital content, linguists have to translate more than just the text on the page. Formatting, for example, is a common and important aspect of online content that is typically managed with tags, such as bold and italics. As linguists work, they need to ensure these tags are placed accurately as part of the translation. Unfortunately, if the word alignment is inaccurate, placing the formatting tags correctly becomes very difficult.

A figure in the paper shows an attention matrix - it visualizes how the words of a source sentence and its translation correspond to each other. In that example, wir in German matches we in English, while nicht matches not. That matrix was put together by a human translator; for machine translation, the task is much harder without any annotated training data.

For example, if the content requires that the English sentence "We do not believe" be bolded, yet the word alignment is incorrect, the German output will include formatting tags that are likely in the wrong position. The wrong text will be bolded, and the linguist will need to go in and fix the tags by hand.

The current state-of-the-art statistical word alignment system is GIZA++, and it has been the leader since 2000. Word alignment systems are measured by Alignment Error Rate, commonly known as AER, which compares the alignment links produced by an automatic system to those produced by a human annotator. The Lilt team evaluated their approach on a German-to-English data set with human-annotated alignments. On that data, GIZA++ achieves an AER of 18.9%.

To see how they could improve on that score, Zenkel and team started from the most widely used neural machine translation architecture, the Transformer, and visualized its attention activations. Extracting alignments directly from these activations yields a high AER of 66.5%. Unlike the hand-annotated example, this matrix is not just black and white but grey-scale: the darker a cell, the stronger the correspondence between the two words.

By adding training steps that focus on word alignment quality instead of translation quality, they were able to reduce the AER to 34.5%. They then built systems that reward relationships between neighboring words. For example, the neighboring words dass wir and that we correspond to each other. While a grey pattern had emerged, the team wanted to increase the attention in the highlighted region of the matrix. The result? A more concentrated attention matrix, one that more accurately predicts the relationships between words. That step brought the AER down to 27.8% - much closer to GIZA++.

Perhaps the biggest change, however, came when the team started to think beyond just translating from German to English. If you compare the German-to-English model (the forward model) with the English-to-German model (the backward model), their alignments should be very similar, if not identical. By forcing the system to use the same word alignment (or attention matrix) for both the forward and backward models, it can better understand and predict the relationships between words. Remarkably, this dropped the AER to 17.9%, dipping below GIZA++'s impressive 18.9%. With one more tweak, the team improved the Alignment Error Rate to 16.3% - the 13.8% relative improvement over GIZA++ mentioned above. The sketches below illustrate some of these ingredients in simplified form.
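To make these numbers concrete, here is a minimal sketch - not the paper's code - of the two ingredients just described: reading hard alignment links off an attention matrix by linking each target word to its strongest source word, and scoring the result with the standard AER formula of Och and Ney (2000), which distinguishes "sure" gold links from merely "possible" ones. The sentence pair and all weights below are invented for illustration.

```python
import numpy as np

def extract_alignments(attention):
    """Turn a soft attention matrix into hard links by connecting each
    target word j to its highest-scoring source word i."""
    return {(int(np.argmax(attention[:, j])), j)
            for j in range(attention.shape[1])}

def aer(hyp, sure, possible):
    """Alignment Error Rate (Och & Ney, 2000); lower is better.
    `sure` holds links the annotator is certain of; `possible` is a
    superset that also includes ambiguous links."""
    return 1.0 - (len(hyp & sure) + len(hyp & possible)) / (len(hyp) + len(sure))

# Toy pair: "wir glauben nicht" -> "we do not believe".
# Rows = source words, columns = target words; weights are made up.
attention = np.array([
    [0.8, 0.1, 0.1, 0.2],   # wir
    [0.1, 0.3, 0.1, 0.7],   # glauben
    [0.1, 0.6, 0.8, 0.1],   # nicht
])

hyp = extract_alignments(attention)   # {(0, 0), (2, 1), (2, 2), (1, 3)}
sure = {(0, 0), (2, 2), (1, 3)}
possible = sure | {(2, 1)}            # "nicht" <-> "do" is a plausible extra link
print(aer(hyp, sure, possible))       # 0.0 on this toy example
```

A system that instead linked nicht to believe would lose the possible match, and its AER on this pair would rise to about 14%.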
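The bidirectional step is what finally overtakes GIZA++. In the paper, the forward and backward models are encouraged to agree during training; purely as an intuition-builder - a simplified stand-in, not the published method - the sketch below symmetrizes two already-computed attention matrices after the fact. Where both translation directions agree, a link stays strong; where only one direction proposes it, the score is damped. All numbers are invented.

```python
import numpy as np

def symmetrize(forward, backward):
    """Element-wise geometric mean of the source->target matrix and the
    transposed target->source matrix. Links both directions agree on
    keep a high score; one-sided links are suppressed."""
    return np.sqrt(forward * backward.T)

# forward[i, j]: attention from German word i to English word j.
# backward[j, i]: attention from English word j to German word i.
forward = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.3, 0.5],
])
backward = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
    [0.2, 0.8],
])

combined = symmetrize(forward, backward)
links = {(int(np.argmax(combined[:, j])), j)
         for j in range(combined.shape[1])}
print(links)   # {(0, 0), (1, 1), (1, 2)}
```

Training-time agreement goes further than this post-hoc trick, because each direction can reshape the other's attention rather than merely filtering it, but the underlying idea - two directions vouching for the same link - is the same.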
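Finally, to connect the AER gains back to the formatting problem from the start of the post, here is a hypothetical helper (names and indices invented) showing a simple covering-span heuristic for projecting a bold span through alignment links. One wrong or missing link shifts the projected span, which is exactly how misplaced tags arise.

```python
def project_span(links, src_start, src_end):
    """Project a formatting span over source tokens [src_start, src_end)
    onto the target side via links (src_idx, tgt_idx), returning the
    smallest target span that covers every aligned word."""
    targets = [j for i, j in links if src_start <= i < src_end]
    if not targets:
        return None
    return min(targets), max(targets) + 1

# "We do not believe" (all four words bold) -> "wir glauben nicht".
links = {(0, 0), (1, 2), (2, 2), (3, 1)}   # We->wir, do->nicht, not->nicht, believe->glauben
target = ["wir", "glauben", "nicht"]

s, e = project_span(links, 0, 4)            # (0, 3): the whole German clause
print(" ".join(target[:s] + ["<b>"] + target[s:e] + ["</b>"] + target[e:]))
# -> <b> wir glauben nicht </b>
```

If the aligner missed the We->wir link, the projected span would shrink to glauben nicht, leaving wir unbolded - and the linguist would have to fix the tags by hand, the exact failure mode described above.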
Ultimately, the team's contributions to word alignment outperform even the most widely adopted, state-of-the-art system available. Tests on other language pairs (English-French and Romanian-English) confirm that the approach beats competitive statistical alignment systems, achieving a lower AER than previous models. Improved word alignments, in turn, raise the quality of automatically transferring formatting information into the translation.

We're extremely proud of the amazing work that Zenkel, Wuebker, and DeNero have accomplished, and we're excited to see what the experts on the Lilt team come up with next. Interested in learning more about Lilt's human-in-the-loop machine translation?