Focusing on BLEU Can Bias Machine Translation Output

2021-01-14 18:00 slator

A recent paper by top machine translation (MT) researchers concluded that beam search, a highly effective way to maximize BLEU scores, can lead to a high rate of misgendered pronouns. The November 2020 paper, Decoding and Diversity in Machine Translation, is a collaboration between Graham Neubig, Nicholas Roberts, and Zachary C. Lipton at Carnegie Mellon University and Amazon machine learning scientist Davis Liang.

The authors opened by describing the two basic stages of the MT process. In the first, the “modeling” stage, researchers train a conditional language model using neural networks; in the second, the “search” stage, the model produces predictions by searching for the “best” translation using either “greedy decoding” or beam search.

Beam search, in particular, is very effective at maximizing BLEU scores, “but there is a significant cost to be paid in naturalness and diversity,” the researchers wrote. In practice, this means that MT models typically offer no variability in translations, leading to less engaging output. The researchers also suggested that readers who encounter a given language primarily through these more monotonous translations “might develop a warped exposure to that language.”

Gender pronouns were just one of several diversity diagnostics the team introduced in their experiments, but the researchers found that, even when translating between two gendered languages, search disproportionately chose the gender that was more frequent given the input.

For English-to-German translations, the researchers noted that because the German word “sie” translates as “she,” “they,” or “you” in English, the result was a bias toward the more common gender pronoun, “sie.” By contrast, when translating from French or German into English, male pronouns were more heavily represented in the training set, and the bias skewed male accordingly.

A possible alternative to search is sampling, which replaces “she” and “her” with male pronouns at lower rates than search does. However, the authors warned, the field might not be ready to shift away from search just yet, since sampling does not yield the same consistently high BLEU scores that search does. “The singular focus on improving BLEU leaves no incentive to address issues of diversity,” they wrote. The researchers’ own future work will explore techniques that can achieve high BLEU scores while producing natural-sounding translations.
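
For readers unfamiliar with the decoding strategies the paper contrasts, the minimal Python sketch below illustrates the difference between greedy decoding, beam search, and sampling over a toy, hand-made conditional distribution. The vocabulary, probabilities, and function names are illustrative assumptions for this article only; they are not taken from the paper or from any real MT system.

```python
# Toy illustration (not from the paper) of greedy decoding, beam search, and
# sampling. The "model" is a hand-made conditional distribution over a tiny
# German-like vocabulary; all probabilities are assumptions for illustration.
import math
import random

EOS = "<eos>"

def next_token_probs(prefix):
    """Toy conditional model P(next token | prefix), biased toward 'sie'."""
    if not prefix:
        return {"sie": 0.6, "er": 0.3, "es": 0.1}
    if prefix[-1] in ("sie", "er", "es"):
        return {"geht": 0.7, "kommt": 0.3}
    return {EOS: 1.0}

def greedy_decode(max_len=5):
    """Always take the single most probable next token."""
    out = []
    while len(out) < max_len:
        probs = next_token_probs(out)
        tok = max(probs, key=probs.get)
        if tok == EOS:
            break
        out.append(tok)
    return out

def beam_search(beam_size=2, max_len=5):
    """Keep the beam_size highest-scoring partial hypotheses at each step."""
    beams = [([], 0.0)]  # (tokens, log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == EOS:
                candidates.append((tokens, score))  # finished hypothesis
                continue
            for tok, p in next_token_probs(tokens).items():
                candidates.append((tokens + [tok], score + math.log(p)))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
    best_tokens, _ = beams[0]
    return [t for t in best_tokens if t != EOS]

def sample_decode(max_len=5):
    """Draw each next token at random according to the model's probabilities."""
    out = []
    while len(out) < max_len:
        probs = next_token_probs(out)
        tok = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
        if tok == EOS:
            break
        out.append(tok)
    return out

if __name__ == "__main__":
    print("greedy: ", greedy_decode())           # deterministic, mode-seeking
    print("beam:   ", beam_search(beam_size=2))  # also mode-seeking
    for i in range(3):
        print(f"sample {i}:", sample_decode())   # varies from run to run
```

Run repeatedly, the greedy and beam outputs never change and always pick the highest-probability pronoun, while the sampled outputs vary and occasionally surface the lower-probability alternatives; that is the diversity trade-off, and the BLEU trade-off, that the authors describe.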