Moral Machines: Translation Suppliers and AI Ethics

2020-03-06 19:30 TAUS

How should the translation industry engage with the current conversation about ethical concerns in technology use? Here are some preliminary notes for an answer.

Apart from genuine eco-concerns about the carbon footprint generated by machine learning applications, there seems to be little need to focus on AI explainability as part of a major corporate social responsibility (CSR) shift in corporate agendas. And any pre-AI ethical problems about translation practice itself have been largely addressed over the years. Yes, we believe in encouraging “moral machines” just as we believe in supporting moral actors in business generally. But is it necessary to introduce specific best practices to formalize this commitment?

On climate change, it is clear that machine learning for language-based tasks is particularly power-greedy. There has been widespread commentary on this issue, and recent Transformer models used to generate text from data have been singled out for their high energy cost. A further effect of building such models has been to restrict useful research on massive language data applications to the very largest tech firms, which can afford such energy bills. This tends to exclude much-needed yet poorly funded academic research.

At the same time, though, projects have been announced that aim to reduce the “learn” time, and hence the cost of operating these big language models. In the translation domain, could the emergence of smaller datasets also impact energy spend?

The compute industry is now aware of the carbon cost of GPU/TPU usage. The longer-term question for us, though, will be how to address the global cost of rapid growth in machine use worldwide if we scale up to another 50 to 500 language pairs over the coming decades. Or start using less climate-friendly “massively multilingual” billion-word datasets to drive one-shot translation jobs!

As Systran’s Jean Senellart said in a recent TAUS webinar, training one neural machine translation (NMT) model is equivalent to burning down a large tree. We should evolve towards qualitative rather than quantitative breakthroughs with the technology, and work together to share models instead of running the same processes and pairs again and again to achieve a minuscule advance in BLEU scores. We should also encourage language service providers (LSPs) to systematically choose to work with eco-friendly tech suppliers, and to measure and raise awareness internally about carbon counts where necessary.

As to fears about the ethical problem of technologically driven bias or even “fake” experiences in translation, the response can be stated more sharply. The practice of AI-driven translation cannot by itself lead to “social” bias or intentional fakes, only to either accurate or inaccurate outputs. Any bias will, therefore, be carried over from the source text; it is exclusively the use made of a translation that can render its truth value “fake” or “genuine.” The whole industry is organized precisely to prevent clients from shooting the messengers, however biased the messages they agree to process!

That said, service suppliers might still be concerned about potential bias within the datasets used to build automated translation solutions. The response could be that (post)editing is basically tasked with removing any traces of unwanted “bias” generated by an unthinking machine. It would, of course, be interesting to know whether we can teach the technology to automatically distinguish potential bias (in the “social” sense) from semantic error in the industry sense of mistranslation. Or more subtly, could translating something accurately unwittingly induce a sentiment of bias for a given native speaker? Going forward, the pursuit of translation accuracy may require social inclusiveness in certain cases to address the emerging norms of new language user communities.

Similarly, as mentioned above, is there a risk of automatically generated source text (e.g. via GPT-2 type solutions) entering the translation circuit unchecked, thereby increasing the likelihood of built-in bias? Before we answer these questions, we will need a clearer technical grasp of the details, plus better examples of high-risk cases.

Finally, it might be considered an ethical practice for translation suppliers to prioritize the adoption of open-source technology solutions. This would mean preferring the services of tech suppliers that espouse open source, to ensure that technology choice is not restricted to the products of just a few big players, and that further research and innovation are supported. And, as Jean Senellart also suggested, this could potentially encourage the monetization of data and models across an industry marketplace.

There are other non-tech moral issues that the industry could raise in its conversations with both clients and end-users. It is surely preferable, for example, that a supplier should warn a client when leaving a given language out of a content package targeting a country or region could come across as anti-inclusive or non-empathetic, e.g. failing to order a small-population local-language version of important medical or legal information.

Ultimately, this sort of activism could evolve into a much broader “ethics & education” agenda, whereby suppliers try to ensure that translation-focused AI solutions are systematically adapted to under-served populations in general. However, this would mean a proactive step into unknown territory for the industry.

All these topics clearly need richer, deeper and more informed debate. Tell us what you think!