Do Massively Multilingual Machine Translation Models Help Business Functions Get Massively Multilingual Faster?


2020-11-26 11:00 sdltrados



What is Massively Multilingual Translation?

Typically, translation models are trained for one language pair at a time to ensure the best accuracy and fluency; this is the prevalent machine translation methodology in the market today. Massively Multilingual Translation (MMT) instead uses a large amount of training data across many languages to produce a single model that can be applied to any language pair. Those working with this approach claim that the results are just as good as, if not better than, the bilingual baseline approach, especially for low-resource languages. The theory is that the method can apply learnings across languages rather than focusing on a single language pair. While this is an interesting development that may in time open up new possibilities, it is not a methodology that would materially improve the outcome of enterprise machine translation adoption, and it has some drawbacks that may not be readily apparent.

MMT begins with the premise that there are vast amounts of training data for any and all language pair combinations; that is, that there is just as much content to train a model that translates directly from Uzbek to Hebrew as there is to train French to English (or English to French). That may be true if the content you need to translate is common: chats about food, entertainment, current events, short bits of information, the kind of content that would be relevant and useful to a Facebook user. The same approach may not work, however, if the body of content you need to translate is technically complex: contracts, patents, documentation, corporate policies. The researchers note that training high-resource and low-resource languages together helps the model benefit from what can be described as economies of scale, but that benefit may not extend to language nuances. The vernacular and technical vocabulary may be vastly different, and mixing the languages may produce less useful results.

Why Massively Multilingual Translation isn't the answer for enterprise environments

MMT proponents note that the method has a positive impact on BLEU scores. For low-resource languages (languages that aren't frequently translated as a pair and don't have a lot of unique training content), a five-point improvement in BLEU score isn't unusual. For a consumer application, that may be a significant achievement, since the accuracy bar may be fairly low. For a business application, a BLEU score isn't enough to capture the ROI, and a five-point improvement may not be worth the added implementation complexity that a single model for all languages would introduce, versus expertly trained bilingual models.
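To make the metric concrete: BLEU scores a system's output against reference translations by n-gram overlap, on a 0 to 100 scale. Here is a minimal sketch of such a computation using the open-source sacrebleu package; the sentences are invented for illustration and are not from any benchmark.

```python
# Minimal, illustrative BLEU computation with the open-source sacrebleu
# package (pip install sacrebleu); the sentences below are toy data.
import sacrebleu

hypotheses = [
    "The contract terminates on 31 December.",
    "Payment is due within thirty days.",
]
references = [  # one reference stream, aligned line by line with hypotheses
    [
        "The agreement terminates on 31 December.",
        "Payment is due within 30 days.",
    ],
]

score = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {score.score:.1f}")  # corpus-level score on a 0-100 scale
```

A move from, say, 30 to 35 is the five-point gain described above. The single number says nothing about whether terminology, brand voice, or legal phrasing survived the translation, which is exactly the gap between a BLEU score and enterprise ROI.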
For example:

- A single model makes it harder to isolate issues in individual language pairs and adjust them without impacting all operations. A language pair-specific approach means that errors and re-training can focus on one model without affecting the rest of the business. This also allows customers to test changes on isolated languages and ensure success before changing others.

- Many enterprise customers require custom-trained models in order to ensure that the MT engine preserves their brand voice and their translation guidelines. (This is why SDL introduced Adaptable Language Pairs in June 2019.) Adaptable Language Pairs create customer models by taking the generic model and then using customer data to adapt it. If the generic model is an MMT, adaptation becomes much more difficult and computationally expensive (a sketch of this kind of adaptation appears at the end of this post).

- One of the promises of AI is that of self-learning systems. For machine translation, this means the ability of a translation engine to assimilate feedback and immediately change its behavior in ways that make it more consistent with that feedback. SDL introduced this feature in SDL Machine Translation Edge 8.5. Such real-time adaptation from relatively little feedback is much more challenging to obtain for a large MMT model with billions of parameters.

The main issue that MMT solves is breadth of languages. Training languages one pair at a time can appear daunting if the goal is to achieve the breadth that MMT can achieve with a single run. However, there are ways to achieve breadth that address low-resource languages without resorting to MMT.

Get Involved

Language pair chaining is a method whereby language pairs are chained together to add breadth (also sketched at the end of this post). SDL implemented language pair chaining in its SDL Machine Translation Edge deployment in January 2020. Customers are able to chain language pairs together and deploy new models easily. This same capability is now part of SDL Machine Translation Cloud, where users are able to automatically apply chained models to achieve a breadth of language pairs well beyond 2,000 combinations. There is a 14-day free trial available for SDL Machine Translation Cloud where those interested can try this and the other features included in our award-winning neural machine translation software. You can learn more about SDL Machine Translation, and sign up for the free trial, here.
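To make the adaptation point concrete, here is a minimal sketch of adapting a generic bilingual model with customer data. This is not SDL's implementation of Adaptable Language Pairs; it shows the general fine-tuning technique using the open-source Hugging Face transformers library, a public Marian English-to-French model, and invented "customer" sentence pairs.

```python
# Minimal sketch: adapting a generic bilingual model with customer data.
# This is NOT SDL's Adaptable Language Pairs implementation; it shows the
# general fine-tuning idea with the open-source Hugging Face transformers
# library (pip install transformers torch sentencepiece). The customer
# translation-memory pairs below are invented for illustration.
import torch
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-fr"   # a generic bilingual baseline
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

customer_pairs = [  # tiny translation memory enforcing house terminology
    ("Open a support ticket.", "Ouvrez un ticket d'assistance."),
    ("Your subscription has been renewed.", "Votre abonnement a été renouvelé."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for _ in range(3):                           # a few passes over the tiny corpus
    for src, tgt in customer_pairs:
        batch = tokenizer([src], text_target=[tgt], return_tensors="pt")
        loss = model(**batch).loss           # cross-entropy against the labels
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.eval()  # the adapted model now leans toward the customer's phrasing
```

Running this loop against one bilingual Marian model, with roughly tens of millions of parameters, is quick; repeating it per customer against a single multilingual model with billions of parameters is where the cost and fragility argument above comes from.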
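Finally, a sketch of the chaining idea itself. The two engines below are toy stand-ins (any bilingual MT system could sit behind these callables); the point is the composition and the breadth arithmetic, not the translations.

```python
# Minimal sketch of language pair chaining (pivot translation).
# The engines are toy stand-ins; any bilingual MT API could back them.
from typing import Callable

Translator = Callable[[str], str]

def chain(*steps: Translator) -> Translator:
    """Compose bilingual engines into a single chained translator."""
    def translate(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return translate

# Pretend these call trained Uzbek->English and English->Hebrew models.
uz_to_en: Translator = lambda s: f"<en translation of: {s}>"
en_to_he: Translator = lambda s: f"<he translation of: {s}>"

# A low-resource pair served without any direct Uzbek-Hebrew training data.
uz_to_he = chain(uz_to_en, en_to_he)
print(uz_to_he("Salom dunyo"))

# Breadth arithmetic: with every language chained through a common pivot,
# n languages yield n * (n - 1) directed pairs; 46 languages already give
# 46 * 45 = 2,070 combinations, i.e. "well beyond 2,000".
```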

