If your business, like many others in this digital age, faces growing content volumes and ever-faster turnaround times while your localization budget stays flat, you may be considering machine translation (MT).
In order to help enterprises start their journey with MT, we recently held a webinar in which Adam LaMontagne, MT Program Manager at RWS Moravia, reviewed the history of MT, its enterprise use cases and how to get started. Here is a recap of what he presented.
A bit of history
Machine translation (MT), one of the oldest sub-fields of artificial intelligence (dating back to the 1950s), uses software to translate text from one language to another. When the technology was first introduced, scientists were confident that any kinks would be ironed out within a few years. Some 70 years later, however, MT is still very much a work in progress: recent years have brought some notable advances and innovations, but much headway remains to be made.
One of the most significant changes in the world of MT was the shift from rule-based to statistical models. Rule-based MT, which relies on developing linguistic rules to translate from one language to another, is still used in some applications today. Since the late 1980s, however, most MT applications have used statistical models. Besides bringing costs down, statistical MT makes better use of modern CPU capabilities and enables economies of scale: unlike rule-based models, the same algorithm can be used to train many language pairs. That said, statistical models hit a certain quality ceiling, most notably in fluency.
A newer sub-category of statistical models is neural MT (NMT), which builds on the same fundamental concepts but uses a design loosely modelled on the brain's neural systems. NMT harnesses the processing power of modern computers to deliver significant improvements in translation quality. Using deep learning techniques, NMT models can produce both faster and better-quality translations than traditional statistical models. At present, they represent the state of the art in enterprise MT and are used by giants such as Google and Microsoft. Note, however, that NMT carries higher costs because of the additional computing power it requires.
In what enterprise applications is MT useful?
Generally speaking, MT makes the most sense, in terms of justifying the ROI, in translation programs with high content volumes. This is particularly true for content that is useful but not a high enough priority to justify the cost of full human translation, such as customer feedback. MT can also be especially beneficial where speed is crucial, where costs need to be cut or where a fixed budget is running up against growing translation needs.
MT can also be a great choice when a self-service solution is needed to help employees and/or users communicate across a language barrier, as in a multinational company's community forum. Finally, MT can serve as an integrated component of a larger service workflow, such as multilingual sentiment analysis or speech-to-speech translation.
All told, MT is now used in all sorts of scenarios, even for marketing material, which was formerly considered too complicated for a machine to handle because it is highly branded and nuanced in meaning.
Where is MT advantageous?
The first advantage MT can offer enterprises in these cases is greater productivity, which can speed up time-to-market, help keep spending within budget and make growing volumes manageable. Particularly in large multinational organizations with vast amounts of content to translate, a single robust MT system can serve multiple purposes, translating anything from internal communications to blog posts and community forums. In addition, a properly deployed MT system can avoid the typos and misspellings that easily slip past the human eye, achieving greater consistency across the board.
What factors should be considered when choosing an MT service?
Quality requirements
MT can work in a variety of contexts, and translation quality can vary just as widely as the contexts themselves. Ultimately, quality rises with the calibre and volume of the training data fed into the system. The quality, consistency and complexity of the source content also directly affect translation quality. For example, slang or acronyms that do not appear in the training data can lead to poorer results.
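To make the point about unseen terms concrete, here is a minimal Python sketch that flags source words that never occur in a training corpus, the kind of pre-check you might run before sending content to an engine. It assumes naive word tokenization; real NMT engines work with subword units, so this is a rough heuristic rather than how any particular engine behaves. The function names and sample sentences are hypothetical.

```python
import re
from collections.abc import Iterable


def tokenize(text: str) -> list[str]:
    # Naive word tokenizer: lowercases and keeps runs of word characters
    # and apostrophes. Production engines use subword tokenization instead.
    return re.findall(r"[\w']+", text.lower())


def build_vocabulary(training_sentences: Iterable[str]) -> set[str]:
    # Collect every token that appears anywhere in the training data.
    vocab: set[str] = set()
    for sentence in training_sentences:
        vocab.update(tokenize(sentence))
    return vocab


def unseen_terms(source_text: str, vocab: set[str]) -> list[str]:
    # Tokens from the source that never occur in the training data,
    # deduplicated and kept in order of first appearance.
    return list(dict.fromkeys(t for t in tokenize(source_text) if t not in vocab))


if __name__ == "__main__":
    # Hypothetical training data and source sentence, for illustration only.
    vocab = build_vocabulary([
        "The update improves battery life.",
        "Restart the device to apply the update.",
    ])
    print(unseen_terms("Restart the device ASAP to apply the OTA update.", vocab))
    # → ['asap', 'ota']
```

Terms flagged this way (here the acronyms "ASAP" and "OTA") are candidates for glossary entries, source clean-up or additional training data.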
It is important to weigh budget and time considerations alongside the need for quality and nuance. For example, if you only need to convey the basic idea of a text, your MT solution will likely be simpler and your specified quality level easier to achieve. On the other hand, where high emotional impact is required, such as in sales and marketing content, a more specialized engine will be needed, and achieving the necessary nuance may be much more challenging.
Language differences
Another important factor to keep in mind is that not all languages or language pairs are created equal. For instance, while Dutch-to-English and English-to-Dutch translation generally produce good results thanks to the similarities between the two languages, Chinese-to-English translation is a much more complex task because of the vast differences in syntax, morphology and logic between them.
Generic vs. customized MT
MT can be deployed in several different ways, depending on both needs and constraints. Generic engines, for instance, are fairly straightforward to use and cheaper to deploy, but typically at some cost to quality. Customizing an engine for a specific application will generally improve results, but the price can vary greatly depending on the languages and content involved. In many cases, however, most of the cost of customization is upfront rather than ongoing. As noted earlier, in terms of ROI, customized MT generally makes the most sense for high content volumes.
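Because customization costs are largely upfront, the ROI question above reduces to simple break-even arithmetic: divide the upfront cost by the per-word saving the customized engine delivers over a generic one (for example, through lighter post-editing). The Python sketch below assumes that model; the cost figures are hypothetical and ignore ongoing maintenance costs.

```python
import math


def break_even_words(upfront_cost: float,
                     generic_cost_per_word: float,
                     custom_cost_per_word: float) -> float:
    """Word volume at which a customized engine's upfront cost is recovered.

    The per-word costs stand for the ongoing cost of producing usable output
    with each engine (for example, post-editing). All inputs are hypothetical.
    """
    saving_per_word = generic_cost_per_word - custom_cost_per_word
    if saving_per_word <= 0:
        # No per-word saving: customization never pays for itself on this basis.
        return math.inf
    return upfront_cost / saving_per_word


if __name__ == "__main__":
    # Hypothetical figures: 20,000 upfront; post-editing cost drops
    # from 0.06 to 0.04 per word with the customized engine.
    words = break_even_words(20_000, 0.06, 0.04)
    print(f"Break-even after roughly {words:,.0f} words")
```

Any currency works as long as all three inputs use the same one. Below the break-even volume, a generic engine is the cheaper choice under these assumptions.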
Post-editing
MT can be combined with varying levels of post-editing, the process in which human translators or editors review and correct MT output. This phase is particularly helpful for catching terminology that the engine failed to recognize or render correctly, which the post-editor can then fix. Plus, the engine can be retrained to improve future output and reduce post-editing effort. Depending on quality requirements and how much post-editing the workflow includes, this phase can involve varying investments of both time and money.
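One way to quantify post-editing effort is to count how many word-level edits the post-editor made to the raw MT output. The Python sketch below computes a word-level Levenshtein distance and divides it by the length of the post-edited text. It resembles edit-rate metrics such as HTER but omits the block-shift operation those metrics include, so treat it as a simplified, hypothetical measure; the function names are illustrative.

```python
def word_edit_distance(source: list[str], target: list[str]) -> int:
    # Levenshtein distance over words: insertions, deletions and
    # substitutions each cost 1. Keeps only two rows of the DP table.
    prev = list(range(len(target) + 1))
    for i, src_word in enumerate(source, start=1):
        curr = [i] + [0] * len(target)
        for j, tgt_word in enumerate(target, start=1):
            substitution = prev[j - 1] + (src_word != tgt_word)
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, substitution)
        prev = curr
    return prev[-1]


def post_edit_rate(mt_output: str, post_edited: str) -> float:
    # Edits needed to turn the MT output into the post-edited version,
    # normalized by the post-edited length (0.0 means no changes were made).
    mt_words = mt_output.split()
    pe_words = post_edited.split()
    return word_edit_distance(mt_words, pe_words) / max(len(pe_words), 1)


if __name__ == "__main__":
    mt = "Click the button for save the file"
    pe = "Click the button to save the file"
    print(f"{post_edit_rate(mt, pe):.3f}")  # one substitution in 7 words → 0.143
```

Tracked over time, a falling rate is one sign that retraining is reducing post-editing effort.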
Security
Finally, not all MT providers offer the same level of security for data that passes through their engines. Free MT services such as Google Translate, for instance, may store uploaded data and use it to train their engines. One advantage of a paid MT service, therefore, is that security requirements can be clarified in the discovery phase and guaranteed through direct agreements with the MT service provider.
What does MT deployment look like?
While each case is unique, MT deployment generally consists of the following:
1. Discovery
This initial stage serves to analyse existing processes, clarify goals such as languages and content types to be translated and identify quality requirements.
2. Pilot
In this stage, an appropriate engine is selected based on discovery findings and, in the case of a customized engine, training of the engine begins.
3. Testing
The selected engine is tested and evaluated through automatic quality assessment, human review or both. The results show whether quality goals are being met or whether the output needs further optimization.
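Automatic assessment usually means scoring engine output against human reference translations. BLEU is one widely used metric; below is a minimal, unsmoothed, sentence-level version in Python, for illustration only. Real evaluations typically compute BLEU at corpus level with an established tool such as sacreBLEU and combine it with other metrics and human review.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    # Count every contiguous n-token sequence (empty if len(tokens) < n).
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        total = sum(hyp_counts.values())
        if total == 0:
            return 0.0  # hypothesis is shorter than n words
        ref_counts = ngram_counts(ref, n)
        # Clip each n-gram's count by how often it appears in the reference.
        matches = sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())
        if matches == 0:
            return 0.0  # unsmoothed: any zero precision zeroes the score
        log_precisions.append(math.log(matches / total))
    # Brevity penalty discourages hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    score = sentence_bleu("the cat sat on a mat", "the cat sat on the mat")
    print(round(score, 3))  # → 0.537
```

Scores from a single sentence are noisy; comparing averages over a representative test set against an agreed threshold gives a more reliable signal of whether quality goals are met.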
4. Engine improvement
If testing shows that the output needs improvement, the MT system can be fed additional data and trained further.
5. Deployment
In this stage, the MT engine gets integrated into the workflow, either by connecting to CAT (computer-assisted translation) or TMS (translation management system) tools or as a stand-alone application.
6. Maintenance
The MT engine is monitored and retrained over time as content evolves. This is particularly important as new products or services are introduced, in cases of re-branding or when a company wants to change its tone.
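Monitoring can be partly automated. The Python sketch below assumes you already record a per-job quality signal, such as a word-level post-edit rate, and flags the engine for review or retraining when the rolling average over recent jobs drifts above a threshold. The class name, window size and threshold are hypothetical and would need tuning for each program.

```python
from collections import deque
from statistics import mean


class RetrainingMonitor:
    """Tracks a rolling average of per-job post-edit rates and flags when the
    engine may need retraining. Window and threshold values are hypothetical."""

    def __init__(self, window: int = 50, max_avg_edit_rate: float = 0.30):
        if window < 1:
            raise ValueError("window must be at least 1")
        # A bounded deque keeps only the most recent `window` scores.
        self.scores: deque[float] = deque(maxlen=window)
        self.max_avg_edit_rate = max_avg_edit_rate

    def record(self, edit_rate: float) -> None:
        self.scores.append(edit_rate)

    def needs_retraining(self) -> bool:
        # Wait for a full window so a handful of unusual jobs cannot trigger it.
        if len(self.scores) < self.scores.maxlen:
            return False
        return mean(self.scores) > self.max_avg_edit_rate


if __name__ == "__main__":
    monitor = RetrainingMonitor(window=3, max_avg_edit_rate=0.30)
    for rate in [0.10, 0.20, 0.20, 0.90]:  # hypothetical per-job edit rates
        monitor.record(rate)
        print(rate, monitor.needs_retraining())
```

A sustained rise in edit rates often follows exactly the changes mentioned above, such as new product terminology or a rebrand, which is when retraining on fresh content tends to pay off.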
Conclusion: successful MT deployment requires careful consideration
As you can see, there are quite a few factors to consider when designing and deploying an MT system. Different solutions make sense in different contexts, depending on the languages and content involved, the required level of quality, and budget and time constraints. It is therefore wise to work with a professional localization firm well-versed in MT solutions. Such a firm can help you assess your needs, prioritize cost, time and quality considerations and deploy an MT solution that gets you the results you need.
At RWS Moravia, we pride ourselves on being leaders in the field, and it is our passion to work with companies to identify and implement the right MT technology for their specific goals.
If this summary has piqued your interest, or if you want to hear more on the subject, you can listen to the full webinar on-demand.