Disruptive So Far: A Look Back at Neural Machine Translation in the Age of Natural Language Processing


2020-10-17 05:30 CSOFT



With news this week that Google has improved some of the key linguistic functionalities of its search engine’s machine learning algorithms (including an AI spell check improvement that Google’s head of search considers more important by itself than the previous five years’ progress), it is an interesting moment in the brief history of language-related AI to reflect on how things have advanced within the language services industry.

Much has changed in the few years since neural machine translation (NMT) first made waves with its potential to revolutionize localization, even in the very nature of the questions people continue to ask about it. Three years ago, for instance, one significant concern was how neural MT could impact global language diversity. While concerns about technology’s unintended consequences for minorities are still very much in circulation, the question of AI’s inclusivity within language groups is now the more discussed ethical issue, as algorithms are now known to generate statistical in-groups and out-groups along demographic lines. The fact that this has not manifested in crises specific to the translation industry is partly indicative of the relatively moderate rollout of these capabilities, as well as of just how much of a performance gap remains between raw machine translation and human-inclusive models even several years onward. The sheer diversity of populations now interacting in the globalized world economy may be a greater challenge to NMT’s effectiveness than NMT is to global diversity, as the limitations of a machine-centric model for translation have come into clearer focus.

In terms of performance, one of the major things that has changed for the better is our understanding of how NMT works, and how to improve on raw machine translation. As during its infancy, NMT today requires intensive work from human linguists not only to iron out any linguistic flaws that may arise, but also to verify that the model itself is performing correctly. Whereas plain translation is about expressing something in the correct conjugations and declensions, machine translation post-editing (MTPE) is also about verifying the mechanism behind these choices, and is arguably the harder task. To reduce the burden on human linguists and engineers, focus has shifted toward the crucial element of MT training, whereby neural translation models are prepared on linguistic datasets known to contain accurate translations in relevant subject matter areas.

As NMT practices continue advancing to steady enthusiasm across our industry, it is worth bearing in mind that natural language processing (NLP) capabilities now entering a ‘golden era’ will at some point enter practical applications specific to localization. When this happens, a paradigm shift will likely get underway as human linguists become less important to quality assurance. However, we may first see a significant uptick in the market for machine translation solutions as the world economy continues to weather crisis. As NMT is first and foremost a driver of cost-effectiveness and efficiency for high-volume translations, it is often the preferable method when budget concerns are decisive. NMT may not look quite as cutting-edge as it once did, but it is a more mature technology with an established role in localization strategy.

With a global network of linguists, subject matter experts, and engineers trained in the latest best practices for machine translation and linguistic review, CSOFT International can help companies realize cost-effective solutions meeting all of their translation requirements for entering new markets. You can learn more about our translation technologies and MTPE services at csoftintl.com!
