What is machine translation?

2020-11-18 02:10 Smartcat

Machine translation is an integral part of the translation and localization industry today, as companies try to scale, automate, and streamline their translation output. But what exactly is machine translation, and how does it work? How can translation quality be controlled, and where do human translators come in?

How does machine translation work?

Machine translation, simply put, is the use of software to translate text or speech from one language to another. It uses algorithms, patterns, and language models derived from large databases of existing translations. With these, it can either suggest translations to language professionals or, in some cases, translate large quantities of text automatically with no human involvement at all. To account for context, the software factors in the subject category (for example, medical, legal, or scientific), online resources, and glossaries.

There are different types of machine translation with varying levels of sophistication, and some continuously learn and improve their suggestions over time. That said, human linguists are still very much needed to control quality and to localize content for specific target audiences.

You may also have read about computer-aided translation, machine-aided human translation, and interactive translation. These are not the same as machine translation; each has its own characteristics and toolset.

Machine translation types

Humans have been tinkering with machine translation technology since as early as the 1940s, and each new technology has improved the process incrementally. In the past five years, emerging technologies such as AI and deep learning have also become deeply integrated into how it works.

There are three main types of machine translation: rule-based machine translation (RbMT), statistical machine translation (SMT), and neural machine translation (NMT).

Rule-based machine translation (RbMT)

The first widely used type of machine translation software, still employed today, is the rule-based system. As the name suggests, it relies on a vast number of hand-crafted rules covering the grammar, syntax, and phraseology of each language.

Statistical machine translation (SMT)

Statistical machine translation has been actively developed in recent decades, although it was first conceptualized in 1949. SMT uses statistical language models whose parameters are estimated from language resources made up of large, structured sets of texts. It makes efficient use of human and data resources. However, it is often criticized for output that is only superficially fluent, like the awkward translations long associated with Google Translate. It also handles language pairs with very different syntax poorly, so linguists have to edit its output heavily.

Neural machine translation (NMT)

The most relevant of the three today is neural machine translation, which made its debut in 2016. NMT uses artificial neural networks to predict sequences of words. It continuously improves its translations by learning from resources, databases, glossaries, and the translation suggestions that translators approve. NMT software generally runs on graphics processing units (GPUs) to get the enormous processing power it needs.

Many translation service companies use NMT because they have realized how much it increases translation productivity and cuts costs, a key B2B selling point. Organizations that use it include Microsoft (in Skype, Bing, and other products), Systran, Reverso, and IBM.

Hybrid machine translation

Hybrid machine translation uses two of the types above at the same time. Companies use this approach as a fail-safe way to deliver accuracy and maintain control instead of relying on a single solution.
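To make the division of labor in a hybrid system concrete, here is a minimal, hypothetical toy. A hand-written English-to-Spanish dictionary and one word-order rule play the rule-based part, generating every possible translation. A bigram language model trained on a four-sentence Spanish corpus plays the statistical part, picking the most probable candidate. All names, word lists, and the corpus are invented for illustration. This is a sketch of the general idea, not how Prompt, Systran, or any other vendor builds its hybrid engine.

```python
import math
from collections import Counter
from itertools import product

# --- Rule-based component (hypothetical toy data) ---------------------------
# Some words have several possible translations (gender, ser/estar) that the
# rules alone cannot decide between.
LEXICON = {
    "the": ["el", "la"],
    "white": ["blanco", "blanca"],
    "house": ["casa"],
    "dog": ["perro"],
    "is": ["es", "está"],
    "big": ["grande"],
}
ADJECTIVES = {"white", "big"}
NOUNS = {"house", "dog"}


def reorder(words):
    """Syntax rule: English 'adjective noun' becomes Spanish 'noun adjective'."""
    out = list(words)
    i = 0
    while i < len(out) - 1:
        if out[i] in ADJECTIVES and out[i + 1] in NOUNS:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def rule_based_candidates(sentence):
    """Apply the rule, then expand every dictionary choice into a candidate."""
    words = reorder(sentence.lower().split())
    options = [LEXICON.get(w, [w]) for w in words]  # unknown words pass through
    return [" ".join(choice) for choice in product(*options)]


# --- Statistical component: bigram language model ---------------------------
# A made-up Spanish corpus; real systems train on millions of sentences.
CORPUS = [
    "la casa blanca es grande",
    "el perro es grande",
    "la casa es blanca",
    "el perro blanco",
]


def train_bigram_lm(sentences):
    """Count bigrams and their left contexts, with sentence boundary markers."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


CONTEXTS, BIGRAMS, VOCAB_SIZE = train_bigram_lm(CORPUS)


def log_prob(sentence):
    """Log-probability with add-one smoothing, so unseen bigrams are not zero."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((BIGRAMS[(a, b)] + 1) / (CONTEXTS[a] + VOCAB_SIZE))
        for a, b in zip(tokens, tokens[1:])
    )


# --- Hybrid: rules propose, statistics choose -------------------------------
def translate(sentence):
    return max(rule_based_candidates(sentence), key=log_prob)


if __name__ == "__main__":
    print(translate("the white house"))  # la casa blanca
    print(translate("the dog is big"))   # el perro es grande
```

Neither half works alone here. The rules cannot tell "el" from "la" or "es" from "está", and the language model cannot read English at all. Add-one smoothing keeps unseen word pairs such as "perro está" from scoring zero, so the model prefers attested Spanish without ruling anything out.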
Prompt, Systran, and Omniscien Technologies are among the companies that use hybrid machine translation.

Which machine translation type is better?

Each type of machine translation has its pros and cons. RbMT offers more consistent and predictable quality than SMT, while SMT produces much more fluent output and is better at catching exceptions to rules. However, the most sought-after solution today is neural machine translation.

Machine translation systems

There are three types of machine translation systems, and each can be built on any of the machine translation technologies above:

Generic MT is the most basic kind of MT system. It provides instant translations with little to no customization; examples include Google Translate, Bing, Reverso, and Yandex.

Customizable MT builds on generic MT but lets users tailor terminology to the context, subject category, style, target audience, and so on.

Adaptive MT is the system most often used in CAT (computer-aided translation) tools. It offers live translation suggestions to language professionals and learns from the choices they make over time to improve its suggestions. Adaptive MT works alongside translation memories. It has proved to be one of the most helpful tools for translators, as it greatly speeds up their work and increases output.

Machine translation technology, tools, and services

Machine translation is widely available: in the cloud, on platforms, on servers, or through software integration via an API. For example, Google, Microsoft, and Amazon sell access to their translation services through cloud APIs. Other developers, such as Systran and Prompt, offer customizable MT as server or desktop products. Professional translators, however, mainly use MT directly inside the CAT tools they work with, such as Trados and MemoQ.

Users can also turn to independent and open-source machine translation options, which allow anyone with the technical know-how to build their own machine translation engine.
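The sketch below gives a feel for what building an engine from bilingual data involves. It is a minimal implementation of IBM Model 1, the classic word-alignment model from early statistical MT, trained with expectation-maximization (EM) on a tiny made-up English-German corpus. With no dictionary at all, it learns that "haus" translates "house" simply because the two words keep co-occurring across sentence pairs. Real open-source toolkits work at a vastly larger scale, and most current ones are neural rather than statistical. The function names and corpus here are hypothetical, and the NULL word of the full model is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: English sentences paired with German translations.
PARALLEL = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]


def train_ibm_model1(pairs, iterations=10):
    """Estimate word-translation probabilities t[(f, e)] = t(f | e) with EM."""
    pairs = [(e.split(), f.split()) for e, f in pairs]
    f_vocab = {f for _, fs in pairs for f in fs}
    # Start uniform: every foreign word is equally likely for every English word.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in pairs:
            for f in fs:
                # E-step: split f fractionally among the English words it may align to.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts into probabilities per English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_translation(t, e_word):
    """Most probable foreign word for an English word under the learned table."""
    candidates = {f: p for (f, e), p in t.items() if e == e_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    table = train_ibm_model1(PARALLEL)
    for word in ["the", "house", "book", "a"]:
        # expected: the -> das, house -> haus, book -> buch, a -> ein
        print(word, "->", best_translation(table, word))
```

The learning comes entirely from overlap between sentence pairs. "das" appears with both "the house" and "the book", so it is gradually attributed to "the", which leaves "haus" to explain "house". This is why the amount and quality of parallel data matter so much.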
To use any open-source toolkit, you need a large collection of parallel texts: documents in one language paired with their translations in the other.

Machine translation quality

Machine translation software is enormously helpful for improving translator productivity and translating large volumes of text, but its output must also meet high quality standards. This is why human language professionals are tasked with MT post-editing. They make sure the result is a natural translation that fits the context, has a human, conversational feel, and is accurately localized for the target audience.

Translation quality is also assured on the technical side. Computational engineers review MT engines through ongoing A/B tests and experiments. Automatic metrics such as BLEU (Bilingual Evaluation Understudy), ROUGE, NIST, and METEOR measure how similar a machine translation is to human translations of the same text.

Another concern besides quality is security. Because many machine translation platforms are shared, translations are not always kept confidential. Many companies address this by setting up an on-premises machine translation engine that runs inside the corporate network with no external access. Cloud solutions, on the other hand, rely on data encryption. Either way, companies should avoid services that are open to the public, as these are easy gateways for hackers.

What's next?

Machine translation is an exciting interdisciplinary field that combines the latest in technology, linguistics, and localization. The ever-growing need for content localization will keep pushing technological advances in MT at an accelerated pace. Language professionals, for their part, need to find effective ways to control the quality and human touch of machine translations.