Post-editing of Machine Translation: Negotiating Projects

2020-03-19 16:00 Wordbee Translator

This article is part of a series on machine translation and post-editing, addressed to translators, language service providers (LSPs) and enterprises alike.

Previous articles:
Three Approaches to Post-editing of Machine Translation
A Machine Translation Checklist for New Users
5 Effective Strategies for Post-editing of Machine Translation

Machine translation (MT) is becoming so ubiquitous that it is now considered a key element of translation automation. Enterprise clients with considerable financial resources for the development of a proprietary MT system have started to introduce post-editing into their workflows, while organizations that are still new to this technology can turn to language service providers for MT post-editing services, or to consultants for guidance on implementing machine translation and/or post-editing.

Within a post-editing of machine translation (PEMT) project, requirements and expectations vary from one stakeholder to another. An enterprise customer that turns to a freelance translator/post-editor does so for reasons that differ from, or even conflict with, those for which it would turn to an LSP. Likewise, an LSP that turns to freelancers for post-editing has different requirements and a different scope in mind than the freelancer who receives the assignment.

An enterprise client might already have implemented its own MT engine, trained on its own data, and therefore already have a clear idea of what to expect from post-editing. Another translation buyer might simply be anxious to implement MT (perhaps also because of the NMT hype) to speed up its processes, increase its volumes and, let's say it, save money.
In the first case, negotiations for a post-editing project could center on setting the right criteria for a Service Level Agreement (or at least follow the same path); in the second case, the translation buyer might need advice on the whole MT+PEMT project.

PEMT Projects: Three Essential Elements

There are three main aspects to consider when negotiating a PEMT project.

The quality of the data on which the engine(s) have been trained.
Since many errors in MT output can be traced to terminology, sentence structure, and punctuation in the source text, data quality is paramount.

Statistical or neural engine.
Although MT today is mostly NMT, an enterprise might have invested in the past in a well-trained and tested statistical engine with more than acceptable raw output quality, much as happened with RbMT engines when SMT arrived.

General or vertical engine.
An enterprise might find that the output of a general MT engine (like Google Translate or DeepL) is, for many reasons, "good enough" for its goals, while another would opt for a vertical engine run in-house to avoid IP/confidentiality issues. In terms of post-editing and final results, the more vertical and well-trained a machine translation engine is, the better. When a machine translation system is customized for language pair, domain and text typology, the output will be of reasonably high quality.

Client's expectations.
Next, you'll need to consider your client's expectations, especially in terms of time-to-market and final quality. An MT-savvy client might already have a general idea of the quality of the raw output and plan according to a defined productivity model, while a client who is new to MT might have unreasonable expectations due to lack of experience.

SaaS platform or API integration.
Is the MT technology available in SaaS mode (so that the source text is pre-translated and then sent to post-editing for further processing) or through an API connector to the engine within a CAT tool (so that post-editors can work efficiently with all the functionalities of the tool at their disposal)?

Target group and use of the post-edited text.
The level of post-editing must be defined based not only on the quality of the raw output, but also on the target group and the use the client intends to make of the post-edited text (for example, printing it in a brochure or publishing it on a website).

Guidelines.
As noted above, a customized engine will generally produce output of good to high quality. In this case, the PEMT guidelines should be very specific and rigorously based on the types of errors the engine produces. They should indicate the level of PEMT required (light or full), the purpose of the text, and the target group.

Terminology.
If using an online general MT engine, it's advisable to treat the machine translation output as suggestions to be checked not only for accuracy, but also for terminology. In addition, after the post-editing task, subject-matter experts should check the terminology of the translation wherever possible.

Client's economic objectives.
An MT-savvy client knows that economic gains are not immediate. For a novice customer, on the other hand, cost savings will play a major, if not primary, role.

Pricing

When it comes to pricing, it's important to create a compensation model either prior to or after post-editing.

Before starting.
If you want to set a pricing model before starting the post-editing project, you'll need a clear-cut predictive scheme (for example, the translation-memory fuzzy match scheme) to which you apply a word rate. The disadvantage is that this model may prove largely inadequate.
Translation-memory fuzzy matches and MT segments differ significantly. Fuzzy matches over 85% are essentially correct and require only minor changes; machine-translated segments, on the other hand, may contain errors and inaccuracies. In many cases, even light post-editing may prove challenging. This model can be suitable for light post-editing of very good output when time-to-market is the first requirement.

After completion of the project.
For this compensation model, you need to accurately measure the work actually performed, i.e. calculate the edit distance and then apply the resulting percentage to an hourly rate. This pricing model is suitable for full post-editing.
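
To make the two compensation models more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a standard pricing formula: the band names and payable fractions passed to pre_project_quote, the full_effort_hours parameter (the hours a from-scratch translation would be budgeted at), and the choice of a word-level Levenshtein distance normalised by the longer segment are all hypothetical. Real projects would agree on these details during negotiation.

```python
def levenshtein(a, b):
    """Edit distance between two token lists (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def edit_ratio(segment_pairs):
    """Length-weighted share of words changed across (raw MT, post-edited) pairs."""
    total_distance = 0
    total_length = 0
    for raw_mt, post_edited in segment_pairs:
        raw_tokens, pe_tokens = raw_mt.split(), post_edited.split()
        total_distance += levenshtein(raw_tokens, pe_tokens)
        total_length += max(len(raw_tokens), len(pe_tokens))
    return total_distance / total_length if total_length else 0.0


def post_project_fee(segment_pairs, full_effort_hours, hourly_rate):
    """After completion: apply the measured edit percentage to an hourly rate.

    Assumption: the fee scales linearly with the edit ratio, relative to the
    hours a from-scratch translation would have been budgeted at.
    """
    return edit_ratio(segment_pairs) * full_effort_hours * hourly_rate


def pre_project_quote(words_per_band, band_fractions, word_rate):
    """Before starting: fuzzy-match style grid, a payable fraction per band.

    Assumption: bands and fractions are hypothetical and agreed beforehand.
    """
    return sum(words * band_fractions[band] * word_rate
               for band, words in words_per_band.items())


if __name__ == "__main__":
    pairs = [
        ("the cat sat", "the dog sat"),
        ("a b c d e", "a b c d e"),
    ]
    print("edit ratio:", edit_ratio(pairs))
    print("post-project fee:", post_project_fee(pairs, 8, 40))
    print("pre-project quote:",
          pre_project_quote({"high": 1000, "low": 500},
                            {"high": 0.3, "low": 0.6}, 0.10))
```

The sketch also shows why the two models suit different situations: the pre-project quote depends entirely on how well the predicted bands match the real output, whereas the post-project fee can only be computed once both the raw MT and the post-edited text exist.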