5 Effective Strategies for Post-Editing MT


2020-03-30 16:34 wordbee



Post-editing of Machine Translation (PEMT) wasn’t born yesterday (MT and post-editing are mature in Wordbee, for example). On the contrary, it’s as old as machine translation (MT) itself. And although a large amount of material on the topic is now at our disposal, the discussion has become so nuanced that we sometimes risk losing sight of what PEMT really is.

PEMT: No Need for Creativity

Let’s start by saying that PEMT is not revision. Nor does it require the “creativity” of, let’s say, transcreation. In spite of all the articles written on the subject and a brand-new ISO standard, the most accurate definition of post-editing so far comes from the 2010 TAUS Post-editing in Practice report: “Post-editing is the process of improving a machine-generated translation with a minimum of manual labour.”

The keywords in this definition are “a minimum of manual labour.” While revision is based on a contrastive analysis of source and target texts and requires the reviser to check and edit terminology, style, and grammar, PEMT is characterized by higher productivity and limited bilingual cognitive effort. The main changes concern mechanical errors (capitalization and punctuation), grammar errors, terminology inconsistencies, omissions (e.g. missing words), and other issues that are often the product of a poor source text and make the target text hard to read. A post-editor is not expected to rewrite entire sentences (unless those sentences are obvious nonsense or word salad), so they should amend only what’s necessary to make a sentence clear to the reader.

The skills required also differ: A reviser must have a sound knowledge of both source and target languages, of translation techniques, and of a specific domain, whereas a post-editor may even be monolingual. Either way, a post-editor must have a strong knowledge of the target language and of the specific domain and, ideally, an idea of how machine translation works.

3 PEMT Approaches in Practice

PEMT and the Enterprise

Let’s take an enterprise that has developed its own engine. The PEMT task could take place in a CAT environment or, in the case of enterprises that have their own MT engines but no translation department as such, it could be entrusted to external language service providers (LSPs).

Because in this instance we’re dealing with a customized engine, the MT output will generally be of good to high quality. The PEMT guidelines will be very specific and rigorously based on the error typologies produced by the engine in question. They need to state the level of PEMT required (light or full post-editing), the purpose of the text, and the target group. Glossaries are essential if the MT engine has just been put into use and has shown some terminological teething problems.

PEMT and the LSP

Only a few LSPs have the financial and technical resources needed to develop client-specific MT engines. Most LSPs resort to vertical (domain-specific) engines developed by MT technology providers and available in SaaS mode on a pay-per-use model. The MT output is then sent to internal or external post-editors. Alternatively, post-editors might receive an API key to use a vertical MT engine in a CAT tool environment. In this specific case, post-editing becomes an interactive task.

Some LSPs pre-translate a source text with a general MT engine, for example Google Translate or DeepL. This is a viable financial choice when starting out with MT, translating small texts or, again, facing a lack of financial or technical resources.

In this approach, because the post-editing level and goals change from project to project or from client to client, LSPs always need to provide information about the final use of the translation and accurate guidelines on how to conduct the task. PEMT projects could be split among many post-editors: The specificity and strictness of the guidelines will ensure a certain level of consistency. It’s also important to provide a client-specific glossary to ensure consistent terminology, especially in the case of a public engine.

PEMT and the Freelancer

Gone are the days of freelancers’ rage against machine translation. Nowadays, most freelancers use MT as a helpful tool that provides translation suggestions. The choice is usually Google Translate or DeepL (web version or with an API key).

There are no precise PEMT guidelines in this instance. The freelancer using one or more general MT systems is free to decide which MT tool to use, how to use it, and how much of the MT output to keep. From an ethical point of view, they should inform the client about the use of a public MT engine or, in any case, ask the client whether there are specific criteria that might prevent the use of public MT engines. Think medical records, legal documents (involving sensitive or personal data; in a word, GDPR), and confidential or IP-protected documents.

One thing to remember: When a public MT engine is used through an API key in a CAT tool environment, the segments containing MT output might in some cases be tagged with AT (automatic translation) or MT (machine translation), thereby revealing their origin.

PEMT in the Translation Workflow

It is also worth noting that an automatic translation is not something created by a machine with free will and an independent (albeit electronic) brain. An MT output (the technical term for automatic translation) is generated by an algorithm through stochastic calculation, as in statistical and neural machine translation; the algorithm exploits bilingual translation corpora and, more generally, language data produced by humans. There can be various reasons for the high or low quality of an MT output: lack of clean language data, insufficient technical and financial resources for the development of an MT system, inadequate quality of the source text, and so on. But the common factor in all these reasons is human.

PEMT should replace the two main phases of the TEP (translation, editing, proofreading) workflow. In order to implement an MT engine, it’s necessary to adapt the translation workflow to this technology and make sure that your TMS (translation management system) has what it takes to manage the PEMT phase efficiently.

• If you have developed your own MT engine, your TMS provider should give you the ability to connect it through an API key.
• Post-editors should have linguistic resources at hand: spell-check dictionaries, termbases, and the ability to upload the PEMT guidelines as reference material.
• Project managers should have at their disposal functionalities like automated QA, 100% match blocking, adding labels to specific segments, and so on.

Training Post-Editors

The PEMT courses available nowadays provide general training on how machine translation works and give a few examples of the differences between revision and post-editing.

There’s no need to provide long training sessions using the MT output of Google Translate and DeepL, as the differences are in many cases minimal.

Whether you’re an LSP or an enterprise, to help your post-editors become more efficient it’s important to provide a second level of training based on customized engines, with specific domains, language pairs, and text types.

Don’t forget the practical basics either: Show your post-editors which functions and checks they can apply to the MT output, for example how to display suggestions and how to modify them within your work environment.
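
To make the idea of automated QA on MT output more concrete, here is a minimal Python sketch of the kind of mechanical and terminology checks discussed in this article (capitalization, punctuation, glossary terms). It is an illustration only, not how Wordbee or any other TMS implements its QA: the names Segment, check_segment, and glossary are hypothetical, and the punctuation rule assumes a language pair that shares the same end-of-sentence marks.

```python
from dataclasses import dataclass

# End-of-sentence marks compared between source and target (an assumption
# that only holds for language pairs with shared punctuation conventions).
END_MARKS = (".", "!", "?", ":")


@dataclass
class Segment:
    """One source/target pair; `origin` is a hypothetical label such as "MT"."""
    source: str
    target: str
    origin: str = "MT"


def check_segment(seg: Segment, glossary: dict[str, str]) -> list[str]:
    """Return labels for mechanical and terminology issues in one segment."""
    source = seg.source.strip()
    target = seg.target.strip()
    if not target:
        return ["empty target"]

    issues = []
    # Capitalization: the target should open with a capital if the source does.
    if source[:1].isupper() and target[:1].islower():
        issues.append("capitalization")
    # Punctuation: the target should end with the same mark as the source.
    if source.endswith(END_MARKS) and target[-1] != source[-1]:
        issues.append("punctuation")
    # Spacing: doubled spaces are a common mechanical leftover.
    if "  " in target:
        issues.append("double space")
    # Terminology: every glossary term found in the source must appear
    # in the target in its approved form (case-insensitive match).
    for src_term, tgt_term in glossary.items():
        if src_term.lower() in source.lower() and tgt_term.lower() not in target.lower():
            issues.append(f"terminology: {src_term}")
    return issues


if __name__ == "__main__":
    glossary = {"translation memory": "mémoire de traduction"}
    segments = [
        Segment("Open the translation memory.", "ouvrez la mémoire de traduction"),
        Segment("Open the translation memory.", "Ouvrez la base de traduction."),
    ]
    for seg in segments:
        print(seg.target, "->", check_segment(seg, glossary))
```

A check like this does not replace the post-editor. It only points them to the segments most likely to need the "minimum of manual labour" described above, and the rules would have to be adapted to the error typology of the engine in question.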
Post-editing of the Chinese translation: Luo Wenxin (Sun Yat-sen University)