The Future of Translation Quality Assessment: An Interview with Dr. Joss Moorkens


2020-06-15 22:10 RWS Moravia Insights


In this world of growing online consumer engagement with products, there are greater volumes of content to be translated than ever before. Consumers are demanding more and more product descriptions, reviews, online help and even social media… and corporations would be wise to provide it all in multiple languages. To handle this much content at the required speed, machine translation (MT) often comes into play. But is the quality good enough? Raw MT might be acceptable, or post-editing might be needed in order to meet translation quality requirements. Given the possible variations in translation quality resulting from these different processes, quality is awfully hard to define and assess.

Maribel Rodríguez, Language Technology Deployment Manager at RWS Moravia, talked to Dr. Joss Moorkens to find out how his research in this field is helping address the pressing questions in this space. Joss is an Assistant Professor at the School of Applied Language and Intercultural Studies at Dublin City University and a researcher at the ADAPT Centre and the Centre for Translation and Textual Studies. He has authored over 40 journal articles and book chapters on translation technology, post-editing of machine translation, user evaluation of machine translation and translation technology standards. The nature of quality and quality evaluation in this era of MT is one of the topics we discussed with him.

MARIBEL: Could you explain the focus of your research?

JOSS: My focus is translation technology and how humans work with machine translation. We’ve worked on creating different types of interfaces: an interface for mobile, one that uses touch and voice and ones that are focused on accessibility. Other areas of focus are post-editing processes, translation process research and translation quality assessment.

MARIBEL: What are the big trends you’re seeing in translation quality assessment?

JOSS: Over the last few years, there’s been a change in how translation quality is assessed in that it’s now calibrated per customer and per project. There seems to be an increasing trend towards “good enough” quality for certain purposes and for more perishable content. Certain metrics have become more widely available, such as Multidimensional Quality Metrics (MQM) and the Dynamic Quality Framework (DQF), that allow a certain amount of tailoring and can be used for a wide variety of purposes, whereas in the past, there had been research into many different annotation metrics for machine translation. Now, a subset of these larger metrics (MQM and DQF) can be chosen for machine or human translation.
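As a concrete illustration of how such a tailorable, error-typology-based metric can work, here is a minimal Python sketch of MQM-style scoring. The severity weights and the per-word scoring formula are illustrative assumptions rather than the official MQM specification; the point is that the categories and weights can be chosen per customer and per project.

```python
from dataclasses import dataclass

# Severity weights loosely modelled on common MQM/DQF practice;
# the exact values are an assumption here and would be tailored
# per customer and per project, as Joss describes.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0, "critical": 10.0}

@dataclass
class ErrorAnnotation:
    category: str   # e.g. "accuracy/mistranslation", "fluency/grammar"
    severity: str   # "minor", "major" or "critical"

def mqm_style_score(annotations: list[ErrorAnnotation], word_count: int) -> float:
    """Return a per-word quality score where 1.0 means no penalties."""
    penalty = sum(SEVERITY_WEIGHTS[a.severity] for a in annotations)
    return 1.0 - penalty / word_count

errors = [
    ErrorAnnotation("accuracy/mistranslation", "major"),
    ErrorAnnotation("fluency/spelling", "minor"),
]
print(f"{mqm_style_score(errors, word_count=250):.3f}")  # 0.976
```

Lower penalties per word mean higher scores; a project-specific pass threshold (say, 0.95) could then be agreed with each customer.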
JOSS: And how about you, Maribel? What is your experience regarding these changes in translation quality?

MARIBEL: I’ve been working in localization for nearly 17 years now, and when I started, there was this sort of “one size fits all” approach to linguistic quality using the LISA standards. It didn’t matter if there was a very demanding customer or if someone just needed something perishable for gisting purposes, for example. The metrics that we used and the approach to human review were always the same, whereas now, everything is customized to the customer and, in many cases, also at the project level. We have many different quality approaches depending on the nature of the content and different sets of parameters.

JOSS: The other thing that’s changed is that the combination of improved machine translation quality with financial imperatives has pushed machine translation into more use cases than we’ve seen previously. For example, raw machine translation is being tested for user interfaces. In some cases, a light review might be permissible rather than full post-editing. I am not sure this is a good thing, but it seems to be a trend at present.

MARIBEL: Is there an agreed-upon industry or academic definition as to what translation quality actually means?

JOSS: No! Translation quality is usually agreed upon at the project level. We try to look for ways to measure quality that might parallel human judgement, but the answer to so many things in translation is “it depends”. The expected quality for a literary novel is not going to be the same as for a TripAdvisor review. There are lots of variables that will change the expectation of quality and the translation process. Many things will be tailored to the value placed on the content and the amount of money available, so I don’t think it’s possible to have a single definition of what quality is.

MARIBEL: What are the key challenges of assessing translation quality in the current environment? Where are the lines between human translation and machine translation getting more blurred?

JOSS: The main challenge is trying to have predictive quality measurements, or confidence estimations, for machine translation. It’s a real problem when machine translation is priced based on a previous job, and then something is back-translated and the quality is massively different. Often, it’s not possible to price the amount of post-editing effort accurately. So, best practice would be to price retroactively for time, but a lot of language service providers are not comfortable doing that.
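To make the time-based alternative concrete, here is a small hypothetical sketch of pricing post-editing retroactively from logged editing time. The hourly rate and per-segment times below are invented for illustration; in a real workflow they would come from the translation tool’s session logs.

```python
def retroactive_pe_price(edit_seconds: float, hourly_rate: float) -> float:
    """Price a post-editing job from the time actually spent, rather
    than per word based on a previous job - the assumption that breaks
    when MT quality varies between jobs."""
    return (edit_seconds / 3600.0) * hourly_rate

# Hypothetical per-segment editing times (seconds) for two jobs of the
# same length but very different raw MT quality.
good_mt_job = [12.0, 8.5, 15.0, 9.0]      # light-touch edits
poor_mt_job = [95.0, 120.0, 80.0, 140.0]  # close to retranslation

HOURLY_RATE = 45.0  # assumed rate, currency units per hour
for name, log in [("good MT", good_mt_job), ("poor MT", poor_mt_job)]:
    print(f"{name}: {retroactive_pe_price(sum(log), HOURLY_RATE):.2f}")
```

The same word count yields very different prices once effort is measured directly, which is exactly why per-word pricing based on a previous job can misfire.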
MARIBEL: I would be interested in hearing about your book. Who is it for and what is it about?

JOSS: Four of us co-edited a book: myself; Sheila Castilho; Federico Gaspari, a post-doc here at the ADAPT Centre and a lecturer in Reggio Calabria in the south of Italy; and Stephen Doherty, a Dublin City University and ADAPT Centre alumnus now at the University of New South Wales. It’s called Translation Quality Assessment: From Principles to Practice, and it’s for practitioners within the industry as well as researchers. We review current approaches to human and machine translation quality assessment. There’s a chapter about the quality management principles and practices within the European Union institutions by Joanna Drugan from the University of East Anglia and by lawyer-linguists from the European Commission. It’s a detailed description of the gold standard of translation quality assessment within probably the largest translation agency in the world. There are sections about education and training, crowdsourcing and translation quality, and applications of translation quality assessment, including the MQM metrics for standardized error typologies. Andy Way wrote a chapter about quality expectations for machine translation and the increasingly varied uses for MT. We have a chapter about MT post-editing for academic writing support: academics are trying to write articles, yet they’re linguistically disadvantaged because so much scientific material requires English-language publication. It tests MT plus self-post-editing for those sorts of academic articles. Finally, there is a chapter about the level of quality that Neural Machine Translation (NMT) can attain on literary texts by Antonio Toral from the University of Groningen.

MARIBEL: If you had to put on your futuristic hat, where do you think translation quality assessment is heading?

JOSS: The importance of confidence estimations will be key. Machine translation is going to be involved in more and more translation workflows as time goes on, so it’s important to think about how we introduce it. It could be that post-editing is not the best method. A couple of translators in Ireland who I spoke to said that they prefer to use MT as a starting point, to give them ideas for how you might translate a segment. They said that it increased their speed, but how you would price that as an employer is difficult. The interactive MT method used by some localization tools hasn’t shown the increase in throughput, when moving from statistical MT to neural MT, that we expected from looking at the number of keystrokes required and at the improvement in other quality measures, particularly fluency. So, figuring out the best way to introduce MT into workflows, measure the quality and make sure that the errors in NMT aren’t in the final output will be an area of increasing focus over the next five years or so. In addition, we’ll be trying to encourage a sustainable balance between long-term benefits for all translation stakeholders and short-term aims to eliminate waste and excess cost within the production process.
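One rough way to approximate the keystroke measurements Joss mentions is character-level edit distance between the raw MT output and the final post-edited segment. The sketch below uses standard Levenshtein distance as that proxy; actual translation process research logs real keystrokes and pauses, so this is an illustrative simplification.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum character insertions, deletions and substitutions to
    turn a into b - a rough proxy for post-editing typing effort."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

mt_output = "The cat sit on the mat ."
post_edit = "The cat sat on the mat."
print(levenshtein(mt_output, post_edit))  # 2: one substitution, one deletion
```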
MARIBEL: Thanks Joss!

JOSS: My pleasure.

If you are interested in understanding how machine translation outputs can be measured and how human processes interplay, don’t hesitate to reach out to us. Our dedicated MT team loves to talk about this stuff.
