Integrating MT into L10n Workflows: 5 Key Takeaways


2021-06-02 21:50 Memsource



Machine translation adoption is on the rise, but questions still swirl around its benefits. Our latest Machine Translation Workshop brought together key MT experts to deliver answers, and much more.

How do you ensure efficiency gains without sacrificing quality? How can you work MT into your localization workflow? How do you get all stakeholders on board? We put together a group of MT experts to tackle these questions and more at our latest Machine Translation Workshop. Our star-studded panel included:

- Adam LaMontagne, Machine Translation Manager at RWS
- Paula Manzur, Machine Translation Specialist at Vistatec
- Jordi Macias, VP of Operations at Lionbridge
- Elaine O’Curran, Senior MT Program Manager on the AI Innovation Team at Welocalize
- Lamis Mhedhbi, Machine Translation Team Lead at Acolad

With the panelists’ 87 years of collective experience in the localization industry, the event was jam-packed with tips for integrating MT into localization workflows. Keep reading to discover our key takeaways.

Always understand quality expectations

Machine translation is not one-size-fits-all. You cannot throw all of your content in and expect the quality of the output to meet all of your needs. Before starting with MT, you have to know exactly what outcome you require for each content type you wish to translate.

Setting quality expectations from the get-go will not only help in the MT evaluation process; it will also help you measure the success of your MT program and keep costs, quality, and timelines under control. You’ll need to communicate these expectations clearly to your MT evaluators and linguists. This will help reduce bias and prevent preferential or unnecessary changes to the MT output, which can increase the post-edit distance and skew efficiency metrics.
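The post-edit distance mentioned above can be made concrete with a small sketch. Exact formulas vary by tool (Memsource’s own definition is not spelled out here, so this is an illustrative assumption): a common approach is character-level Levenshtein distance between the raw MT output and the post-edited text, normalized to a 0–1 score.

```python
# A minimal post-edit distance (PED) sketch: the fewer edits a linguist
# makes to the raw MT output, the lower the score. This uses character-level
# Levenshtein distance normalized by the length of the longer string;
# real tools may work at word level or normalize differently.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def post_edit_distance(mt_output: str, post_edited: str) -> float:
    """Return 0.0 (no edits made) up to 1.0 (completely rewritten)."""
    longest = max(len(mt_output), len(post_edited))
    if longest == 0:
        return 0.0
    return levenshtein(mt_output, post_edited) / longest
```

Identical strings score 0.0, while a fully rewritten segment approaches 1.0, which is why unnecessary preferential edits inflate the metric and make the engine look worse than it is.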
Pro tip: Don’t deploy machine translation without a content type analysis. Create a matrix of your different content types and your expectations for each. Memsource’s MT Quality Estimation feature can also help you understand what quality to expect by providing segment-level quality scores for MT output, reducing the uncertainty around MT results on new content types and languages.

While quality expectations can differ from project to project, it’s also important to note that improving MT engine quality could affect your overall strategy in the future. Light post-editing (LPE), i.e. raw MT that is modified only where absolutely necessary to ensure the output is legible and conveys the meaning of the source, may no longer make sense. “The quality of MT is going up, and on the other hand, the cost of doing post-editing is going down. The space in which you can play is becoming narrower and narrower,” said Jordi. Shifting LPE work from professional translators to native-language reviewers who simply look for critical errors may replace LPE altogether.

A deployment checklist is essential

We’ve provided an MT integration checklist here, but the panelists highlighted some additional dos and don’ts:

Do: When evaluating MT engines, design your testing to align with your customers’ needs. You can spend a lot of time testing something the end user cares little about.

Do: Test a generic engine first. People don’t always consider the volume of source content that will actually go through MT. When you’re leveraging translation memory for a high volume of words and only a small percentage of new words are processed by MT, you can end up spending far more on customizing and maintaining your engines than you actually save. “Training an MT engine requires an investment, not only for the cost of the MT provider but also the cost of cleaning up and optimizing the training data,” added Paula.

Do: Understand the integration.
“If you can’t deliver the engine into your processes in an efficient, ergonomic way, then the quality and the suitability of the engine towards your customer’s needs no longer matters,” said Adam. If you invest too much time in MT providers that aren’t supported by your tech stack, you’ll end up spending extra effort trying to make the MT work for you, pushing the productivity savings to other parts of the supply chain.

Don’t: Simply upload your glossaries to your engine. Employ a linguist to help you clean your glossaries and get them ready for MT. Glossaries only take you so far when training an engine, and poorly prepared glossaries can create a lot of noise and ruin the output.

MT is not a “set it and forget it” solution

There is a common misconception that once you have set up your MT workflow, you can sit back, relax, and marvel at your efficiency gains. This is not true. At the start of your MT journey, you’ll spend time and resources on setup and engine training, but it’s easy to forget that MT engines are constantly interacting with new data. The results will change over time, so you need to understand how your engines are handling the input and how your human editors are interacting with the output. Capturing this data, as well as post-editing data, will help identify possible under-editing and over-editing and enable you to continuously fine-tune your processes, keeping the MT lifecycle healthy.

Pro tip: One way to make it easier to adapt your workflows to new performance data is to use advanced MT management features. For example, our MT management hub, Memsource Translate, uses its AI-powered MT Autoselect feature to ensure you always use the best-performing engine for your content.

Presenting MT ROI is all about use cases

For a long time, MT had a bad reputation resulting from poor-quality MT output. But thanks to neural MT, there has been a seismic shift in the perception of MT quality.
If you are trying to get stakeholders on board who are not localization-savvy, you may still have to work on dispelling the fear that MT is synonymous with poor quality. From there, presenting MT ROI is all about use cases and the particular needs of the stakeholder. Whether it’s light post-editing or full post-editing, custom engines or generic ones, you need to find the best use of the budget and how to get the most out of it. “Essentially, it goes back to data, understanding what your stakeholders’ goals are, and then presenting the best workflow that covers the full spectrum of their needs,” added Jordi.

Pro tip: Basing your MT ROI on translation volumes is a common faux pas. Machine translation is not the only technology contributing to higher productivity. Translation memories produce the bulk of translation output, so you’ll end up with a smaller, perhaps seemingly insignificant, volume actually translated by MT.

Capture all the data

Data is integral to a successful localization workflow. This was made especially clear as, regardless of the question, data was part of almost every answer the panelists gave. Evaluating an MT engine before implementation? You need to collect data. Want to improve the quality of the MT output? Dig into the data. How do you present ROI to stakeholders? Data will come in handy. When should you retrain your engine? Look at the data.

Qualitative data is also important. “Collect post-editors’ feedback regarding the engine quality and do this continuously to adapt to meet productivity goals,” said Lamis. Feedback from clients is also vital for improving engine quality.

When asked whether you should track MT data for all projects or just do spot checks, the panelists all agreed that capturing all the data would be ideal, as long as there is an efficient way of doing it. “You don’t want to add huge overhead by manually downloading files and scoring them, but if you do it automatically, yes, measure everything,” said Elaine.
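The ROI pro tip above, that TM leverage shrinks the volume actually processed by MT, can be illustrated with some back-of-the-envelope arithmetic. All figures below (word volumes, match rates, per-word costs) are hypothetical illustration values, not real pricing.

```python
# Hypothetical back-of-envelope: how translation memory leverage shrinks the
# MT-attributable share of savings. Every number here is a made-up example.

total_words = 1_000_000
tm_match_share = 0.70            # share of words covered by TM matches
mt_share = 1.0 - tm_match_share  # only the remaining new words go through MT

human_rate = 0.12       # hypothetical cost per word, translation from scratch
post_edit_rate = 0.07   # hypothetical cost per word, MT post-editing

mt_words = total_words * mt_share
savings_from_mt = mt_words * (human_rate - post_edit_rate)

baseline_cost = total_words * human_rate
print(f"Words actually translated by MT: {mt_words:,.0f}")
print(f"MT-attributable savings: ${savings_from_mt:,.0f} "
      f"({savings_from_mt / baseline_cost:.1%} of the from-scratch cost)")
```

In this example only 300,000 of the million words ever reach MT, so quoting ROI against the full million-word volume would triple-count work the translation memory, not the engine, is doing.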
Pro tip: For advanced users, Memsource offers full access to your performance data through our Snowflake connector. You can track your editing time, post-editing analysis, LQA results, and on-time deliveries. On top of that, we automatically calculate key MT metrics, like BLEU, TER, and chrF3, providing you with multiple ways to measure your performance.

Watch the recording of the workshop to get all the insights from the MT experts, and be sure to sign up to hear about future workshops.
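The metrics named in the pro tip above (BLEU, TER, chrF3) all compare MT output against a reference translation. As a rough illustration of the last one, here is a toy character n-gram F-score in the spirit of chrF; real implementations such as sacreBLEU’s chrF average over n-gram orders 1 to 6, and the "3" in chrF3 is the beta that weights recall three times as heavily as precision.

```python
from collections import Counter

# A toy character n-gram F-score in the spirit of chrF. This sketch uses a
# single n-gram order and a configurable beta to show the core idea; it is
# NOT a drop-in replacement for a real chrF implementation.

def char_ngrams(text: str, n: int) -> Counter:
    text = text.replace(" ", "")  # chrF typically ignores spaces
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf_score(hypothesis: str, reference: str,
               n: int = 3, beta: float = 3.0) -> float:
    """F-beta score over character n-gram overlap, in [0.0, 1.0]."""
    hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
    overlap = sum((hyp & ref).values())  # clipped n-gram matches
    if not hyp or not ref or overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

A perfect match scores 1.0 and no shared trigrams scores 0.0; because recall dominates at beta = 3, the metric penalizes MT output that drops reference content more than output that adds a little extra.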
