Memsource Translate: Interview with Dalibor Frivaldsky, Memsource Chief Technology Officer


2020-07-23 23:00 Memsource



Memsource has released the newest version of Memsource Translate, a machine translation management solution that is set to change the way you approach MT. To learn more about this game-changing feature, we asked Memsource’s CTO, Dalibor Frivaldsky, what exactly Memsource Translate is, how it works, and why you should try it for yourself.

Q: Why did Memsource decide to create Memsource Translate and what problem does it solve?

A: Machine translation has been integrated with Memsource for quite some time. At the moment we provide integrations with over 30 MT engines and we’re adding a new one almost every month. The MT space is therefore quite fragmented, and it has become very challenging for organizations to evaluate all the options and make an optimal choice. In addition, the performance of these MT engines can differ not only across language pairs, but also across domains within a single language pair. The different MT systems also evolve over time, so any evaluation can become obsolete within a year.

We wanted to make using MT technology as simple as possible, without the need to go through the complex process of choosing a single MT provider. Memsource Translate tries to solve this problem by picking the optimal MT engine for each individual document you are translating, based on the language combination and domain of the document.

Q: When should you use Memsource Translate?

A: Always! But more seriously, it doesn’t matter whether you are just starting with MT or already have it integrated into your workflows. Memsource Translate allows you to step into the MT world without the need to evaluate all the options out there. You can take advantage of the know-how accumulated by the algorithm over time and work with the best-performing MT engines right away.
If you’re already an MT user, you are likely using only one provider. As the data shows, that is not an optimal solution if you are working with multiple language pairs or domains. Memsource Translate will understand your content and pick a better MT engine where appropriate.

Q: What’s new in the latest version of Memsource Translate?

A: The new version brings three important innovations.

Per-document selection: the MT engine is selected for each individual document, not just for the entire language pair of a project.

Improved algorithm: the system learns about the performance of different MT engines for each language pair and domain of the text continuously, in real time.

Domain identification: initially for English source text, but soon for the top 10 source languages in Memsource, we can automatically detect and categorize the content into 11 domains, such as Legal, Industrial, Software documentation and so on. This allows the system to work at a finer level of granularity.

Support for customizable MT engines has been added as well. We evaluate a custom MT engine only on your content and compare it to the other MT engines. If it does indeed perform better than the other solutions, it will become the most recommended MT engine for you.

Q: How does the optimal machine translation selection work?

A: Each document that gets post-edited provides feedback to the system on how well the MT engine performed. The algorithm behind Memsource Translate learns from this data and takes it into account when next recommending an MT engine for a document. The data is collected in real time, so the recommendation for your next job can already take advantage of feedback from the translation you just finished. The algorithm needs around 50 to 100 documents to produce a reliable estimate. Memsource Translate has already processed over 22,000 documents and that number is growing continuously. The more feedback data we have, the more confident the algorithm is that the recommendation is truly the optimal one.
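The interview does not name the algorithm behind Memsource Translate. As a rough illustration only, the feedback loop described above resembles a multi-armed bandit, and the sketch below uses a simple epsilon-greedy strategy to show the idea. Everything in it is an assumption made for the example: the class `EngineSelector`, the engine names, the 0-to-1 post-editing score, the `epsilon` exploration rate, and the `window_days` cutoff for old feedback are hypothetical and are not Memsource's implementation or API.

```python
import random
from collections import defaultdict
from datetime import datetime, timedelta


class EngineSelector:
    """Toy epsilon-greedy selector keyed by (language pair, domain)."""

    def __init__(self, engines, window_days=180, epsilon=0.1, seed=None):
        self.engines = list(engines)
        self.window = timedelta(days=window_days)  # assumed feedback window
        self.epsilon = epsilon                     # assumed exploration rate
        self.rng = random.Random(seed)
        # (lang_pair, domain, engine) -> list of (timestamp, score)
        self.feedback = defaultdict(list)

    def record(self, lang_pair, domain, engine, score, when):
        """Store one hypothetical post-editing score in [0, 1]; higher is better."""
        self.feedback[(lang_pair, domain, engine)].append((when, score))

    def mean_score(self, lang_pair, domain, engine, now):
        """Average the scores inside the window; None if there are none."""
        cutoff = now - self.window
        recent = [s for t, s in self.feedback[(lang_pair, domain, engine)]
                  if t >= cutoff]
        return sum(recent) / len(recent) if recent else None

    def recommend(self, lang_pair, domain, now):
        """Explore occasionally, otherwise exploit the best recent mean."""
        scores = {e: self.mean_score(lang_pair, domain, e, now)
                  for e in self.engines}
        untried = [e for e, s in scores.items() if s is None]
        if untried:
            return untried[0]  # try every engine at least once
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.engines)
        return max(scores, key=scores.get)


# Usage with made-up engines and scores; epsilon=0.0 disables exploration.
now = datetime(2020, 7, 23)
selector = EngineSelector(["engine_a", "engine_b"], epsilon=0.0)
selector.record("en-de", "Legal", "engine_a", 0.6, now - timedelta(days=10))
selector.record("en-de", "Legal", "engine_b", 0.8, now - timedelta(days=5))
print(selector.recommend("en-de", "Legal", now))  # engine_b
```

Keying the feedback by language pair and domain mirrors the per-document selection described in the interview. A production system would more likely track uncertainty explicitly (for example with confidence bounds) rather than use a fixed exploration rate.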
At the same time, we know that MT engine quality evolves over time. When an MT provider informs us about a new release of their system that increases quality, we can nudge the algorithm to explore that MT engine again. But even if that doesn’t happen, the algorithm takes into account feedback data only from the last 6 months, so any change in quality will eventually be noticed and reflected in future recommendations.

We can illustrate the behavior of the learning algorithm with the chart above. In one specific language combination and domain, Amazon Translate provides the best performance compared to the other two MT engines (note that Memsource Translate now integrates with more MT systems; the data is taken from a pilot run). As feedback accumulated over time (a period of roughly two weeks in this case), the algorithm learned about the performance of the optimal MT system and became more confident in this knowledge, recommending that MT system predominantly for new documents created for this combination of languages and domain.

Q: How easy is it to use Memsource Translate?

A: Very easy. Out of the box, you will get access to three MT engines without needing to set up anything apart from having a Memsource account. However, the more MT engines you use, the better. For generic MT engines that don’t come enabled out of the box, you will only be required to provide your API keys or credentials (depending on how the MT engine authorizes requests). These engines will then automatically become part of the set from which Memsource Translate recommends the optimal engine for each document.

Q: How does a user know whether the optimal machine translation engine is being used?

A: Users can get a general overview of engine performance through our Machine Translation reports, which cover engine performance on a quarterly basis.
While not part of the initial release, we are looking into other ways to let users know which MT engine was picked for each individual document and how much better it is for this particular language combination and domain than the other MT engines the user has enabled. We may also let users know how much better the performance could be if they enabled additional MT engines on the platform.

Q: Is the data shared with MT providers?

A: Users don’t have to be concerned about Memsource sharing their data. Memsource Translate does not send any post-edits to the MT providers; all the recommendation and evaluation logic happens within the Memsource environment. If you’re also concerned about sending source text to an MT provider to get a machine translation, you can disable that provider in Memsource Translate and it will not be recommended for any new jobs you upload.

Memsource Translate is available for all Memsource users. Learn how easy it is to start using it here.

