Artificial intelligence and machine translation

2022-12-22 22:25 Across

For anyone interested in the topic of artificial intelligence (AI), 2022 was a momentous year. It is remarkable what can now be accomplished with AI. For example, it was announced in September that game designer Jason Allen had won first place in an art competition, in the digital art/digitally enhanced photography category, with an AI-generated artwork. He used the AI software Midjourney for this, which created the artwork over the course of 80 hours.

DALL-E 2 from OpenAI operates according to a very similar principle. Unveiled in April 2022, the software caught the attention of the public because it can also artificially create visual images such as photographs, drawings, paintings, 3D visualizations, and more. The software has been freely available for everyone to use since September 2022. For everyone who hasn’t yet seen what the new AI can do, here is a small collection of images created with DALL-E 2:

[Image gallery: images created with DALL-E 2]

The way the software works is quite simple: Users enter text to describe what they want to see, and the AI quickly delivers four (original!) image versions of the desired motif. The more precisely you describe the subject matter you want to see, the more complex the image becomes.

OpenAI’s latest project has been available to the general public for just a few days (as of December 7, 2022) and is at least as ambitious as DALL-E 2. It is a text generator that works in a manner reminiscent of other chatbots, yet is completely different: ChatGPT. You can ask ChatGPT all sorts of questions, and the answers it provides are surprisingly good in most cases. The text generator can even output entire scripts that solve a wide variety of problems in a programming language of your choice.

As an example, we asked ChatGPT to explain neural machine translation. Our submission to ChatGPT: “Explain the advantages and disadvantages of neural machine translation. Write a short article about it.” The output is as follows, and you can judge the quality for yourself:

This brief excursion into the news from the world of artificial intelligence is meant to underscore our view of the translation industry: In the future, (nearly) every translation will be supported by machine translation (MT), with language experts involved to a greater or lesser extent.

Before neural machine translation came along, few people would have agreed with this statement, but it is now widely accepted that the quality of MT is good enough to add real value to the translation process.

According to a Gartner study, 75% of the global translation volume will be machine translated by 2025, and all of us in the translation industry will need to adapt to this new reality. However, two key aspects will determine the efficiency of machine translation:

In the area of artificial intelligence and machine translation, a lot has also happened at Across this year that will help face these challenges. In addition to expanding the interfaces with existing providers, we will be rolling out our own machine translation system at the beginning of 2023: AcrossMT.

The output of MT engines is getting better and better, but in many cases, what they produce cannot be used without modification. This is where post-editors come in: they improve the quality of machine translation. A distinction is made between light and full post-editing. You can find more information on this topic in the article “Post-editing – Better quality for machine translation.”

We know how difficult and tedious the work can sometimes be for post-editors, because the way in which the work is done today is no longer in keeping with the times. To take full advantage of the possibilities of machine translation, new tools are needed that are not yet available on the market, but more on that later.

Most of us have worked with MT at some point by now, even if only for testing purposes. And in most cases, we were most likely using a generic engine. Although the good results of generic engines cannot be ignored, a customized engine is the preferred solution in most use cases.

When an engine is trained with a company’s large translation memory and a terminology database is then imported, the quality of the output improves even further. On this basis, translators and post-editors can work better, faster, and with greater accuracy.

Precisely this data has been available in many companies for years. Companies that currently use their existing data only for translation memory matches have a great deal of untapped potential at their disposal: The data is a goldmine, as an MT engine trained with such information is extremely valuable. The results can be further customized with additional metadata: style, gender, domain, subject, sentence length, language variant. Everything is possible.

At Across, we have experienced this paradigm shift firsthand in recent years, and we have also seen it play out among our customers. In interviews, our customers report clear trends in their companies: orders are getting bigger, efficiency and cost pressures are increasing, translations are being made into more and more languages, and so on. The majority of the customers we interviewed are also certain that MT is the future. However, many still say that they do not use MT in their translation processes.

We asked ourselves: What is the problem? The answers we received were clear: companies have too little time to deal with the topic, the topic is too complex, it is hard to get an overview, and support is needed to move forward.

Across conducted intensive research to identify potential areas of opportunity for bringing about the perfect interplay of human and machine. We have seen that the potential is definitely there, but in order to take full advantage of it, it is necessary to develop new features for machine translation that MT vendors have not yet brought to the marketplace. One thing is clear: In the future, a translation management system or a CAT tool will only be successful if it is extensively linked to an MT system. However, without our own MT or a close partnership with a specific provider, we will not be able to achieve our goals.

We weighed all the options and alternatives, but it was clear to us at the end of our research process: We must and will develop our own MT system, AcrossMT. Why? We want to bring the data in-house, we don’t want a black box or dependencies, and we want to have an impact on data quality and connect the MT to our own systems. This approach will accelerate our processes and ensure better translation quality for our customers.

However, our system is not meant to be just another one of many that you connect to the translation management system (TMS) via an application programming interface (API). No, we are developing a completely new product with the MT system at its core. By the time this product is launched, we will already be able to offer features that set us apart: For example, data will be regularly exchanged between AcrossMT and the Across Language Server. The engines will be retrained again and again on this basis, ensuring that all available data is utilized and that the engines are constantly improving.

AcrossMT is a customized MT solution with a primary focus on the technical industry that will be trained with customer data. According to our tests, the results of AcrossMT are measurably better than those of generic MT solutions. AcrossMT also offers a high degree of data security, as no data leaves the company, and the pricing is fair (based only on volume).

As a long-standing customer of Across, you are perfectly positioned to use a high-quality MT engine that is trained with your data thanks to crossTank and crossTerm. Since Across relies on its own MT system that is extensively integrated into the TMS, you are in very good hands, both today and in the future.

AcrossMT will be available at the beginning of 2023, but feel free to contact us today, as we would be happy to advise you.
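
To make the earlier point about training an engine with translation memory and terminology data more concrete, here is a minimal Python sketch of how such data might be prepared. It is an illustration under stated assumptions, not a description of how AcrossMT works: it assumes the translation memory has been exported as TMX (a common exchange format), and the sample content, the function names `extract_pairs` and `missing_terms`, and the one-entry glossary are all hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Tiny inline TMX sample (hypothetical content) so the sketch runs without files.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1"
          o-tmf="example"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Tighten the screw.</seg></tuv>
      <tuv xml:lang="de"><seg>Ziehen Sie die Schraube an.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Tighten the screw.</seg></tuv>
      <tuv xml:lang="de"><seg>Ziehen Sie die Schraube an.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Check the valve.</seg></tuv>
      <tuv xml:lang="de"><seg></seg></tuv>
    </tu>
  </body>
</tmx>"""


def extract_pairs(tmx_text, src_lang, tgt_lang):
    """Return unique, non-empty (source, target) segment pairs from TMX text."""
    root = ET.fromstring(tmx_text)
    pairs, seen = [], set()
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or "").lower()
            seg = tuv.find("seg")
            # itertext() also collects text nested inside inline markup.
            segs[lang] = "".join(seg.itertext()).strip() if seg is not None else ""
        src, tgt = segs.get(src_lang, ""), segs.get(tgt_lang, "")
        # Skip empty segments and exact duplicates, which add nothing to training.
        if src and tgt and (src, tgt) not in seen:
            seen.add((src, tgt))
            pairs.append((src, tgt))
    return pairs


def missing_terms(src, tgt, glossary):
    """List source terms whose approved target term is absent from the target.

    Naive case-insensitive substring matching; real terminology checks would
    need to handle inflection and word boundaries.
    """
    return [s for s, t in glossary.items()
            if s.lower() in src.lower() and t.lower() not in tgt.lower()]


if __name__ == "__main__":
    glossary = {"screw": "Schraube"}  # hypothetical terminology entry
    for src, tgt in extract_pairs(SAMPLE_TMX, "en", "de"):
        print(src, "|", tgt, "| missing terms:", missing_terms(src, tgt, glossary))
```

The sketch only covers data preparation: it removes duplicates and empty segments and flags segment pairs that contradict the glossary. How an engine is actually trained on the resulting pairs depends on the MT system in question.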