Has AI Become Multilingual? How New Advances Could Be Pushing Past Language-specific Information Retrieval

2020-11-14 04:40 CSOFT

When you enter a piece of text in Google Translate using the ‘Detect language’ feature, an algorithm that recognizes which of the world's languages specific words and patterns belong to selects one of the many linguistic frameworks at its disposal. It then begins the more linear natural language processing (NLP) task of translating your content into the target language. As magical as that is, it is easy to see that Google Translate is a multilingual whiz only in this first step of cueing text for translation. After that it is essentially bilingual, working language pair by language pair to go back and forth between two given languages. What if AI could be truly multilingual, though, able to make connections across a broad knowledge of the world sourced from hundreds of languages? What if, like multilingual people, AI could hear you say something in English, recall something it knows by way of Spanish, and then tell you about it?

Now that may not be particularly far-fetched, following reports that Google researchers have applied what is known as a language-agnostic knowledge base to entity linking, the NLP task in which mentions in text, such as a name or a word, are linked to knowledge-base entries holding facts and information about them. Because this knowledge base is language-agnostic, when the algorithm receives an input entity in English, it looks not only for references to it in English but also for meaningful references to its semantic equivalents in other languages. The key entity in question seems to be not a token of language, but the essential information that we try to carry between languages when translating text and speech.

So far, it is difficult to say how this research could transition into real-world practice, but at least one service likely to benefit is search: it could broaden the range of information that search engines learn to draw on to generate better results and user experiences.
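
The core idea of a language-agnostic knowledge base can be illustrated with a deliberately tiny sketch, which is not Google's method: mentions in every language are linked to one shared entity ID, so an English query can retrieve Spanish and German documents that contain no English words. Everything here is hypothetical: the knowledge base, the entity IDs E1 to E3 (stand-ins for language-independent identifiers such as Wikidata items), the documents, and the helpers `link_entities` and `search`. Real entity linkers use trained models to match and disambiguate mentions; here an exact alias lookup stands in for that step.

```python
import re
from collections import defaultdict

# Hypothetical toy knowledge base: each entity has ONE language-independent ID
# ("E1", "E2", ...) plus surface forms (aliases) in several languages.
KNOWLEDGE_BASE = {
    "E1": {"en": ["germany"], "es": ["alemania"], "de": ["deutschland"]},
    "E2": {"en": ["berlin"], "es": ["berlín"], "de": ["berlin"]},
    "E3": {"en": ["pandemic"], "es": ["pandemia"], "de": ["pandemie"]},
}

# Hypothetical documents written in different languages.
DOCUMENTS = {
    "doc_es": "Berlín es la capital de Alemania.",
    "doc_de": "Die Pandemie traf Deutschland hart.",
    "doc_en": "Berlin hosts many museums.",
}


def build_alias_index(kb):
    """Map every lowercase alias, in any language, to the entity IDs it names."""
    index = defaultdict(set)
    for entity_id, names_by_language in kb.items():
        for aliases in names_by_language.values():
            for alias in aliases:
                index[alias].add(entity_id)
    return index


ALIAS_INDEX = build_alias_index(KNOWLEDGE_BASE)


def link_entities(text):
    """Toy entity linker: exact alias lookup per word token, no disambiguation.

    A real system would use a trained model here (an assumption of this sketch).
    """
    linked = set()
    for token in re.findall(r"\w+", text.lower()):
        linked |= ALIAS_INDEX.get(token, set())
    return linked


def search(query):
    """Return IDs of documents sharing at least one linked entity with the query."""
    query_entities = link_entities(query)
    hits = []
    for doc_id, text in DOCUMENTS.items():
        shared = query_entities & link_entities(text)
        if shared:
            hits.append((doc_id, len(shared)))
    # Most shared entities first; ties broken alphabetically by document ID.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return [doc_id for doc_id, _ in hits]


# An English query reaches the Spanish and German documents via entity E1.
print(search("news about Germany"))  # prints ['doc_de', 'doc_es']
```

Because documents are compared by entity ID rather than by word, the Spanish and German documents match the English query even though they share no vocabulary with it. The exact-match lookup in `link_entities` is the part a learned multilingual model would replace.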

More broadly, advances like these seem to bode well for the globalization of information in an age that often leaves us at a knowledge deficit when tackling business challenges or even global events like the 2020 pandemic. As we have learned, it is not always enough for information simply to exist on record. With AI as one of many tools people have used to find patterns and insights in global data, it is compelling to imagine an AI that can understand relationships in information spanning multiple languages, which in real terms means diverse global sources.

Within language services, the concept of a knowledge base sounds remarkably similar to existing concepts in machine translation and terminology management, despite coming from a fairly separate realm of research and development (R&D). Without reaching too far beyond the current limits of machine translation, advancing AI may help improve translation memory (TM) and glossary management, for instance by proposing possible term matches when adding a new target language. When several choices are possible, questions of style and culture often prove decisive in selecting the standard terms and definitions in a TM bank, and software that can comprehensively reference meaning across languages is more likely to assist in these processes than software that simply sees words.

As fascinating new possibilities emerge in NLP and other AI fields, CSOFT's technology-driven translation processes continue to incorporate industry-leading tools to ensure the quality and efficiency of translation projects. Learn more about our translation technologies and global network of linguists, subject matter experts, and engineers at csoftintl.com!