#AskTheExperts – Terminology and Artificial Intelligence

2021-02-02 18:00 GALA

Today GALA launches a new series of blog posts, where we ask translation industry experts inside and outside the GALA community for their insights and advice on managing business processes and digital transformation. Do you have a burning or knotty question? Send it to us and we'll ask our experts.

We kick off with a question on terminology and AI: What will the impact of AI be on term mining and, more generally, on terminology management? Given that artificial intelligence is permeating all activities related to translation and content production, we can't help but pause to consider what role AI will have (or maybe already has) in terminology management. Here's what some experts have answered.

In my opinion, AI will have a twofold impact on terminology management. On the one hand, it will make companies more aware of the importance of terminology. Since terminology is the number one factor for improving AI output quality, investing in terminology will have much more tangible benefits than we've seen so far. In the past, it was very difficult to calculate the objective benefits of sound terminology management, apart from improved efficiency in several global content processes. However, when terminology serves as direct input into AI and deep learning engines, for example in machine translation, chatbots, or data mining, the improvements it brings can be expressed in actual numbers and facts. Deep learning projects of any kind require structured and harmonized data, which is exactly what terminology management delivers. In marketing, too, terminology can feed into global SEO rollouts or social media management. Thus, completely new fields of application and new divisions inside companies are now looking for terminology. This is a game changer, because it turns terminology from a pure cost center into an actual and tangible revenue generator, or at least a revenue contributor, for companies.

On the other hand, AI will also play a role in terminology management itself.
For starters, not many terminology management systems currently offer the functionality we know from authoring tools, such as consistency or style checking. They will gain those features. AI will also start playing a role in terminology creation itself. This might range from automatic definition extraction or generation to finding potential synonyms in a company's data. AI will help make corporate terminology processes more efficient. For example, machine learning could predict which user groups might have an issue with which terms and route these accordingly. It will be able to predict the preferred term out of a group of synonyms based on guidelines or previous selections. Or, again, it might help tune terminology extraction tools to predict, based on previous data, whether a term candidate will make it through review.

Terminology management employed artificial intelligence well before AI became the buzzword it is today. After all, the best terminology extraction engines have always relied on natural language processing (NLP), one of the core fields of AI. At the same time, a granular terminological data collection represents knowledge and is thus the backbone of AI applications such as machine translation (MT). A recent project of mine involved term mining in preparation for a 9-million-word MT project. Once the terminology was identified and appropriately documented, it supported the human editing process. Texts that use the correct terminology consistently in turn result in improved translation memories (TMs) and better-trained MT engines. While we can predict certain improvements in terminology or information extraction from the field of artificial intelligence, the contribution of terminology management to AI applications can be expected to be far greater.

The term "artificial intelligence" (AI) is often used to describe computers that mimic cognitive functions normally associated with the human mind, such as learning and problem solving (source: Wikipedia).
Today, the more advanced term mining tools already perform a form of AI by using sophisticated algorithms coupled with a reference corpus to identify term candidates that show strong "termhood," or semantic relevance. I wouldn't say that AI has an impact on term mining. It's rather the other way around: term mining has an impact on AI by enabling the effective identification of conceptually relevant terms, which in turn can be used to support AI applications such as search engine optimization, automatic content classification, machine translation, and sentiment analysis. As for how AI impacts terminology management, the first thing that comes to mind is that online corpora are growing exponentially, and continually evolving search techniques are making it possible to access those corpora like never before. Terminology data itself constitutes the DNA of digital content, so managing and developing terminological resources is like building a toolbox for accessing and leveraging digital content for various purposes. The availability of large-scale corpora, corpus analysis tools (such as WordSmith), and programming languages (such as Python) unlocks huge opportunities for terminology workers to build high-performing, multi-purpose terminology resources, which in turn aid the advancement of AI. In this landscape, the job description of the terminologist is changing dramatically.

My team provides guidance on word choice to a large organization, and we are tasked with providing that guidance before authors write the first line of content in a project or initiative. This type of proactive terminology management relies primarily on structured processes and close collaboration between terminologists and content owners.
Our goal is to enable authors to use the right terms the first time (as opposed to correcting terminology during a review stage), and to have reviewed and validated translated terms available at the beginning of a translation project (as opposed to starting the term translation process at that time). I strongly believe that proactive terminology management that begins at the planning stage is more effective and efficient, and produces more consistent content (across all languages) faster, than terminology management that relies on term mining late in the content lifecycle. So, while we have very sophisticated term mining software in our toolbox, we do not use this type of technology as part of our standard terminology development process. And I don't expect that to change any time soon, regardless of any progress made in developing term mining capabilities.

To discuss the impact of AI on terminology management, we need to better understand the AI engine involved. There are currently no global "all-knowing" AI engines. Instead, each one is focused on a certain area and trained with relevant material in that field. The key to obtaining intelligence through human interaction with an engine is for the engine to grasp contextual information. What is this text about? What is the person asking for? Therefore, contextual information, semantic relations between terms, and the categorization of terminology all play major parts. Another important factor is the use of variants, which some termbases are built to avoid. Humans typically use various expressions to describe the same thing, which means synonyms must be included in the termbase. This also impacts term extraction processes: they must become more intelligent in interpreting the context of a term while also being flexible enough to allow more variants and synonyms in the resulting termbase. Terminology management software also needs to scale up to make the information flow manageable.
We are now talking about hundreds, if not thousands, of term suggestions entering the termbase per week. There will need to be workflow support and automation functionality so that users can collaborate effectively in updating AI-supporting termbases. In summary, terminology will clearly play a key role in the functioning of AI, and this is already affecting how people work with their termbases.

For more on terminology, visit the GALA Knowledge Center.
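
One of the answers above mentions term mining tools that use a reference corpus to find candidates with strong "termhood," and names Python as a tool for terminology workers. As a rough illustration of that idea only, here is a minimal Python sketch that scores each word by how much more frequent it is in a domain text than in a general reference text (a simple relative-frequency ratio, sometimes called "weirdness"). This is not how any particular commercial tool works. The function names, the add-one smoothing, and the two toy corpora are all assumptions made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def termhood_scores(domain_text, reference_text):
    """Score each domain word by the ratio of its smoothed relative
    frequency in the domain corpus to that in the reference corpus."""
    domain = Counter(tokenize(domain_text))
    reference = Counter(tokenize(reference_text))
    domain_total = sum(domain.values())
    reference_total = sum(reference.values())
    vocab_size = len(set(domain) | set(reference))

    scores = {}
    for word, count in domain.items():
        # Add-one smoothing keeps unseen reference words from dividing by zero.
        domain_rate = (count + 1) / (domain_total + vocab_size)
        reference_rate = (reference[word] + 1) / (reference_total + vocab_size)
        scores[word] = domain_rate / reference_rate
    return scores


def top_candidates(domain_text, reference_text, n=5):
    """Return the n highest-scoring words, ties broken alphabetically."""
    scores = termhood_scores(domain_text, reference_text)
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]


# Toy corpora, invented for illustration only (hypothetical data).
DOMAIN = (
    "The termbase stores each concept with its preferred term. "
    "Reviewers validate termbase entries before the termbase is published."
)
REFERENCE = (
    "The weather was fine and the children played in the park before "
    "dinner. Each of the visitors stores a coat with the staff."
)

if __name__ == "__main__":
    for word in top_candidates(DOMAIN, REFERENCE, n=3):
        print(word)
```

On this toy data, "termbase" ranks first, while a common word such as "the" scores below 1 because it is relatively more frequent in the reference text. A real extraction workflow would also need multi-word candidates, linguistic filtering, and human review, none of which is attempted here.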