AI for Multilingual Europe – Why language data matters more than ever

2022-10-24 12:00 ELRC (European Language Resource Coordination)

ELRC’s vision has always been to contribute to a true digital single market where all EU citizens can access information irrespective of the language they speak. With the support of AI, societal challenges can be addressed in areas such as the environment, health, and crisis response. AI also underpins technologies such as Machine Translation, Speech Recognition, and Fake News Detection, which allow us to communicate across borders, dictate text messages on our mobile phones, and verify information sources, to name only a few. Given the increasing importance of AI and Language Technologies (LT) across all European countries and sectors, the third edition of the ELRC White Paper focusses on the role of language technology and language resources, both within public administrations and in small to medium-sized enterprises (SMEs), while taking into account recent developments as well as AI-related national regulations.

During this investigation, ELRC gained important new insights into the value and status quo of language-centric AI, both of which have changed since 2019. For instance, Machine Translation (MT) has found its way into the daily work of public administrations: in 2022, only 6% of the participating organisations did not use MT at all. At the same time, the use of Computer-Assisted Translation (CAT) tools has increased massively. Beyond this, significant changes have occurred at the policy level since 2019, including in the actual translation and data sharing practices of the participating organisations.

The ELRC White Paper also illustrates the latest developments and approaches to sustainable language data sharing in SMEs and public services. On the one hand, the circumstances that were found to negatively impact or limit the sharing of language data in Europe in 2019 still exist. On the other hand, several additional approaches were mentioned in 2022. The surveyed participants identified six major challenges that organisations involved in the preparation and sharing of language data face in 2022 and beyond, including the development of LT for less-resourced European languages and the lack of competent specialists.

Last but not least, the White Paper includes an updated Country Profile for each participating CEF country. Each profile provides the latest insights into the country’s digital and language policy, data collection efforts for LT/AI, major networks, projects, and key players related to LT, the challenges of sharing language data, and much more.

The ELRC White Paper will be published in November, and we will give more details on how to access the full online version in the next ELRC Newsletter. Stay tuned!