Language Technologies for Polish

2022-10-24 12:00 ELRC (European Language Resource Coordination)

In September, the successful 3rd ELRC Workshop took place in Warsaw at the Institute of Computer Science of the Polish Academy of Sciences, gathering a highly engaged audience from all across Poland, both onsite and online. Speakers representing various sectors, stakeholder groups, institutions, cities, and even countries gave a comprehensive overview of key aspects of the Polish LT and language data landscape, the EC language tools, the European Commission's plans for the creation of the Common European Language Data Space, selected use cases, and other inspiring European experiences. The panelists engaged in a lively discussion on language data availability and issues related to data access from the perspectives of the public sector, academia, industry and LSPs, and offered ideas on how to improve data accessibility and sharing.

The workshop was preceded by a conference of GRAI (Working Group on Artificial Intelligence) on 13 September, where the first year of the Group's activities was summarised. The participants underlined the importance of language data for the development of language technologies in the national language, and will advocate for giving higher priority to this topic in the Group's further work, in particular within its sub-group on data.

The Polish ELRC PS NAP, Anna Kotarska, is expected to start a closer collaboration with GRAI in Q4 2022. In the meantime, she continues to deliver presentations on LT at several conferences of Polish translation associations, in the context of the inflow of Ukrainian war refugees to Poland and the potential of LT to overcome communication barriers while providing linguistic assistance to those in need, advocating for the use of European language tools.

Language Technologies for good

One of the most interesting examples of using LT for good is the solution developed by Samurai Labs for cyberbullying detection for all languages, which was presented at the International Conference on Online Harm Prevention in September 2022. The event gathered professionals from the fields of Artificial Intelligence, Natural Language Processing and psychology to share best practices and exchange ideas on pioneering the neuro-symbolic approach to AI. Experts from related domains covered findings on countering online verbal abuse and automatic cyberbullying detection, as well as analyses of suicidal ideation content and the current status of moderation on social media.

The keynote speech was given by Michał Ptaszyński, Associate Professor at Kitami Institute of Technology in Japan. He underlined the difficulty of developing efficient cyberbullying detection methods for all languages, especially given the lack of data. However, multilingual Large Language Models (mBERT, XLM-R) have been tested to check to what extent data from resource-rich languages can be leveraged to produce high-quality results in other languages, with very promising first results. All the models created during this study will be released for free later this year.

Not surprisingly, Michael Wroczynski, Founder and CEO of Samurai Labs, who is also a medical doctor and a therapist, is a laureate of "Leaders of the Future" by Forbes Polska Magazine (October edition, page 186), and Samurai Labs has been listed in the ranking of 20 companies that change Poland and the world for the better.

Links:
The third ELRC Workshop in Poland
Podsumowanie konferencji GRAI (summary of the GRAI conference)
International Conference on Online Harm Prevention 2022