Looking into Parliament

2023-12-06 18:50 CLARIN

Written by Johanna Berg

This autumn, CLARIN Ambassador Johanna Berg arranged a CLARIN ERIC (European Research Infrastructure Consortium; see https://research-and-innovation.ec.europa.eu/strategy/strategy-2020-2024/our-digital-future/european-research-infrastructures/eric_en) workshop at the Swedish parliament. It focused on the opportunities for new research on parliamentary data opened up by several ongoing infrastructure projects. The Swedish initiatives are also linked to broader European collaboration structures in the CLARIN ParlaMint project.

It is important that the people in charge of safeguarding the huge amounts of text data in museums and archives understand the fast and dynamic development of Natural Language Processing (NLP; see http://en.wikipedia.org/wiki/Natural_language_processing) methods, so that they can make sure their important collections stay relevant to research.

Parliamentary data is a good example. The Swedish records are voluminous and coherent from the 1860s onwards. They also fall outside copyright, and thus give a good picture of what can be done with text resources under the best of circumstances. On top of that, the debates in parliament cover a broad spectrum of topics, so the minutes may be of interest to researchers from many fields.

This CLARIN ERIC event was co-organised with the Swerik project (Swedish Riksdag 1867–2022: An Ecosystem of Linked Open Data) and the Riksdag Library. Some 40 researchers gathered, representing more than 10 universities and several disciplines connected to digital humanities: economic history, intellectual history, linguistics/NLP, media/communications, political science, speech, statistics and more.

The workshop opened with a couple of presentations from an infrastructure perspective. The Riksdag administration presented its current work on open data, and how best to make it work both in-house and externally. Then we got glimpses of two ongoing projects building research infrastructure on data from the Swedish parliament: Swerik and Roll-call votes.

Måns Magnusson, assistant professor in statistics, Uppsala University, discussed the challenges of dealing with such a huge amount of data. These challenges can be met only through machine learning, statistical processing and iterative curation on the go. Fredrik Mohammadi Norén, assistant professor in media and communication, Malmö University, added further detail on the annotation work in progress and on linking metadata to Wikidata. Jan Theorell, professor of political science, Stockholm University, is leading another, independent project that will neatly complement Swerik. He is building a complete and linked dataset of 37 000 roll-call votes in the Swedish parliament (from 1925 onwards), to open up this crucial information source for further research with digital methods.

From this starting point in infrastructure, we proceeded to look deeper into a few of the research projects that make actual use of the free access to parliamentary data, often complementing it with other text resources.

Claes Ohlsson, associate professor in Swedish, Linnaeus University, shared findings from ongoing research in Market Language Over Time, an interdisciplinary project combining corpus linguistics with historical discourse analysis.

Nina Tahmasebi, associate professor in NLP, University of Gothenburg, shared findings from Change is Key, discussing semantic change over time as it can be detected by text-based AI. Pelle Snickars, professor of digital cultures, Lund University, shared some experiences from the Westac (Welfare State Analytics) text mining project and presented its close connections to the Swerik infrastructure.

Jens Edlund, professor in speech communication, KTH - Royal Institute of Technology, works on SweTerror together with, among others, Mats Fridlund, associate professor and deputy director of GRIDH (Gothenburg Research Infrastructure in Digital Humanities). Both were present to tell us about ongoing research into the Swedish discourses on ‘terror’ and how they have changed over time. They use a mixed-methods approach and build on both text and speech data.

The event closed with an inspiring discussion on further possibilities. A recurring theme was the value of building a research infrastructure in close connection with the research initiatives that test it. We also touched on the importance of building bridges between the research community and the rich sources of ‘found data’ in the GLAM sector (galleries, libraries, archives and museums). There is still much to be done!