Corona Crisis Corpus

The "Corona Crisis" Corpus Needs Your Support

2020-03-20 20:10 TAUS

What a different world we suddenly find ourselves in. Who would have imagined, just two weeks ago, that planes would stay on the ground, that we would no longer be queuing up on the autobahn or standing in line for the barista, and that all the NBA games, the Eurovision Song Contest and UEFA Euro 2020 would be canceled or postponed? Who would have thought that we would all find ourselves at war with an invisible beast that undermines our normal lives, our markets and our society?

Solidarity Wins

It is amazing to see how determined we are, as a species you could say, in a time of crisis like this. Draconian measures are being taken in almost real time by governments and local authorities. Economies are shut down and people are being isolated. We understand that this is necessary. Yet we realize that real solidarity in our societies and communities depends entirely on full disclosure and verifiable information. People spend hours searching, reading and studying the latest news on Corona and COVID-19. The virus is at work among us now. How do we resist it, beat it, or live with it?

For most people, especially the younger generation, this crisis is an unprecedented experience. It is reassuring to witness, in general, how virtuous behavior prevails, how solidarity seems to be winning, for now at least, and how efficiently the necessary measures are being implemented. Most of us realize that things will get worse before they get any better. COVID-19 is already affecting 176 countries (out of 195) around the world. Winning this battle in the long run depends on constant solidarity, trust, knowledge and understanding among all of our co-citizens on this planet.

How Can We Help?

TAUS is launching the Corona Crisis Corpus project.
We will collect language data specific to virus outbreaks, health conditions and cures, symptoms and medicines, hospitals and treatments, and everything that citizens and patients around the world want and need to know about the coronavirus and our joint effort to vanquish it. We will clean, cluster and organize the data and make them available in the form of bilingual corpora in the TAUS Data Library. MT developers, language service providers and everyone else who trains their own MT engines can come to our site, download these corpora and use them to improve their translation services and systems.

We will kick this off with a "Corona Starter Kit Corpus" containing all relevant matches from the existing TAUS Data Cloud. This Corona Starter Kit Corpus will be available in a number of languages fairly soon.

We then invite translators and agencies as well as life sciences companies to contribute their own translation memories covering this same domain, so that together we can expand both the volume of good data and the language spread. TAUS will apply the Matching Data service to all the translation memories we receive so as to clean, cluster and organize the data into Corona Crisis Corpora in as many languages as possible. The resulting corpora will all be available in the TAUS Data Library.

Good Vibrations

We trust that Google, Microsoft, Facebook, Amazon, Systran, Iconic, and dozens of other small and large MT developers around the world will access the Corona Crisis Corpora and very quickly train and optimize their engines. That would help the 4.5 billion internet users and co-citizens on our planet, who are searching every day for unbiased, solid information on this life-threatening crisis, to find the right content in languages they can read and understand. If we do this job well, it will send out good vibrations about how our industry can help the world communicate better.
Hopefully this effort will continue to reverberate after the dust settles and we return to business as usual.

This is a Charity Project

Ah, and yes, of course, this is a charity project. TAUS is putting in the labor and infrastructure for free. We are asking all language data contributors to share their data for free. The Corona Crisis Corpora can all be downloaded for free. There is no money at stake at any point in this endeavor.

The Rules

TAUS proposes the following rules for the Corona Crisis Corpus project:
Everyone who has domain-specific data may contribute data.
Data should be provided in zipped TMX file format. Contact data@taus.net before sending the data.
Data are shared for free (no credits, no monetary returns).
Data are used to create specific Corona Crisis Corpora.
TAUS offers the available Corona Crisis Corpora at the TAUS Data Library at no cost.
Duration of this project is until April 1, 2021.
The TAUS Data Terms of Use apply.

Contact data@taus.net if you would like to share your medical data.
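
For contributors unsure what "zipped TMX file format" might look like in practice, here is a minimal sketch in Python using only the standard library. It is not a TAUS tool. The function names (build_tmx, zip_tmx, read_pairs), the member name corpus.tmx, the header values and the sample language codes are all assumptions made for illustration. The element layout follows the general TMX 1.4 structure (tmx, header, body, tu, tuv, seg); check with data@taus.net for any specific requirements before sending data.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="zh"):
    """Return TMX 1.4 bytes for a list of (source, target) segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(tmx, encoding="utf-8")


def zip_tmx(tmx_bytes, zip_target, member_name="corpus.tmx"):
    """Write the TMX bytes into a zip archive (a path or a file-like object)."""
    with zipfile.ZipFile(zip_target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member_name, tmx_bytes)


def read_pairs(tmx_bytes, src_lang="en", tgt_lang="zh"):
    """Parse TMX bytes back into (source, target) pairs as a sanity check."""
    root = ET.fromstring(tmx_bytes)
    pairs = []
    for tu in root.iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs
```

A contributor could call build_tmx on their own segment pairs, pass the result to zip_tmx with a file name such as my_data.zip (a made-up name), and use read_pairs to confirm that the file round-trips before contacting TAUS.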
Chinese translation post-edited by Yang Anxun (Sun Yat-sen University).
