Corona Datasets Used by Google, Naver Labs and University of Catalonia


2021-02-12 20:50 TAUS



In an effort to help battle the corona crisis from a language and information access perspective, TAUS coordinated an industry collaboration to gather translation memories covering this domain. The result was six datasets containing a total of 3,403,681 segments in the following language pairs: English-French, English-German, English-Spanish, English-Italian, English-Russian, and English-Chinese. To show how sufficient in-domain language data can improve information availability in a time of crisis, we present three use cases from Google, Naver Labs, and the University of Catalonia. Each ran its own study based on the TAUS Corona Datasets and shares its detailed results.

