Google Releases Dataset to Address Gender Bias


2021-07-15 02:36 multilingual


In an effort to address gender bias in its neural machine translation (NMT) technology, Google has released a new dataset that appears to improve the rate at which Google Translate accurately translates gendered language. “One research area has been using context from surrounding sentences or passages to improve gender accuracy,” reads a recent blog post from the company’s AI team. “This is a challenge because traditional NMT methods translate sentences individually, but gendered information is not always explicitly stated in each individual sentence.”

In late June, four researchers at Google published the Wikipedia Translated Biographies dataset, a collection of Wikipedia entries on a person (identified as male or female), a rock band, or a sports team (the latter two are considered genderless). According to Google, the new dataset appears to significantly improve gender accuracy, though there’s still work to be done. “It’s worth mentioning that by releasing this dataset, we don’t aim to be prescriptive in determining what’s the optimal approach to address gender bias,” the team writes. “This contribution aims to foster progress on this challenge across the global research community.”

In the blog post, Google gives the example of a Spanish paragraph whose subject is female but who is not explicitly mentioned in every sentence: because Spanish is a pro-drop language, it does not always include a subject in each sentence. A translation engine could therefore mistranslate such sentences into English using masculine pronouns rather than the correct feminine ones (or vice versa). When the Wikipedia Translated Biographies dataset was used, Google Translate more frequently produced translations with the accurate gendered pronouns.
Back in April, MultiLingual reported on issues with gender bias in Google’s translation engine after a wave of social media users noticed problems with how Google Translate rendered non-gendered language into gendered languages. Oftentimes, such translations reflected stereotypical depictions of gender roles — i.e., translating a non-gendered pronoun from Finnish into English as “he” when associated with the word “doctor,” but translating the same pronoun as “she” when associated with the word “teacher.”

