How to Avoid Data Bias in AI


2021-04-01 20:25 TAUS



We live in an increasingly digitalized world, where more and more of our day-to-day decisions are made by algorithms in our cars, phones, computers, and TVs. AI touches almost every aspect of our lives, from smart self-learning home systems and assistive devices to simple shopping apps that suggest what to buy based on our previously observed behavior. One could argue that people have been using algorithms - mathematical rules and calculations - in decision-making for a long time, and they wouldn't be wrong; it just never happened at the scale that AI makes possible. With ever-growing datasets, vast computing power, and the ability to learn in an unsupervised manner, the number of decisions that AI can make goes far beyond the abilities of any human being. Its potential impact on people and societies is therefore greater, and the ethical concerns related to AI are more pressing than ever.

Ethics and Bias in AI

How can we define ethical AI systems? Here are some ethics principles to consider: ethical AI systems deliver outcomes that are in line with their intended purpose, but also with people's intent and social and moral codes. They should benefit individuals, society, and the environment and reduce the risk of negative outcomes. They should respect human rights, diversity, and the autonomy and privacy of individuals, and be reliable, inclusive, and accessible. Finally, they should not involve or result in any kind of unfair discrimination against individuals or groups, creating or reinforcing inequalities. This sounds quite like what humanity has been trying to achieve for millennia, doesn't it? Which brings us to the key question: if we as humans can display intentional or unintentional bias, how can we expect a system that is programmed by us not to exhibit the same? Well, technically, we could. In AI, bias is considered an anomaly in the output of machine learning algorithms.
It often happens when prejudiced assumptions are made in the process of developing algorithms, and most often when the training data contain bias. So it is fair to assume that an AI system is only as “good” as the quality of the data it is fed. If the training dataset is cleared of conscious and unconscious assumptions about concepts such as race, gender, and so on, we should be able to build an AI system that makes unbiased, data-driven decisions. So, where do we start if we want to make input data bias-free? In one of our previous articles we covered the 9 most common types of data bias in machine learning. Specific actions can be taken to address each of them, but here we'll look at more general ways of preventing data-related biased decision-making.

1. Understand the Scope and Limitations of Your Model and Data

Before gathering data, it is crucial to fully understand the scope of an AI experiment and its application. This includes understanding any societal or underlying scenarios that could potentially impact the outcome. A model trained to find correlations should not be used to make causal inferences. For example, a model can learn that users in India who prefer to browse the web in English over Hindi are also likely to buy in English, but that model should not infer that they don't speak Hindi (or another Indic language) or that they wouldn't buy in Hindi. It might just be that the current availability of content is greater in English. Additionally, a dataset that does not reflect present-day norms and scenarios should not be used to train current applications. A job-matching algorithm trained on historical data can assume female pronouns when translating words like “nurse” or “babysitter” into Spanish, and only return matches for female applicants.

2. Ensure That Your Dataset Is Representative and Diverse

Proper data collection is perhaps one of the most impactful measures we can take to avoid bias in data.
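Whether a dataset is representative can be checked mechanically before training, by comparing the share of each group in the data against the share you expect in the population you intend to serve. A minimal sketch in plain Python (the group labels and proportions are hypothetical):

```python
from collections import Counter

def representation_gaps(samples, target_shares, tolerance=0.05):
    """Flag groups whose share in `samples` deviates from the share
    expected in the population the model is meant to serve."""
    counts = Counter(samples)
    total = len(samples)
    gaps = {}
    for group, expected in target_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            gaps[group] = {"observed": round(observed, 3), "expected": expected}
    return gaps

# Hypothetical age groups: training data heavily skewed toward one cohort.
train_ages = ["18-34"] * 80 + ["35-54"] * 15 + ["55+"] * 5
population = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

# Flags 18-34 as over-represented and the other two as under-represented,
# signalling that additional data should be collected for those groups.
print(representation_gaps(train_ages, population))
```

A check like this belongs at the start of a project, before any model is trained, so that gaps are fixed by collecting more data rather than discovered in production.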
Data needs to be accurate and sampled in a way that represents all users and a real-life setting. Are you building a model that needs to serve users of all ages, but you only have training data from Millennials? Then you'll want to collect additional data to represent other age groups as well. Moreover, you should consider collecting data from multiple sources to ensure data diversity. In general, data coming from a single source is assumed to be weaker than data coming from multiple sources, and more likely to cause measurement bias. If you are training a chatbot, you'll probably want to use publicly available data, data generated by professional translators, and also good amounts of user-generated data, to cover as many of the ways people may express themselves in a conversational setting as possible. Check here the domain-specific Colloquial dataset TAUS created based on the sample provided by Oracle.

3. Employ Human Data Augmentation

If you are using your legacy data, public training datasets, or even data acquired for a specific use case, you will often need to augment it to better reflect the real-world frequencies of gender, race, events, and attributes that your model will be making predictions about. Getting representative and balanced training and test datasets is not an easy task, so you might want to consider services such as domain-specific data collection, data annotation, or labeling for the best outcomes. TAUS has a community of over 3,000 data contributors and specializes in data services to help you collect and prepare text, image, or audio datasets that fit your project specifications.

4. Continuously Evaluate Results and Refresh Test Sets

Any model that has been deployed should be thoroughly and continuously evaluated to assess the accuracy of its results. The results should be compared across different subgroups, and stress tests should be used in cases where bias can potentially occur.
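The subgroup comparison described above can be sketched in a few lines: score the model separately on each subgroup of the test set and look for large gaps. A minimal illustration in plain Python (the group names and evaluation records are hypothetical):

```python
from collections import defaultdict

def accuracy_by_subgroup(records):
    """Compute per-subgroup accuracy from (group, predicted, actual) records.
    A large spread between groups is a signal to investigate for bias."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        hits[group] += int(predicted == actual)
    return {group: hits[group] / totals[group] for group in totals}

# Hypothetical evaluation log: (subgroup, predicted_label, true_label)
log = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 1), ("group_b", 1, 1), ("group_b", 0, 0),
]
print(accuracy_by_subgroup(log))  # group_a: 0.75, group_b: 0.5
```

The 25-point gap between the two groups in this toy log is exactly the kind of disparity that an aggregate accuracy number would hide.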
As new features are added and the systems are further developed, you might want to consider using new or refreshed test sets that cover the new real-world scenarios.

Technology Can Help

Being aware of potential bias in data and actively taking preventative measures against it can help you build systems that generate equally representative outputs across scenarios. We can't expect technology to create ethical systems or make moral judgments by itself when even humans cannot collectively deliver an ethical judgment in many cases, as seen in the Moral Machine experiment. But just as AI technologies can potentially amplify poor decision-making, similar technologies can be applied to help identify and mitigate these risks. You can use Google's What-If Tool or IBM's open-source AI Fairness 360 toolkit, or reach out to TAUS for custom data solutions tailored to your specific use case.
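To give a flavor of what such toolkits measure: one of the simplest fairness metrics is the demographic parity difference, the gap in positive-outcome rates between two groups. The idea can be illustrated in plain Python (this is not the API of either toolkit, and the decision data is invented for illustration):

```python
def demographic_parity_difference(outcomes_a, outcomes_b):
    """Difference in positive-outcome rates between two groups.
    0.0 means both groups receive positive outcomes at the same rate."""
    rate_a = sum(outcomes_a) / len(outcomes_a)
    rate_b = sum(outcomes_b) / len(outcomes_b)
    return rate_a - rate_b

# Hypothetical loan-approval decisions (1 = approved) for two groups.
group_a = [1, 1, 1, 0, 1, 0, 1, 1]  # 6/8 approved
group_b = [1, 0, 0, 1, 0, 0, 1, 0]  # 3/8 approved
print(demographic_parity_difference(group_a, group_b))  # 0.375
```

A value this far from zero does not prove the model is unfair on its own, but it is the kind of signal that should trigger a closer look at the training data and the decision process.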