AI and localization: beyond the hype

2023-06-03 11:00 multilingual

Artificial intelligence has quickly proven itself a disruptor of many industries. In localization specifically, AI has the potential to assist many processes by harnessing technology and enabling new workflows. It seems that every few months we are asking how AI will change the world, how customer needs are driving its evolution, and how we are tackling the challenges that come with such a rapidly evolving technology.

AI will inevitably play a greater role in supporting content localization in our industry, but perhaps not in the ways we anticipate.

Who owns my voice?

Localized video content, particularly through the art of voice-over dubbing, can strike a connection, trigger emotional responses, and bring characters to life through authentic performances, all of which have a compelling effect on storytelling. Speech-to-text generators have been used for several years in the subtitling marketplace, and we’re increasingly seeing AI-enabled text-to-speech voice generators and adjacent tools promising studio-quality, natural-sounding voice-overs. Microsoft’s text-to-speech AI model, VALL-E, claims to simulate a person’s voice from only a three-second audio sample. Its creators speculate that AI speech editing will be able to change a person’s voice recording simply by altering the text transcript, i.e., making them say something they didn’t originally say. YouTubers are producing plenty of how-to videos on creating high-quality AI voice-overs using ChatGPT, showcasing how quickly AI is changing traditional VO processes, all with the goal of producing human-sounding speech (i.e., speech that does not seem computer-generated).

There are many unanswered questions about AI voice-overs: who owns those voices? What are the copyright protections and regulations when AI changes what a voice artist originally recorded? What if an artist’s voice is altered in an AI workflow and repurposed? How does one even track these instances when authentication is becoming increasingly difficult? What if AI work is created that resembles the voice of an existing artist or an existing piece of content, yet in a different language? Even if an artist consents to AI work, who owns the copyright of their prior work and performances? With artificial intelligence able to alter voices (and images), who owns the rights to do so? What are the potential risks around AI localization business models? These are murky waters indeed, and with the speed at which AI is staking its claim, these are all important ethical and legal considerations to keep front of mind.

The synergy of AI and humans

As studios and content producers continue to reach new audiences through localization (either dubbing or subtitling), content owners are looking for tools that can shorten their timelines and get titles to platform more quickly. Given current industry trends, they’re also looking to do it more cost-effectively. As content owners work to capitalize on opportunities to sell their titles and catalogs in new territories, more automated video dubbing service providers are entering the market, offering lower price tags and promising to reduce bottlenecks. Some even claim to offer a tool that will revolutionize the industry and eliminate the need for human intervention. And yes, it’s true that AI and machine learning technologies can save money and may get content to streaming video-on-demand services more quickly, but at what cost and at what quality? The quality of machine translation output has improved in recent years, and speech recognition, speech synthesis, and natural language processing technology have advanced within the localization industry, but human involvement is still needed to compensate for AI deficiencies.

The dubbing process relies heavily on context, emotion, colloquial language, and the subtle situational and cultural nuances needed for high-quality localization. These complexities and critical levels of quality cannot be derived from AI and machine learning, at least not yet. Most believe that AI will continue to develop, but there is also general agreement that humans will remain involved for quite some time in a process where quality is so important. Simply put, maintaining the required level of quality and consistency in high-end localization is not possible without extensive human intervention. Thus, I see AI in its current state as an assistive tool, albeit one that is evolving very quickly. Without a doubt, however, AI workflows are going to be transformative while still leaving enough room for human creativity.

AI use cases

That’s not to say that AI tools aren’t a good fit for localizing certain types of content. There is a wide range of suitable cases, such as promotional videos, explainer and training videos, e-learning content, podcasts, some social media, and other content that may require a lesser degree of artistry and performance. For example, here at Visual Data, we are seeing an increase in requests for simulated voices in our localization services, especially for content without a high quality requirement or multilingual content for international release on a low budget.

When it comes to delivering AI-adapted content, it’s about meeting audience expectations and living up to the perception of an individual’s trademark or a company’s brand. Delivering content perceived as lower quality may hurt a company’s brand in the long run, increase churn rates, or simply cause the audience to turn off.

I also see a strong case for AI tools to help us analyze data, track consumer behavior, and compute how localization may impact viewer retention and monetization.

AI technology is a tool; it cannot solve all our challenges, and it cannot replace the human element, which remains at the core of our work. It is certainly fascinating, however, to experience all these technological advances and developments and to be a part of this rapid and pivotal revolution.
