How Personal Can Language “P13n” Get?

From L10n to P13n: what does "personalizing" language actually mean?

2020-08-04 00:40 TAUS



In this data age, we can personalize the provision of information because we are rapidly learning more about hundreds of millions of individual users of services. The underlying idea is that an AI-enabled information system (internet, websites, media, etc.) will be able to give end-users exactly the content they want. Why? Because these users offer increasingly rich data profiles to the machines running online services. How far, then, will a person's experience of language(s) - one crucial variable - offer a challenge or a solution to tomorrow's personalized data game?

There are two directions we can take: we can look at language's resources for personalization (p13n) and see how these play out in a datafied ecology and economy, or we can look at how individuals involved in transactions can be addressed perso-linguistically by the market machine.

Take this baseline situation: a virtual assistant (VA) can listen to my voice, detect signs of depression (or even COVID-19!) emitted by my vocal cords, and transmit an alert to a treatment service, which will make a diagnosis based on access to my entire online medi-history, run against a zillion other sufferers and analyzed using machine learning. Personalized help will not be far away.

Note that in this case, the system doesn't even need to know which language I speak, as the crucial data points are detected in the sounds of my voice rather than the meaning of my words. Quite a broad range of specialized psychological, medical, and sociological insight can likely be derived from listening to speakers solely as 'noise makers' rather than as 'sense makers.' A written message bearing identifiable semantic value, however, would almost certainly not trigger a depression alert as such (it could be a fake, couldn't it?). Yet generalized rant and extremist content written to social media can provide "psychological" alerts (due to specific words in a given language) for content moderators.
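That last point, that the useful signal lives in the sound of speech rather than its meaning, is easy to make concrete. The sketch below is a toy illustration in plain Python/NumPy, not any real screening product: every function name and threshold here is invented, and a deployed system would run trained classifiers over far richer prosodic features. What it shows is that such a pipeline never needs to know which language is being spoken.

```python
import numpy as np

def acoustic_features(samples: np.ndarray, sample_rate: int) -> dict:
    """Crude paralinguistic features from a mono audio signal.

    Stand-ins for the richer measures (pitch range, speech rate,
    energy variability) that real voice analytics would use.
    """
    frame = sample_rate // 50                    # 20 ms frames
    n = len(samples) // frame * frame
    frames = samples[:n].reshape(-1, frame)

    # Loudness per frame: flat, low energy is one crude proxy
    # for subdued, monotone speech.
    rms = np.sqrt((frames ** 2).mean(axis=1))

    # Zero-crossing rate: a rough correlate of voicing and articulation.
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)

    return {
        "mean_energy": float(rms.mean()),
        "energy_variability": float(rms.std()),
        "mean_zcr": float(zcr.mean()),
    }

def flag_for_review(feats: dict, low_energy=0.05, min_energy_var=1e-3) -> bool:
    """Hypothetical screening rule: flag quiet, monotone speech for follow-up.

    A production system would use a trained model, not two thresholds.
    """
    return (feats["mean_energy"] < low_energy
            and feats["energy_variability"] < min_energy_var)

# One second of synthetic audio stands in for real microphone input.
sr = 16000
voice = 0.01 * np.sin(2 * np.pi * 110 * np.arange(sr) / sr)
feats = acoustic_features(voice, sr)
print(feats, flag_for_review(feats))
```

No word, and no language identifier, appears anywhere in the feature set: the 'noise maker' signal is all this kind of pipeline consumes.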
From L10n to P13n

So far in the translation industry, p13n has been used to refer to the action of adapting content for traditional (big) language communities. The general view found in progress reports is that the range of languages being datafied and digitized as online tongues is rising, albeit slowly, and that the unsurprising rationale for this is that most speakers of a language prefer to do business, search for information, or enjoy entertainment in their own tongue. So localizing a website from English into Kiswahili is rather condescendingly understood today as in some way 'personalizing' content for some 130 million potential users, just as the Wycliffe Global Alliance has been steadily 'personalizing' parts of the Christian Bible into 3,384 languages.

The trouble with this interpretation is that once you've got a new language onboard, your p13n work never stops. You will have to start addressing all those new language speakers with their different personal preferences as evidenced in their usage (dialectal, social, racial, religious, educational, etc.), which is what we ultimately mean by p13n.

What they prefer to read/hear and what they prefer to say/write

Let's look at what this could mean. AI-driven micro-analytics is set to reveal more about the power of language to influence specific individuals, not just large cohorts of readers/listeners in general. Consider the current retail trend of giving customers more background information about product sourcing in order to build greater trust and attract various new sub-groups.

What if the impact of this activation depended on personalized language preferences about the very nouns, verbs, adjectives, and rhetorical constructions that go into marketing content, once first contact is made through an advert? And what if financial, insurance, and similar services attempted to radically personalize the style of their communications, building stronger personal relationships by adapting parts of their content to near-individual psychological preferences when talking about money or debt?
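As a minimal sketch of the mechanics this would require, consider the toy Python below. Everything in it is hypothetical: the profile fields, the variant table, and the selection rule are invented for illustration, and a real system would learn preferences from usage data and choose among far more variants, whether authored by transcreators or machine-generated and then tested at scale.

```python
from dataclasses import dataclass

@dataclass
class P13nProfile:
    """Hypothetical per-user language preferences inferred from usage data."""
    register: str     # e.g. "formal" or "casual"
    humor_ok: bool    # does this user respond well to playful copy?

# Paraphrases of one marketing message, keyed by (register, humor).
# Each says the same thing in a differently marked way.
VARIANTS = {
    ("formal", False): "We source every bean directly from the grower.",
    ("casual", False): "Every bean comes straight from the farm.",
    ("casual", True):  "No middlemen. Just beans with a passport.",
}

def personalize(profile: P13nProfile) -> str:
    """Pick the variant matching the profile; fall back to neutral formal copy."""
    return VARIANTS.get((profile.register, profile.humor_ok),
                        VARIANTS[("formal", False)])

print(personalize(P13nProfile(register="casual", humor_ok=True)))
```

The job is no longer to carry one phrase from language A to language B, but to keep a table of marked alternatives for each message and match it to a profile.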
Turn on the Speakers

This fine-tuned p13n is surely on the horizon. The global commercial translation industry is worth only a few billion dollars more than Amazon's annual R&D budget of $35.9Bn. We can be almost sure that some of this R&D spend will go into working out the next stage in deploying voice assistants (VAs). Spoken language is a remarkable conveyor of sentiment and an influencer of personal decision-making. VA developers are well aware of the volume, depth, and granularity of information that voice input and output can provide to marketing services of all kinds.

So if distinguishing between English and Kiswahili voices is deemed a major l10n achievement today (and translating content from one to the other is a real advance), the next steps in personalized VA language will involve far subtler dimensions. These could include understanding and delivering on the rhetorical benefits of different vocabulary choices, linguistic registers, topicality of content, voice musicality, and the use of humor or play. Brands and services will have to develop content that echoes the natural discourse of potential end-users while leading those same consumers to embrace the specific design language of the brand. A complex equation. Only big data will be able to tell us if and how these semiotic features of human vocal and textual experience influence language use. We will then be able to use smaller sets of personal data to build those end-user p13n profiles.

A simple contrast helps illustrate these possibilities in the voice domain. In standard conference interpreting situations, we don't expect a match in gender, age, or voice quality between the source speaker and the interpreter - a 25-year-old woman can interpret ("speak for") a 70-year-old man with a dialect-based accent. Someday soon (shock and horror!) we will have a conference interpretation system that models the man's voice in near real time and automatically delivers his translated contributions in his personal voice quality and speaking style. This could seem 'uncanny', as they say, or it could become another new normal.

Making a user (customer, etc.) feel more at home with content by testing different voices, rhythms, and speeds for delivering the content in question will therefore be the stuff of machine learning worldwide, and will probably form a major thread of future language p13n. This will mean that anyone in charge of multilingual versions of VAs and other robots (it may not even involve traditional translation as such) will need to go much deeper into the data than simply localizing "content as phrases" from language A to B. Their job will be to ensure a linguistic match between a p13n profile and a given message. And in due course, all this will have to be automated to handle communication in signed versions as well as written and spoken language.

The language trick that enables this familiar sort of p13n is paraphrase: two or more semantically equivalent versions of a single meaningful utterance that can be used in differently marked social and behavioral situations (Get the hell out! vs. Please leave immediately!). Translation is therefore a form of cross-lingual paraphrase. Yet inside some commercial translation, we shall presumably need to provide different paraphrases of the same basic utterance for different age or racial groups, education cohorts, and game players, in order to address humorous or serious situations, and all the rest of the variables from privacy to big crowds. Personalizing means customizing, as transcreators know well.

Towards Design Thinking for Language

With an increasing production of spoken content and messaging, we will need to be able to mix and match these different custom registers in subtle ways. For example, the whole logic of emoji and visual languaging currently rampant on social media is tending towards capturing more of the allusive, freewheeling, short-hand power of language on the wing, and making it zing. However, translating some of this playful yet meaningful content into other languages at scale - our first step in p13n - is hardly on the agenda yet for machines, as they don't yet have access to enough variegated language data. Trying to do this at human speed would usually be counter-productive.

P13n will also require us to translate content for different communities of inclusiveness, either for legal and political reasons or because one language's innate structure requires subtler gender management or racial coding than another's. This in turn will require access to greater knowledge about specific language behaviors than we might typically expect from our daily production teams.

At yet another level of communication, people instinctively prefer specific qualities of voices (think of your favorite actors and singers), and might therefore appreciate some voice content addressing them in their preferred (although possibly faked) sonorities - especially if we can choose between spoken and written forms of the same content depending on our media choice at a specific moment. Different types of content for different cohorts might therefore trigger different voice choices, rhythms, speeds, and syntaxes in a commercial context. All these variables could then become issues of linguistic choice among translation target users as well, massively complicating the delivery of translated content and opening it up to automated solutions. These will first need to machine-test end-user evaluations of different versions on a vast scale to see what works. Linguists will then be able to work on succeeding fashions in language artistry to build p13n libraries.

This all suggests that the industry will need to move beyond the traditional choice of "national" or "regional" tongues as the defining criterion of p13n, and espouse the general tenets of "design thinking" and "millennial" mindsets when it comes to communication and digital content. Planning how to personalize effectively, collect the data that determine personalized value in language whether read/heard or written/spoken, and enrich the capacity of transcreation, machine translation, and other techniques to address this highly competitive challenge will no doubt fill many post-COVID sleepless nights.

