Apple Is Giving Developers a New Set of NLP Tools

2023-06-08 08:15 slator

As of Apple’s annual developer conference, WWDC 2023, BERT is making its way into the (Apple) developer mainstream. Bidirectional Encoder Representations from Transformers (BERT) was open-sourced by Google for NLP pre-training in late 2018. Fast forward to Apple’s June 7, 2023, WWDC session, where the iPhone maker featured BERT as the key to creating new multilingual models in its Create ML app and framework.

Create ML is a tool for training models for a variety of machine learning tasks in areas like image, sound, or activity, as well as tasks involving text, such as text classification and word tagging.

Apple reminded developers that transformer-based contextual embeddings are trained on large amounts of text using a masked style of training, in which the model is prompted to suggest a missing word in a sentence. The multi-headed self-attention mechanism behind transformers allows models to train on large amounts of textual data, including multilingual data.

“It makes it possible to support many languages immediately and even multiple languages at once,” NLP Engineer Doug Davidson explained. “But even more than that, because of similarities between languages, there’s some synergy such that data for one language helps with others.”

Using BERT embeddings and three separate models, one for each group of languages with related writing systems, Davidson continued, Create ML can now support 27 different languages. One model supports 20 Latin-script languages; a second supports four languages written in the Cyrillic alphabet; and a third supports Chinese, Japanese, and Korean.

Davidson walked participants through the process of training a multilingual model in the Create ML app. Users create a new project, select training data, and, under the algorithm section, choose a new option, BERT embeddings. They then select one of the three script-based models and set the language selection to “Automatic.”

“The most time-consuming part of the training is applying these powerful embeddings to the text,” Davidson said. “Then the model trains fairly quickly to a high degree of accuracy.”

Davidson demonstrated the process using a model that classified text messages in English, Spanish, German, and Italian as personal, business-related, or commercial. “As an example of the synergies that are possible, this model hasn’t been trained on French, but it can still classify some French text as well,” he pointed out, adding that the best practice is for developers to use training data for each language they plan to offer.

According to a June 6, 2023, session on Create ML, the multilingual BERT embedding model can also boost the accuracy of monolingual text classifiers. Developers training models with PyTorch or TensorFlow, as opposed to Create ML, can also use the new BERT embeddings via NLContextualEmbedding.

In short, Apple is handing developers a range of new tools to embed NLP into their apps.
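For developers who prefer the programmatic Create ML framework over the app, the same BERT-embedding option can be selected when constructing a text classifier. The sketch below is an assumption-laden illustration, not a verified listing: the column names, file path, and in particular the `.transferLearning(.bertEmbedding, revision: 1)` algorithm spelling are based on the WWDC 2023 session material and should be checked against the current CreateML documentation before use.

```swift
// macOS only: CreateML is Apple's model-training framework.
import CreateML
import Foundation

// Training data: a table with a text column and a label column.
// (Path and column names are placeholders for illustration.)
let data = try MLDataTable(contentsOf: URL(fileURLWithPath: "messages.json"))

// Assumed spelling of the new BERT-embedding option per the WWDC 2023
// session; verify the exact enum case in the CreateML docs.
let parameters = MLTextClassifier.ModelParameters(
    algorithm: .transferLearning(.bertEmbedding, revision: 1)
)

// Train a multilingual classifier; with the BERT embeddings, a single
// model can cover several languages that share a writing system.
let classifier = try MLTextClassifier(
    trainingData: data,
    textColumn: "text",
    labelColumn: "label",
    parameters: parameters
)

// Classify a new message and export the trained model for use in an app.
let label = try classifier.prediction(from: "Reunión a las 10 mañana")
print(label)
try classifier.write(to: URL(fileURLWithPath: "MessageClassifier.mlmodel"))
```

As in the app-based walkthrough, applying the embeddings to the training text dominates the training time; the classifier head itself trains quickly on top of them.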
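Outside Create ML, the Natural Language framework exposes the same embeddings through the NLContextualEmbedding class (iOS 17 / macOS 14 and later). A minimal sketch of extracting per-token vectors follows; the sample string is illustrative, and the exact method names should be confirmed against the Natural Language framework reference, since the embedding assets may also need to be downloaded on first use.

```swift
import NaturalLanguage

do {
    // Request the contextual (BERT-style) embedding for a language; the
    // same underlying model covers related languages, e.g. Latin-script ones.
    guard let embedding = NLContextualEmbedding(language: .english) else {
        fatalError("No contextual embedding available for this language")
    }

    // Model assets are downloaded on demand before first use.
    if !embedding.hasAvailableAssets {
        embedding.requestAssets { result, error in
            // Handle asset availability or download errors here.
        }
    }

    try embedding.load()

    // Thanks to cross-lingual synergy, the same embedding can be applied
    // to text in a related language.
    let text = "Bonjour tout le monde"
    let result = try embedding.embeddingResult(for: text, language: .french)

    // Each token is mapped to a vector of `embedding.dimension` values,
    // which can feed a custom PyTorch- or TensorFlow-trained classifier.
    result.enumerateTokenVectors(in: text.startIndex..<text.endIndex) { vector, range in
        print(text[range], vector.count)
        return true // continue enumeration
    }
} catch {
    print("Embedding failed:", error)
}
```

This is the route the session describes for developers who train models with PyTorch or TensorFlow: use NLContextualEmbedding to produce input features on-device, then run their own model on top of those vectors.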