Intent Recognition in NLP

2021-09-07 20:00 TAUS

As our society continues to rely on technologies such as social network apps, email, and chatbots, the volume and availability of text data continue to multiply. With online and phone services so widely used, companies have had a difficult time keeping up. Intent recognition models have come to the rescue, helping to flag and sort through this vast amount of text data.

Intent recognition, also commonly referred to as intent classification, uses machine learning and natural language processing to associate text and expressions with a given intent. In other words, intent recognition takes a query as input and assigns it to a target class. For example, in an automated phone menu, the model learns from the speech data which service a customer is looking for, based on key phrases such as “pay my bill” or “speak to a representative.” Hence, intent recognition can be thought of as the process of classifying spoken or written text based on what the user wishes to achieve.

Intent classification has been an important game changer for businesses, especially with regard to customer experience. Chatbots, for example, are a popular platform that uses intent recognition for sales conversations, customer support, and more. Automating customer service through intent classification allows businesses to scale and meet customer demands faster.

Like other machine learning models, intent classification requires typical steps such as data acquisition and preparation. The input data can be text or speech (e.g. audio files); speech data needs to be converted into text before it can be used in a training dataset. Businesses often use their own text data from log files as training data. Other options include crowdsourcing, outsourcing, or generating synthetic data. Because intent classification is a supervised machine learning problem, the training data needs to be labeled.
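To make the idea of labeled training data concrete, here is a minimal sketch in Python. The example utterances, the intent names (`pay_bill`, `talk_to_agent`, `account_closure`), and the choice of a small Naive Bayes classifier over word counts are all assumptions made for illustration; the article does not prescribe a particular model, and a production system would use far more data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled examples: (utterance, intent). Names are illustrative only.
TRAINING_DATA = [
    ("i want to pay my bill", "pay_bill"),
    ("pay my bill please", "pay_bill"),
    ("how do i pay the invoice", "pay_bill"),
    ("speak to a representative", "talk_to_agent"),
    ("let me talk to a representative", "talk_to_agent"),
    ("i need a human agent", "talk_to_agent"),
    ("close my account", "account_closure"),
    ("i want to cancel and close the account", "account_closure"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count words per intent and examples per intent."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    vocab = set()
    for text, intent in examples:
        intent_counts[intent] += 1
        for token in tokenize(text):
            word_counts[intent][token] += 1
            vocab.add(token)
    return word_counts, intent_counts, vocab


def predict(text, model):
    """Return the intent with the highest Naive Bayes log-probability."""
    word_counts, intent_counts, vocab = model
    total_examples = sum(intent_counts.values())
    best_intent, best_score = None, float("-inf")
    for intent, count in intent_counts.items():
        score = math.log(count / total_examples)  # log prior
        denom = sum(word_counts[intent].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[intent][token] + 1) / denom)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


model = train(TRAINING_DATA)
print(predict("can i pay my bill", model))
print(predict("please close the account", model))
```

Each training example pairs a piece of text with exactly one intent, which is what makes the problem supervised.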
These labels are an important aspect of the training set, as they are the intents defined by the business. For example, common intents for customer-service-based business models include “purchase”, “account closure”, and “pay bill”. Once your intents are well defined according to their context, each text example needs to be labeled with the corresponding intent. Labeling these datasets can be a time-consuming effort. Many organizations opt to either label their datasets in-house or use third-party vendors. Labeling services are a growing business; one example is the TAUS HLP, a platform that accommodates a large variety of audio, image, and text data collection and labeling or annotation tasks through a qualified global network, and that can generate tailored labels.

Pre-trained models that are open source and available for use also exist, notably BERT (Bidirectional Encoder Representations from Transformers). This language model is a pre-trained transformer encoder, trained on sources such as Wikipedia and a corpus of books. Whether you use an existing pre-trained model or create your own, integrating contextual word embeddings is important to yield stronger predictions.

Word embeddings are vector representations of text data in which words with similar contextual meaning have similar representations. Words are represented as real-valued vectors in a predefined vector space. Hence, in a coordinate system, related words lie close to one another, based on the relationships observed in a corpus. These word embeddings can be learned during the training step. Word2vec is a popular and powerful method for creating word embeddings; it offers two model architectures, Skip-Gram and Continuous Bag of Words (CBOW), both of which use a neural network.

Once the dataset is processed and labeled, the model is ready to be trained. After model training, it is a good idea to evaluate the model against both a validation set and a test set.
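The split into training, validation, and test data can be sketched as follows. This is only an illustration under stated assumptions: the 60/20/20 proportions, the example utterances, and the helper names (`split_dataset`, `make_keyword_classifier`, `accuracy`) are hypothetical, and the keyword-overlap classifier is a stand-in for whatever model is actually being trained.

```python
import random

# Hypothetical labeled utterances; a real dataset would be much larger.
LABELED = [
    ("pay my bill", "pay_bill"),
    ("i want to pay the bill", "pay_bill"),
    ("how can i pay my invoice", "pay_bill"),
    ("pay bill online", "pay_bill"),
    ("speak to a representative", "talk_to_agent"),
    ("talk to an agent", "talk_to_agent"),
    ("i need a representative", "talk_to_agent"),
    ("close my account", "account_closure"),
    ("please close the account", "account_closure"),
    ("i want to close my account", "account_closure"),
]


def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle with a fixed seed, then cut into train/validation/test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def make_keyword_classifier(train_examples):
    """Stand-in model: predict the intent whose training words overlap most."""
    words_by_intent = {}
    for text, intent in train_examples:
        words_by_intent.setdefault(intent, set()).update(text.lower().split())

    def predict(text):
        tokens = set(text.lower().split())
        return max(words_by_intent,
                   key=lambda intent: len(tokens & words_by_intent[intent]))

    return predict


def accuracy(predict_fn, examples):
    """Fraction of examples whose predicted intent matches the true label."""
    if not examples:
        return 0.0
    correct = sum(1 for text, intent in examples if predict_fn(text) == intent)
    return correct / len(examples)


train_set, val_set, test_set = split_dataset(LABELED)
classifier = make_keyword_classifier(train_set)
print("validation accuracy:", accuracy(classifier, val_set))
print("test accuracy:", accuracy(classifier, test_set))
```

In this sketch the validation score would guide adjustments to the model or data, while the test set stays untouched until the end so that it gives an unbiased final check.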
This process runs the trained model on held-out examples whose labels are hidden from the model, then compares its predictions with those labels to see how well the model performs. Performance on the validation set is a good indication of whether the model needs further adjusting or perhaps more high-quality data. Once a model has been validated, it is ready to provide intent recommendations.

According to research presented at the International Conference on Internet Science, a number of factors influence why users trust chatbots, including the quality of interpretation of requests, human-likeness, and self-presentation. Chatbots break down and facilitate a user’s main goal through a conversation or dialogue. Intent recognition determines whether or not chatbots will help fulfill customer service goals, sales goals, and marketing goals. The quality of the chatbot depends directly on the quality of the training data. This is key to maintaining a useful and pleasant experience for the end user. Hence, the overall effectiveness of the chatbot is determined by its ability to understand the correct intent and select the appropriate response.

Chatbots identify user intent through a series of steps, starting with data collection. Once the chatbot has sufficient data, processing that data is an important step that helps the chatbot structure an effective response. This includes syntax analysis and semantic analysis, which together parse the grammatical structure of the text and infer meaning by distinguishing context. Next, classifiers perform intent classification after being trained on appropriately labeled datasets. Finally, chatbots use these predictions to formulate responses (dialogue formulation).

Sales teams are often swamped with manually analyzing a large volume of emails or calls. Because prospects can lose interest rather quickly if not met with a timely response, understanding a customer’s intent is vital.
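As a rough illustration of how predicted intents could help a sales team respond in time, the sketch below sorts an inbox by intent tag. The tag names, the priority ordering, and the `Email` structure are assumptions made for this example, and the `intent` field is assumed to have been filled in by an intent classifier upstream.

```python
from dataclasses import dataclass

# Hypothetical priority per intent tag; lower numbers are handled first.
PRIORITY = {
    "highly_interested": 0,
    "complaint": 1,
    "unsatisfied": 1,
    "needs_support": 2,
    "curious": 3,
}


@dataclass
class Email:
    sender: str
    subject: str
    intent: str  # assumed to be the output of an intent classifier


def prioritize(emails):
    """Sort emails by intent priority; unknown tags go last.

    sorted() is stable, so emails with equal priority keep arrival order.
    """
    return sorted(emails, key=lambda e: PRIORITY.get(e.intent, len(PRIORITY)))


inbox = [
    Email("ana@example.com", "Just browsing your catalog", "curious"),
    Email("raj@example.com", "Ready to sign this week", "highly_interested"),
    Email("li@example.com", "Invoice was wrong again", "complaint"),
]

for email in prioritize(inbox):
    print(email.intent, "-", email.subject)
```

A real deployment would likely combine the intent with other signals, such as account size or how long the message has been waiting.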
Intent recognition helps to surface and prioritize sales leads that have clear purchasing intent during both inbound and outbound sales processes. For example, emails from prospective clients can be tagged as “highly interested”, “needs support”, “unsatisfied”, “complaint”, “curious”, and so on. Knowing which email to prioritize can therefore make a great impact on a business’s ability to scale and acquire more customers, thereby increasing revenue.

Thanks to the availability of historical records and documentation, businesses often have an abundance of existing user data at their disposal. This type of data can be collected in the form of user interactions with a website, phone recordings, or other logs. It provides valuable input for an intent classifier, which a company can use to introduce platforms that automate customer support and shorten response times. For instance, companies can set up a telephone bot that answers when a customer dials a customer service line. To spare human effort in answering commonly asked questions, the bot can identify user intent and direct the caller to an appropriate channel. Intent recognition not only enables big strides in customer support, but also helps businesses to scale and quickly satisfy customer needs.

Takeaway

Intents are at the forefront of any conversational interface. Intent classification helps bridge the gap between user interactions in a given software platform and the users’ intentions. As businesses scale and reach a massive number of people, it is essential that they are able to meet user needs in an automated and efficient manner. Because of the complex nature of text data and user interaction with an interface, intent recognition algorithms are continuously evolving and improving.

The TAUS HLP Platform, with its highly competent global talent communities formed based on project requirements, is a great solution for any intent classification task. Contact us to design custom solutions for your projects.