Types of Training Data


2021-10-04 21:25 TAUS


Training data is used in three primary types of machine learning: supervised, unsupervised, and semi-supervised learning. In supervised learning, the training data must be labeled. This allows the model to learn a mapping from the features to their associated label. In unsupervised learning, labels are not required in the training set. Unsupervised machine learning models look for underlying structure in the features of the training set to make generalized groupings or predictions. A semi-supervised training dataset contains a mix of labeled and unlabeled examples and is used in semi-supervised learning problems.

A fourth approach, reinforcement learning, works differently: these models learn by trial and error, associating their actions with a reward or penalty. This family of models can either learn from experience alone, with no training data, or combine training data with experience.

Within these areas of machine learning, many different types of data can be used for training, including structured, unstructured, and semi-structured data. As the names suggest, structured data has clearly defined patterns and data types, whereas unstructured data does not. Structured data is highly organized and easily searchable, and usually resides in relational databases. Examples of structured data include sales transactions, inventory, addresses, dates, stock information, etc. Unstructured data, which often lives in non-relational databases, is harder to search and is most often categorized as qualitative data. Examples of unstructured data include audio recordings, video, tweets, social media posts, satellite imagery, text files, etc. Depending on the machine learning application, both structured and unstructured data can be used as training data.
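To make the difference between labeled, unlabeled, and mixed training sets concrete, here is a minimal Python sketch using only the standard library. The toy data, the function names, and the choice of a nearest-centroid classifier, a one-dimensional two-cluster k-means, and simple pseudo-labeling are all illustrative assumptions and do not come from the article.

```python
from statistics import mean

# Hypothetical toy data: each example has a single numeric feature.
# Labeled examples pair a feature with a label; unlabeled ones are features only.
labeled = [(1.0, "small"), (1.2, "small"), (0.8, "small"),
           (5.0, "large"), (5.3, "large"), (4.9, "large")]
unlabeled = [1.1, 0.9, 5.1, 4.8, 1.3, 5.2]


def fit_centroids(examples):
    """Supervised: use the labels to compute one centroid per class."""
    by_label = {}
    for x, y in examples:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(centroids, x):
    """Map a feature value to the label of the nearest centroid."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def two_means(xs, iterations=10):
    """Unsupervised: split unlabeled values into two groups (1-D k-means).

    Assumes xs contains at least two distinct values.
    """
    lo, hi = min(xs), max(xs)
    for _ in range(iterations):
        near_lo = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        near_hi = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo, hi = mean(near_lo), mean(near_hi)
    return sorted(near_lo), sorted(near_hi)


def self_train(labeled_examples, unlabeled_xs):
    """Semi-supervised: pseudo-label the unlabeled values, then refit on all."""
    centroids = fit_centroids(labeled_examples)
    pseudo = [(x, predict(centroids, x)) for x in unlabeled_xs]
    return fit_centroids(labeled_examples + pseudo)


if __name__ == "__main__":
    model = fit_centroids(labeled)
    print(predict(model, 4.0))            # label learned from labeled data
    print(two_means(unlabeled))           # groups found without any labels
    print(self_train(labeled, unlabeled)) # centroids refit on both sets
```

The supervised function needs the labels to exist, the clustering function never sees a label, and the semi-supervised function starts from the labeled examples and extends them with its own predictions on the unlabeled ones. Real projects would typically use a library such as scikit-learn rather than hand-written routines like these.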