What is Training Data?

2021-10-04 21:25 TAUS


A machine learning algorithm uses data to learn and make decisions. The algorithm develops confidence in its decisions by learning the underlying patterns, relationships, and structures within a training dataset. The higher the quality of the training data, the better the algorithm will perform. So what is training data exactly?

Training data, also referred to as a training set or learning set, is an input dataset used to train a machine learning model. These models use training data to learn and refine rules so that they can make predictions on unseen data points. The volume of training data fed into a model is often large, which helps algorithms predict labels more accurately. Oftentimes, a training set consists of about 70-80% of your entire dataset, with the remainder held back to check how the model performs on data it has not seen.

A training set is structured as rows and columns, where one row is one observation and one column is one feature. Features are also referred to as attributes, and they are extremely important to the outcome of a machine learning algorithm. For example, if we wanted to build a model that predicts the weather, some applicable features would be temperature, cloud coverage, and humidity. The values of these features recorded at one point in time would together form one observation, or row, in the dataset.

It’s common and often necessary to have some sort of human involvement when preparing training data for a machine learning model. The training data must fit the business and model requirements. The data needs to be cleaned and analyzed before it can be used in the model; otherwise, the quality of the predictions will be negatively impacted.
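To make the row-and-column structure and the 70-80% split concrete, here is a minimal Python sketch using only the standard library. The feature names, the value ranges, the 100-row size, and the helper functions `make_toy_dataset` and `split_train_test` are all assumptions made up for illustration; they do not come from the article, and real projects typically rely on a data library rather than hand-rolled helpers.

```python
import random

# Hypothetical feature (column) names for the weather example.
FEATURES = ["temperature_c", "cloud_coverage_pct", "humidity_pct"]


def make_toy_dataset(n_rows, seed=0):
    """Build made-up observations: one dict per row, one key per feature."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        rows.append({
            "temperature_c": round(rng.uniform(-5.0, 35.0), 1),
            "cloud_coverage_pct": rng.randint(0, 100),
            "humidity_pct": rng.randint(10, 100),
        })
    return rows


def split_train_test(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and return (training set, held-out set)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


dataset = make_toy_dataset(100)
train_set, test_set = split_train_test(dataset, train_fraction=0.8)
print(len(train_set), len(test_set))  # prints: 80 20
```

Shuffling before the cut is a common precaution so that the training set is not just the first rows in file order, which could otherwise bias what the model sees. The 0.8 fraction is simply one value inside the 70-80% range mentioned above.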