Supervised Learning:
In supervised learning, a training dataset is provided with input variables (X) and output variables (Y). The goal is to learn a function (Y = f(X)) that predicts the output for a given input. The focus is on prediction accuracy rather than on understanding the true relationship between the variables. To estimate accuracy on unseen data, a separate testing dataset is held out. Model selection is done by splitting the remaining data into a training subset, used to fit each candidate model, and a validation subset, used to compare the candidates. The chosen model is then evaluated on the testing dataset. Cross-validation, which rotates the validation role across several folds of the data, is also commonly used.
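The training, validation, and testing workflow described above can be sketched in a few lines. This is a minimal illustration rather than a recommended implementation: the synthetic data (y = 2x plus noise), the k-nearest-neighbor predictor, the candidate values of k, and the helper names knn_predict, mse, and cross_val_mse are all assumptions made for the example.

```python
import random

def knn_predict(train, x, k):
    """Predict y for input x as the mean output of the k nearest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, data, k):
    """Mean squared prediction error on `data` for a model built from `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def cross_val_mse(data, k, folds=5):
    """Average validation error when each fold is held out in turn."""
    size = len(data) // folds
    errors = []
    for i in range(folds):
        held_out = data[i * size:(i + 1) * size]
        rest = data[:i * size] + data[(i + 1) * size:]
        errors.append(mse(rest, held_out, k))
    return sum(errors) / folds

# Synthetic data (an assumption for illustration): y = 2x + Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(300):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 * x + rng.gauss(0, 1.0)))

# Hold out the testing set first; it plays no part in model selection.
train, valid, test = data[:180], data[180:240], data[240:]

# Model selection: compare candidate models (here, values of k) on the validation subset.
candidates = [1, 3, 5, 9, 15]
val_errors = {k: mse(train, valid, k) for k in candidates}
best_k = min(val_errors, key=val_errors.get)

# Alternative: select k by 5-fold cross-validation on all non-test data.
cv_errors = {k: cross_val_mse(train + valid, k) for k in candidates}
best_k_cv = min(cv_errors, key=cv_errors.get)

# Final evaluation of the chosen model on the untouched testing set.
test_error = mse(train, test, best_k)
print(best_k, best_k_cv, test_error)
```

The testing set is touched only once, after the model has been chosen, so test_error is not biased by the selection step.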
Unsupervised Learning:
In unsupervised learning, the training data contains only input variables; the output variables are unknown. The goal is to find hidden patterns or extract information from the data. Typical tasks include cluster analysis, dimensionality reduction, and feature learning.
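As one concrete instance of cluster analysis, the sketch below runs k-means on unlabeled two-dimensional points. It is illustrative only: k-means is one clustering method among many, the two synthetic blobs are made-up data, and the farthest-first initialization in init_centers is a simple choice made here to keep the example deterministic.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def init_centers(points, k):
    """Farthest-first initialization: start from the first point, then keep
    adding the point that is farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iterations=20):
    centers = init_centers(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

# Unlabeled synthetic data (an assumption): two blobs around (0, 0) and (10, 10).
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]

centers, clusters = kmeans(points, k=2)
print(sorted(centers))
```

No output labels are used anywhere; the grouping is recovered from the inputs alone.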
Supervised and unsupervised learning can sometimes overlap, leading to other learning tasks. In semi-supervised learning, only a subset of the data has known outputs, but the unlabeled remainder is still used to improve learning. Transfer learning uses an existing model, trained on a different dataset or for a different purpose, as the starting point for a new task. This is useful when training a model from scratch would require a large amount of data and time. For example, in image analysis, an existing model can be updated with the specific images from a research project.
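The semi-supervised idea above can be illustrated with self-training, one simple strategy among several. In this sketch a nearest-centroid classifier labels the unlabeled points it is confident about and is then refit on the enlarged labeled set. The one-dimensional data, the confidence threshold, and the function names are assumptions chosen for the example.

```python
def centroids(labeled):
    """Mean input value per class, computed from (x, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(cents, x):
    """Assign x to the class with the nearest centroid."""
    return min(cents, key=lambda y: abs(x - cents[y]))

def margin(cents, x):
    """Confidence proxy: gap between the two smallest centroid distances."""
    d = sorted(abs(x - c) for c in cents.values())
    return d[1] - d[0]

def self_train(labeled, unlabeled, threshold):
    """Repeatedly pseudo-label confident points and refit until none remain."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    while True:
        cents = centroids(labeled)
        confident = [x for x in unlabeled if margin(cents, x) >= threshold]
        if not confident:
            return labeled, unlabeled
        for x in confident:
            labeled.append((x, predict(cents, x)))
            unlabeled.remove(x)

# Toy data (an assumption): two labeled points and seven unlabeled ones.
labeled = [(0.0, 0), (10.0, 1)]
unlabeled = [1.0, 2.0, 3.0, 5.2, 7.0, 8.0, 9.0]

final_labeled, still_unlabeled = self_train(labeled, unlabeled, threshold=3.0)
final_centroids = centroids(final_labeled)
print(final_centroids, still_unlabeled)
```

The ambiguous point near the class boundary is never pseudo-labeled, which limits the risk of reinforcing an early mistake.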
Active Learning:
In active learning, only a limited number of training instances is available initially. The algorithm can acquire more instances to improve accuracy, but each acquisition is expensive. The focus is therefore on determining which training instances are most important to acquire.
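A minimal sketch of this idea is uncertainty sampling: at each step the learner asks for the label of the pool instance it is least sure about. Everything specific here is an assumption for illustration: the one-dimensional threshold classifier, the oracle function standing in for the expensive labeling step, its hidden boundary at 6.3, and the budget of five queries.

```python
def oracle(x):
    """Hypothetical expensive labeling step (for example, a human annotator)."""
    return 1 if x > 6.3 else 0

def fit_threshold(labeled):
    """Place the decision threshold midway between the largest class-0 input
    and the smallest class-1 input seen so far."""
    lo = max(x for x, y in labeled if y == 0)
    hi = min(x for x, y in labeled if y == 1)
    return (lo + hi) / 2

# Start with two labeled instances and a pool of unlabeled candidates.
labeled = [(0.0, 0), (10.0, 1)]
pool = [i * 0.25 for i in range(1, 40)]  # 0.25, 0.50, ..., 9.75
budget = 5  # number of labels we can afford to acquire

for _ in range(budget):
    threshold = fit_threshold(labeled)
    # The most uncertain instance is the one closest to the current threshold.
    query = min(pool, key=lambda x: abs(x - threshold))
    pool.remove(query)
    labeled.append((query, oracle(query)))

threshold = fit_threshold(labeled)
print(threshold)
```

With the same budget, labels spent on instances far from the boundary would narrow the estimate much less, which is why the choice of instances matters.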
Reinforcement Learning:
In reinforcement learning, an agent interacts with an environment to optimize an objective function, typically the cumulative reward it receives. The problem is often formulated as a Markov decision process. The algorithm learns the characteristics of the environment and chooses the best action for each state. This type of learning has gained attention due to recent methodological advances and industry applications.
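To make the agent, state, action, and reward loop concrete, here is a small sketch using tabular Q-learning, one standard reinforcement learning algorithm. The environment is a made-up five-state chain in which the agent earns a reward of 1 for reaching the rightmost state; the learning rate, discount factor, exploration rate, and step cap are illustrative assumptions.

```python
import random

N_STATES, GOAL = 5, 4   # states 0..4; state 4 is terminal
LEFT, RIGHT = 0, 1

def step(state, action):
    """Deterministic transition of the toy Markov decision process."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # step cap per episode
            if rng.random() < epsilon:
                action = rng.choice([LEFT, RIGHT])  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice([a for a in (LEFT, RIGHT) if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Q-learning update toward reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q = q_learning()
# Greedy policy for the non-terminal states.
policy = [RIGHT if q[s][RIGHT] > q[s][LEFT] else LEFT for s in range(GOAL)]
print(policy)
```

The agent is never told the transition rules; it infers which action is best in each state purely from the rewards it observes while interacting with the environment.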
Machine Learning Methods:
Many techniques, such as linear regression and logistic regression, are used both in machine learning and in other fields like marketing. Although the focus may differ, the underlying technical principles remain the same. Other machine learning methods are not commonly used in marketing research, and we discuss them briefly. Because the field is vast, we cover only the commonly used supervised, unsupervised, and reinforcement learning methods that are relevant to marketing research, with a focus on recent advancements.