The first key strength of machine learning methods is that they can readily handle unstructured data such as text, image, audio, and video, and can process data with complex structures such as large scale network or tracking data. In addition, machine learning methods can accommodate data of hybrid formats, such as a combination of text, image, and structured data, in an integrated manner. Unstructured data have been the key driver of the data explosion in recent years, elevating the importance of machine learning methods.
Machine learning methods can handle larger data volume compared to econometric models. Econometric models typically use data from a few hundred or thousand consumers with limited variables and choices. Machine learning research, on the other hand, uses millions of observations and larger datasets. Optimization algorithms and high-performance computing make it easy to implement these methods. As the data volume in the real world continues to grow, scalable methods are in high demand.
Machine learning methods have several advantages, including flexibility in input construction through feature engineering. Unlike econometric models, machine learning allows for extensive upfront efforts in creating and transforming input variables, including using different forms and adding interaction terms. This flexibility is also reflected in the model structure, as machine learning methods can carve out arbitrary regions of the feature space or perform complex transformations. This combination of feature engineering and flexible model structure increases the chance of capturing the true linkage between input and output variables. Additionally, machine learning methods excel in prediction, especially in real-world settings, where their out-of-sample predictive accuracy is evaluated. Machine learning methods are often used in data science contests and open data competitions, where their ability to predict accurately is crucial.
The main limitation of machine learning is its lack of interpretability. Unlike econometric models, which have transparent structures and clear links between variables, machine learning models are like black boxes that prioritize predictive accuracy over interpretive insights. Machine learning methods rely on engineered features and flexible model structures, making it difficult to perform statistical hypothesis testing or evaluate the relationships between variables. However, there are some mitigating factors, such as machine learning methods with interpretable parameters and post-hoc interpretation techniques that can still provide valuable insights.