[/r/Economics Article of the Week] Machine Learning: An Applied Econometric Approach

Author Summary

Ponderay is in r/Economics Article of the Week

Post Body

This week we look at the rise of machine learning in economics with "Machine Learning: An Applied Econometric Approach" by Sendhil Mullainathan and Jann Spiess.

Our summary is written by /u/say_wot_again:

There's a lot of confusion over what machine learning is, with misunderstandings ranging from utopians certain that the age of general artificial intelligence is upon us to skeptics dismissing ML as nothing more than a fancier, overhyped name for OLS with constructed regressors. This paper gives a basic introduction to machine learning from the econometrician's perspective, explaining how the basic principles of ML work and how they can be used in econometrics. While much of econometrics focuses on causal inference and coefficient estimation, ML (at least the subfield of ML most relevant for metrics, "supervised learning" with fully labeled data) is focused entirely on prediction. Gone are the worries about data generating processes and functional forms, the endless quests for exogenous variation, even the standard errors on coefficients. Instead, by not caring about the functional form or the informativeness of the coefficients, ML methods are able to combine and recombine features to find far more complex relationships hidden in the data. This is best exemplified by decision trees, which can find nonlinear interactions between variables, and by deep learning, which extracts higher-level features out of high-dimensional data like images and, if the model is big enough, can in principle approximate almost any function.

However, with great expressive power comes great responsibility, as a powerful enough model runs the risk of overfitting to the random quirks and noise of its training data set without learning relationships that generalize to the out-of-sample data that ultimately matters. This is known as the bias-variance tradeoff: models that fit the sample data very well have low in-sample error and thus low bias, but they overreact to small deviations in the training data and thus have high variance. Two major ways of dealing with this are regularization and ensembling. In regularization, you place limits or penalties on the complexity (and thus the expressiveness) of the model. One example that may be familiar is LASSO, which penalizes the sum of the absolute values of the coefficients and thus promotes sparse models that only use a small subset of the features. In ensembling, you average lots of small models (typically trained on different subsets of the data and/or different subsets of features) with the understanding that while each model may have its own elements of overfitting, those idiosyncrasies should cancel out and the ensemble should contain only signal that truly generalizes.
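To make the LASSO point concrete, here is a minimal sketch of how the absolute-value penalty zeroes out coefficients. It is not taken from the paper: the simulated data, the penalty value of 0.5, and the function names `soft_threshold` and `lasso_coordinate_descent` are all assumptions chosen for illustration, and the solver is a plain coordinate-descent loop with no intercept (the simulated features are roughly mean zero).

```python
import random

def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_coordinate_descent(X, y, penalty, sweeps=100):
    """Minimize (1/2n) * sum((y - Xb)^2) + penalty * sum(|b|), one coefficient at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0    # covariance of feature j with the residual that excludes feature j
            scale = 0.0  # sum of squares of feature j
            for row, target in zip(X, y):
                fitted_without_j = sum(row[k] * beta[k] for k in range(p) if k != j)
                rho += row[j] * (target - fitted_without_j)
                scale += row[j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (scale / n)
    return beta

# Hypothetical data: only the first two of five features actually matter.
rng = random.Random(0)
n_obs, n_features = 200, 5
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_obs)]
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 1) for row in X]

for penalty in (0.0, 0.5):  # 0.0 reduces to least squares; 0.5 is an arbitrary illustrative penalty
    beta = lasso_coordinate_descent(X, y, penalty)
    print(penalty, [round(b, 2) for b in beta])
```

With no penalty, every feature gets some small nonzero coefficient from fitting noise. With the penalty switched on, the three irrelevant features should be dropped entirely, while the two real ones survive in shrunken form. In practice the penalty would be chosen by cross-validation rather than fixed by hand.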

But if ML focuses on prediction while most of econometrics cares about estimating coefficients, how can ML be used in econometrics? Well, one relatively basic way is by providing new data. For example, deep learning can extract new data out of images and text, such as crop yield estimates or sentiment indicators. In such cases, the economist can act simply as a passive consumer of the data. In other cases, parts of the econometric problem can be reduced to prediction problems and thus can be augmented by ML techniques. For example, instrumental variables attempt to generate exogenous variation by predicting a potentially endogenous regressor from exogenous "instruments" that only affect the dependent variable via the regressor. For the most part, you only care about the quality of the prediction of the regressor, so techniques from ML (especially regularization to ensure out-of-sample predictions remain good) can be helpful. Similarly, evaluating randomized experiments can be reformulated as a problem of predicting whether an individual received the treatment, either based on pre-treatment features (where successful prediction indicates poor randomization) or based on outcomes (where successful prediction indicates a meaningful impact of treatment on outcomes). Several aspects of public policy rely heavily on prediction of outcomes, and there ML can be a useful tool. And finally, some economic theories directly concern prediction (e.g. the efficient markets hypothesis states that excess returns cannot be predicted in advance); the better predictive powers of ML can help test those.
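The randomization check described above can be sketched as a small prediction exercise. Everything here is an illustrative assumption rather than the paper's procedure: the covariate `income`, the two simulated assignment schemes, and the one-variable threshold classifier (`fit_stump`) standing in for whatever ML predictor one would actually use. The idea is only that held-out accuracy near 50% is what good randomization should look like, while accuracy well above 50% means pre-treatment features predict who got treated.

```python
import random

def predict(x, threshold, above):
    """Predict 'treated' when x is above the threshold (or at/below it if above is False)."""
    return (x > threshold) == above

def accuracy(xs, treated, threshold, above):
    hits = sum(predict(x, threshold, above) == t for x, t in zip(xs, treated))
    return hits / len(xs)

def fit_stump(xs, treated):
    """Pick the threshold rule with the best in-sample accuracy from a coarse grid."""
    candidates = [(step / 10, above) for step in range(-20, 21) for above in (True, False)]
    return max(candidates, key=lambda rule: accuracy(xs, treated, *rule))

def holdout_accuracy(xs, treated):
    """Fit on the first half, score on the second half to avoid rewarding overfitting."""
    half = len(xs) // 2
    rule = fit_stump(xs[:half], treated[:half])
    return accuracy(xs[half:], treated[half:], *rule)

rng = random.Random(1)
income = [rng.gauss(0, 1) for _ in range(2000)]  # hypothetical standardized covariate

# Scheme A: a fair coin flip, unrelated to the covariate.
coin_flip = [rng.random() < 0.5 for _ in income]
# Scheme B: broken randomization, where higher-income units tend to end up treated.
self_selected = [x + rng.gauss(0, 0.5) > 0 for x in income]

acc_random = holdout_accuracy(income, coin_flip)
acc_selected = holdout_accuracy(income, self_selected)
print(f"held-out accuracy, coin flip:     {acc_random:.2f}")
print(f"held-out accuracy, self-selected: {acc_selected:.2f}")
```

Scoring on held-out data matters here: a flexible predictor can always find some pattern in-sample, so only out-of-sample accuracy is evidence of a real imbalance.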

Posted: 7 years ago