MACHINE LEARNING or STATISTICAL LEARNING
Based on Econ/ARE 240F

Department of Economics, University of California - Davis  Spring 2016

SLIDES: MACHINE LEARNING FOR MICROECONOMETRICS

trmachinelearningseminar.pdf  
Based on class notes for ECN 240F in Spring 2016, in turn based on the two statistical learning books by Hastie, Tibshirani and coauthors.
Then presented over two seminars at the University of Sydney in April 2017.
Abstract: These slides attempt to explain machine learning to empirical economists familiar with regression methods. The slides cover standard machine learning methods for prediction such as k-fold cross-validation, lasso, regression trees and random forests. The slides conclude with some recent econometrics research that incorporates machine learning methods in causal models estimated using observational data, specifically (1) IV with many instruments, (2) OLS in the partial linear model with many controls, and (3) ATE in a heterogeneous effects model with many controls.

USEFUL TEXTS

For statistical learning the main text used in 240F is an undergraduate / masters level book
ISL: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (2013), An Introduction to Statistical Learning: with Applications in R, Springer.
A free legal pdf is at http://www-bcf.usc.edu/~gareth/ISL/ and a $25 hardcopy can be obtained via http://www.springer.com/gp/products/books/mycopy

Supplementary material on statistical learning came from the Ph.D. level book
ESL: Trevor Hastie, Robert Tibshirani and Jerome Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer.
A free legal pdf is at http://statweb.stanford.edu/~tibs/ElemStatLearn/index.html and a $25 hardcopy can be obtained via http://www.springer.com/gp/products/books/mycopy

A new book that looks promising, though I have not used it, is
Bradley Efron and Trevor Hastie (2016), Computer Age Statistical Inference: Algorithms, Evidence and Data Science, Cambridge University Press.

LEADERS IN ECONOMETRICS

Bringing established machine learning methods into econometrics is currently an active area of research. The literature focuses on causal inference and on valid statistical inference that controls for first-stage data mining. Leading econometricians include
Victor Chernozhukov    http://web.mit.edu/~vchern/www/
Alex Belloni https://faculty.fuqua.duke.edu/~abn5/belloni-index.html
Christian Hansen http://faculty.chicagobooth.edu/christian.hansen/research/
Susan Athey  https://www.gsb.stanford.edu/faculty-research/faculty/susan-athey   https://people.stanford.edu/athey/research
Guido Imbens    https://www.gsb.stanford.edu/faculty-research/faculty/guido-w-imbens  https://people.stanford.edu/imbens/publications

ONLINE COURSES

Coursera has many courses   https://www.coursera.org/browse/data-science/machine-learning?languages=en

REFERENCES FOR 240F Spring 2016

This is a very active area: All the papers below were published in 2012 or later.

Partial survey focused on using LASSO: A. Belloni, V. Chernozhukov and C. Hansen, "High-Dimensional Methods and Inference on Treatment and Structural Effects in Economics," J. Economic Perspectives Spring 2014, pp.29-50 with
Stata and Matlab programs here; and Stata replication code here

Lasso and IV: A. Belloni, V. Chernozhukov, D. Chen, and C. Hansen, "Sparse Models and Methods for Instrumental Regression, with an Application to Eminent Domain," arXiv 2010, Econometrica 2012, pp.2369-2429.

Lasso and control function: A. Belloni, V. Chernozhukov and C. Hansen, "Inference on Treatment Effects After Selection Among High-Dimensional Controls," The Review of Economic Studies 2014, pp.608-650.

Lasso and propensity score weighting: M. Farrell, "Robust Inference on Average Treatment Effects with Possibly More Covariates than Observations," Journal of Econometrics, 2015, vol.189, pp.1-23.

H. Varian, "Big Data: New Tricks for Econometrics," J. Economic Perspectives Spring 2014, pp.3-28.
Dataset can be obtained from https://www.aeaweb.org/articles.php?doi=10.1257/jep.28.2

Other papers by Chernozhukov and coauthors on this topic are at http://www.mit.edu/~vchern/#veryhigh

G. Imbens and S. Athey, "Machine Learning Methods for Estimating Heterogeneous Causal Effects."

Brief overview paper by S. Athey, "Machine Learning and Causal Inference for Policy Evaluation," http://faculty-gsb.stanford.edu/athey/documents/AtheyKDDfinal.pdf


Other papers by Athey are at http://faculty-gsb.stanford.edu/athey/research.html#Econometric_Theory_%28Identification_and_E