Department of Economics, University of California - Davis  Fall 2017


A 35-minute presentation, October 28, 2017.

Presented over two seminars at the University of Sydney, April 2017.
Based on class notes for ECN 240F in Spring 2016, in turn based on the two statistical learning books by Hastie, Tibshirani and coauthors.
Abstract: These slides attempt to explain machine learning to empirical economists familiar with regression methods. The slides cover standard machine learning methods for prediction such as k-fold cross-validation, lasso, regression trees and random forests. The slides conclude with some recent econometrics research that incorporates machine learning methods in causal models estimated using observational data, specifically (1) IV with many instruments, (2) OLS in the partial linear model with many controls, and (3) ATE in a heterogeneous effects model with many controls.


For statistical learning, the main text used in 240F is an undergraduate / masters level book
ISL: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (2013), An Introduction to Statistical Learning: with Applications in R, Springer.
A free legal pdf is at and a $25 hardcopy can be obtained via

Supplementary material on statistical learning came from the Ph.D. level book
ESL: Trevor Hastie, Robert Tibshirani and Jerome Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer.
A free legal pdf is at and a $25 hardcopy can be obtained via

A new book that is good but I haven't used is
Bradley Efron and Trevor Hastie (2016),
Computer Age Statistical Inference: Algorithms, Evidence and Data Science, Cambridge University Press.


Bringing established machine learning methods into econometrics is currently an active area of research. The literature focuses on causal inference and on valid statistical inference that controls for first-stage data mining. Leading econometricians include
Victor Chernozhukov
Alex Belloni
Christian Hansen
Susan Athey
Guido Imbens


Coursera has many courses

REFERENCES FOR 240F Spring 2016 + one newer

This is a very active area: All the papers below were published in 2012 or later.

Partial survey focused on using LASSO: A. Belloni, V. Chernozhukov and C. Hansen, "High-Dimensional Methods and Inference on Treatment and Structural Effects in Economics," J. Economic Perspectives Spring 2014, pp.29-50 with
Stata and Matlab programs here; and Stata replication code here

Lasso and IV: A. Belloni, V. Chernozhukov, D. Chen, and C. Hansen, "Sparse Models and Methods for Instrumental Regression, with an Application to Eminent Domain," Arxiv 2010, Econometrica 2012, pp.2369-2429.

Lasso and control function: A. Belloni, V. Chernozhukov and C. Hansen, "Inference on Treatment Effects After Selection Among High-Dimensional Controls," The Review of Economic Studies 2014, pp.608-650.

Lasso and propensity score weighting: M. Farrell, "Robust Inference on Average Treatment Effects with Possibly More Covariates than Observations," Journal of Econometrics, 2015, vol.189, pp.1-23.

H. Varian, "Big Data: New Tricks for Econometrics," J. Economic Perspectives Spring 2014, pp.3-28.
Dataset can be obtained from

S. Mullainathan and J. Spiess, "Machine Learning: An Applied Econometric Approach,"
J. Economic Perspectives Spring 2017, pp.87-106.

Other papers by Chernozhukov and coauthors on this topic are at

G. Imbens and S. Athey,
"Machine Learning Methods for Estimating Heterogeneous Causal Effects"

Brief overview paper by S. Athey, "Machine Learning and Causal Inference for Policy Evaluation"

Other papers by Athey are at