MACHINE LEARNING or STATISTICAL LEARNING
Colin Cameron, Department of Economics,
University of California - Davis June 2019
Machine learning methods for
prediction are well-established in the statistical and
computer science literature.
Applying machine learning methods for causal inference is a
very active area in the economics literature.
A summary such as that in the slides below can become dated
very quickly.
SLIDES: MACHINE LEARNING BRIEF OVERVIEW
This 60 slide overview was presented in June 2019
machlearn2019_Intro_brief.pdf
SLIDES: CAUSAL MACHINE LEARNING FOR ECONOMICS BRIEF
OVERVIEW
This 20 slide introduction to causal inference for
the partial linear model using the LASSO was presented in January
2020
machlearn2020_Causal_Intro_brief.pdf
SLIDES: MORE DETAIL ON MACHINE LEARNING IN GENERAL
The following two sets of slides provide much
more detail on basic machine learning methods.
They were created in April 2019 for short courses in Germany.
machlearn2019_part1.pdf
(Basics: selection, shrinkage, dimension reduction, LASSO)
machlearn2019_part2.pdf
(Flexible methods: including random forests, classification and
cluster analysis)
SLIDES: MORE DETAIL ON MACHINE LEARNING FOR
ECONOMICS
The following set of slides
provides much more detail on the use of machine learning
methods in economics.
These slides were created in April 2019 for short courses
in Germany and a presentation at U.C. Riverside.
They cover a prediction example in economics and then various
methods for causal inference in the partially linear model and
in heterogeneous effects models.
The slides also list key references in the current economics
literature.
machlearn2019_Riverside_2.pdf
USEFUL TEXTS FOR MACHINE LEARNING (NOT ECONOMICS)
For statistical learning the main text used in 240F is an
undergraduate / masters level book
ISL: Gareth James, Daniela Witten, Trevor Hastie and
Robert Tibshirani (2013), An Introduction to Statistical
Learning: with Applications in R, Springer.
A free legal pdf is at http://www-bcf.usc.edu/~gareth/ISL/
and a $25 hardcopy can be obtained via
http://www.springer.com/gp/products/books/mycopy
Supplementary material on statistical learning came from the
Ph.D. level book
ESL: Trevor Hastie, Robert Tibshirani and Jerome
Friedman (2009), The Elements of Statistical Learning: Data
Mining, Inference and Prediction, Springer.
A free legal pdf is at
http://statweb.stanford.edu/~tibs/ElemStatLearn/index.html
and a $25 hardcopy can be obtained via
http://www.springer.com/gp/products/books/mycopy
A newer book that is good but I haven't used is
Bradley Efron and Trevor Hastie (2016) Computer Age
Statistical Inference: Algorithms, Evidence and Data Science,
Cambridge University Press.
USEFUL TEXTS FOR MACHINE LEARNING (FOR ECONOMICS)
The following book is more recent and includes some causal methods
Matt Taddy (2019), Business Data Science:
Combining Machine Learning and Economics to Optimize, Automate,
and Accelerate Business Decisions, McGraw-Hill.
LEADERS IN ECONOMETRICS
Bringing established machine learning methods into
econometrics is currently an active area. The literature focuses
on valid statistical inference controlling for first-stage data
mining, and causal inference. Leading econometricians include
Victor Chernozhukov
http://web.mit.edu/~vchern/www/
Alex Belloni https://faculty.fuqua.duke.edu/~abn5/belloni-index.html
Christian Hansen http://faculty.chicagobooth.edu/christian.hansen/research/
Susan Athey https://www.gsb.stanford.edu/faculty-research/faculty/susan-athey
https://people.stanford.edu/athey/research
Guido Imbens https://www.gsb.stanford.edu/faculty-research/faculty/guido-w-imbens
https://people.stanford.edu/imbens/publications
ONLINE COURSES
Coursera has many courses https://www.coursera.org/browse/data-science/machine-learning?languages=en
SOME ECONOMICS REFERENCES
This is a very active
area: All the papers below were published in 2011 or later.
Machine learning prediction in economics
Hal Varian (2014), "Big Data: New Tricks for
Econometrics", Journal of Economic Perspectives, Spring, 3-28.
Sendhil Mullainathan and J. Spiess (2017), "Machine Learning: An
Applied Econometric Approach", Journal of Economic Perspectives,
Spring, 87-106.
Jon Kleinberg, H. Lakkaraju, Jure Leskovec, Jens Ludwig, Sendhil
Mullainathan (2018), "Human Decisions and Machine Predictions",
Quarterly Journal of Economics, 237-293.
Surveys of causal inference in economics
Susan Athey (2018), "The Impact of Machine Learning on
Economics". http://www.nber.org/chapters/c14009.pdf
Susan Athey and Guido Imbens (2019), "Machine Learning Methods
Economists Should Know About."
Alex Belloni, Victor Chernozhukov and Christian Hansen (2014),
"High-dimensional methods and inference on structural and
treatment effects," Journal of Economic Perspectives, Spring,
29-50.
Causal inference in economics
Alex Belloni, Victor Chernozhukov and Christian Hansen
(2011), "Inference Methods for High-Dimensional Sparse Econometric
Models," Advances in Economics and Econometrics, ES World Congress
2010, ArXiv 2011.
Alex Belloni, D. Chen, Victor Chernozhukov and Christian Hansen
(2012), "Sparse Models and Methods for Optimal Instruments with an
Application to Eminent Domain", Econometrica, Vol. 80, 2369-2429.
Alex Belloni, Victor Chernozhukov, Ivan Fernandez-Val and
Christian Hansen (2017), "Program Evaluation and Causal Inference
with High-Dimensional Data," Econometrica, 233-299.
Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther
Duflo, Christian Hansen, Whitney Newey and James Robins (2018),
"Double/debiased machine learning for treatment and structural
parameters," The Econometrics Journal, 21, C1-C68.
Max Farrell (2015), "Robust Inference on Average Treatment Effects
with Possibly More Covariates than Observations", Journal of
Econometrics, 189, 1-23.
Max Farrell, Tengyuan Liang and Sanjog Misra (2018), "Deep Neural
Networks for Estimation and Inference: Application to Causal
Effects and Other Semiparametric Estimands," arXiv:1809.09953v2.
Stefan Wager and Susan Athey (2018), "Estimation and Inference of
Heterogeneous Treatment Effects using Random Forests," JASA,
1228-1242.
Stata Software
Stata version 16
introduced commands for lasso, ridge, elasticnet and causal
inference in the partial linear and related models with
exogenous or endogenous regressors.
The following Stata add-on will work with Stata 16 and also with
earlier versions of Stata
Achim Ahrens, Christian Hansen, Mark Schaffer (2019),
"lassopack: Model selection and prediction with regularized
regression in Stata," arXiv:1901.05397
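Illustrative sketch (not Stata)
For readers who want to see the idea behind LASSO-based causal inference in the partial linear model y = alpha*d + x'beta + u, the following is a minimal Python sketch of one common approach, partialling out: LASSO y on x, LASSO d on x, then regress the first residual on the second. It is not the implementation used by Stata or lassopack. The simulated data, the function names, and the fixed penalty lam=0.1 are all assumptions made for illustration; in practice the penalty would be chosen by a plug-in rule or cross-validation.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def center(v):
    """Subtract the sample mean so that no intercept is needed."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def lasso_residuals(cols, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1.

    cols: list of centred regressor columns; y: centred outcome.
    Returns (coefficients, residuals).
    """
    n = len(y)
    b = [0.0] * len(cols)
    resid = list(y)
    col_ss = [sum(a * a for a in c) / n for c in cols]
    for _ in range(n_iter):
        for j, c in enumerate(cols):
            if col_ss[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (excluding j).
            rho = sum(a * r for a, r in zip(c, resid)) / n + col_ss[j] * b[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - b[j]
            if delta != 0.0:
                resid = [r - a * delta for a, r in zip(c, resid)]
                b[j] = new
    return b, resid


def partialling_out(y, d, cols, lam):
    """Estimate alpha by regressing lasso residuals of y on those of d."""
    yc, dc = center(y), center(d)
    xc = [center(c) for c in cols]
    _, ry = lasso_residuals(xc, yc, lam)
    _, rd = lasso_residuals(xc, dc, lam)
    return sum(a * b for a, b in zip(rd, ry)) / sum(a * a for a in rd)


# Simulated example (assumed design): 20 controls, only three of which matter.
random.seed(0)
n, p, true_alpha = 300, 20, 1.0
cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
d = [0.8 * cols[0][i] + 0.5 * cols[1][i] + random.gauss(0, 1) for i in range(n)]
y = [true_alpha * d[i] + 1.0 * cols[0][i] - 0.7 * cols[2][i] + random.gauss(0, 1)
     for i in range(n)]

alpha_hat = partialling_out(y, d, cols, lam=0.1)
print(f"partialling-out estimate of alpha: {alpha_hat:.3f} (true value {true_alpha})")
```

The sketch reports only a point estimate. The papers listed above (for example Belloni, Chernozhukov and Hansen 2014, and Chernozhukov et al. 2018) give the conditions and standard errors needed for valid inference.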