MACHINE LEARNING or STATISTICAL LEARNING
Colin Cameron, Department of Economics,
University of California - Davis, October 2023
Machine learning methods for prediction are well established in the statistical and computer science literature.
Applying machine learning methods to causal inference is a very active area of the economics literature.
A summary such as that in the slides below can become dated very quickly.
SLIDES: MACHINE LEARNING VERY BRIEF OVERVIEW 2023
This 29-slide overview was presented in October 2023.
machlearn2019_Intro_very_brief.pdf
BOOK CHAPTER: 2022
Chapter 28 of A. Colin Cameron and Pravin K. Trivedi, Microeconometrics Using Stata, Volume 2: Nonlinear Models and Causal Inference Methods, covers machine learning methods for prediction and for causal inference.
Click here for book information.
For machine learning, Stata mostly provides the lasso, ridge regression and elastic net. These are enough to provide a good introduction to machine learning methods. Additionally, Stata has some built-in commands for causal inference using the lasso in the partial linear model and in the standard binary treatment effects model.
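Stata syntax is not reproduced here. As a rough Python analog, the sketch below fits the lasso, ridge regression and elastic net with scikit-learn, choosing each penalty by cross-validation. The simulated data (50 candidate regressors, only 3 of which matter), the 5-fold cross-validation and the elastic net mixing value l1_ratio=0.5 are illustrative assumptions, not choices taken from the book.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV, ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical simulated data: 50 candidate regressors, only the first 3 matter.
rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [1.0, -0.5, 0.25]
y = X @ beta + rng.normal(size=n)

# Standardize regressors before penalizing; penalty chosen by cross-validation.
models = {
    "lasso": make_pipeline(StandardScaler(), LassoCV(cv=5)),
    "ridge": make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 50))),
    "elasticnet": make_pipeline(StandardScaler(), ElasticNetCV(l1_ratio=0.5, cv=5)),
}

nonzero = {}
for name, model in models.items():
    model.fit(X, y)
    coef = model[-1].coef_          # coefficients of the final pipeline step
    nonzero[name] = int(np.sum(coef != 0))
    print(f"{name:10s} nonzero coefficients: {nonzero[name]} of {p}")
```

The lasso and elastic net set some coefficients exactly to zero, so they also perform variable selection; ridge shrinks every coefficient but keeps all of them.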
For other machine learners, such as neural networks and random forests, it is standard to use packages in Python or R.
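As a minimal sketch of using a Python package for one of these learners, the example below fits scikit-learn's RandomForestRegressor and compares its test-sample mean squared error with OLS. The nonlinear data-generating process, the 75/25 train/test split and the tuning values (300 trees, at least 5 observations per leaf) are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical nonlinear DGP: an interaction and a threshold effect.
rng = np.random.default_rng(1)
n = 2000
X = rng.uniform(-2, 2, size=(n, 5))
y = X[:, 0] * X[:, 1] + (X[:, 2] > 0) + rng.normal(scale=0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

ols = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=300, min_samples_leaf=5, random_state=1)
forest.fit(X_train, y_train)

# Out-of-sample (test) mean squared error for each predictor.
mse_ols = mean_squared_error(y_test, ols.predict(X_test))
mse_rf = mean_squared_error(y_test, forest.predict(X_test))
print(f"test MSE  OLS: {mse_ols:.3f}  random forest: {mse_rf:.3f}")
```

Because OLS is linear in the regressors it cannot pick up the interaction, whereas the forest learns it from the data without the analyst specifying it.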
SHORT COURSE: 2024
In Spring 2024 I spent five weeks on machine learning in my ECN 240F class.
The slides are an updated version of eight hours of accelerated lectures on machine learning for econometrics I gave in May 2022 at Simon Fraser University.
Click here for course slides (updated to 2024), programs and data sets.
SLIDES: MACHINE LEARNING BRIEF OVERVIEW 2019
This 60-slide overview was presented in June 2019.
machlearn2019_Intro_brief.pdf
SLIDES: CAUSAL MACHINE LEARNING FOR ECONOMICS BRIEF OVERVIEW 2020
This 20-slide introduction to causal inference for the partial linear model using the lasso was presented in January 2020.
machlearn2020_Causal_Intro_brief.pdf
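For the partial linear model y = theta*d + g(x) + u, one lasso-based approach is partialling out: residualize y on the controls x, residualize d on x, then regress the first residual on the second. The sketch below is a simplified illustration on hypothetical simulated data (true theta = 0.5). It swaps in cross-validated scikit-learn LassoCV fits for the residualizing step; the published methods typically use a theory-driven plug-in penalty and refit OLS on the selected variables (post-lasso), so treat this only as an outline of the idea.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Hypothetical DGP: y = 0.5*d + x'beta + u, with d also driven by the controls x.
rng = np.random.default_rng(2)
n, p = 1000, 100
x = rng.normal(size=(n, p))
d = x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)
y = 0.5 * d + x[:, 0] - x[:, 2] + rng.normal(size=n)

# Step 1: lasso of y on x; keep the residuals.
ry = y - LassoCV(cv=5).fit(x, y).predict(x)
# Step 2: lasso of d on x; keep the residuals.
rd = d - LassoCV(cv=5).fit(x, d).predict(x)
# Step 3: OLS of residualized y on residualized d (residuals have mean zero).
theta = (rd @ ry) / (rd @ rd)

# Heteroskedasticity-robust standard error for theta.
u = ry - theta * rd
se = np.sqrt(np.sum(rd**2 * u**2)) / (rd @ rd)
print(f"theta_hat = {theta:.3f}, robust se = {se:.3f}")
```

Selecting controls separately for both y and d guards against omitting a control that matters for d but only weakly for y, which a single lasso of y on d and x can miss.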
USEFUL TEXTS FOR MACHINE LEARNING (NOT ECONOMICS)
For statistical learning, a leading text is the undergraduate/master's-level book
ISL2: Gareth James, Daniela Witten, Trevor Hastie and
Robert Tibshirani (2021), An Introduction to Statistical
Learning: with Applications in R, Second Edition,
Springer.
A free legal PDF is at https://www.statlearning.com/
A Python version of this book is also available.
ISLP: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani and Jonathan Taylor (2023), An Introduction to Statistical Learning: With Applications in Python, Springer.
A free legal PDF is at https://www.statlearning.com/
Supplementary material on statistical learning is in the Ph.D.-level book
ESL: Trevor Hastie, Robert Tibshirani and Jerome Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer.
A free legal pdf is at
http://statweb.stanford.edu/~tibs/ElemStatLearn/index.html
and a $25 hardcopy can be obtained via
http://www.springer.com/gp/products/books/mycopy
Another good book, though one I haven't used, is Bradley Efron and Trevor Hastie (2016), Computer Age Statistical Inference: Algorithms, Evidence and Data Science, Cambridge University Press.
USEFUL TEXTS FOR MACHINE LEARNING (FOR ECONOMICS)
The following book is more recent and includes some causal methods:
Matt Taddy (2019), Business Data
Science: Combining Machine Learning and Economics to Optimize,
Automate, and Accelerate Business Decisions, McGraw-Hill.
LEADERS IN ECONOMETRICS
Bringing established machine learning methods into econometrics is currently an active area. The literature focuses on valid statistical inference that controls for first-stage data mining, and on causal inference. Leading econometricians include:
Victor Chernozhukov
http://web.mit.edu/~vchern/www/
Alex Belloni https://faculty.fuqua.duke.edu/~abn5/belloni-index.html
Christian Hansen http://faculty.chicagobooth.edu/christian.hansen/research/
Susan Athey https://www.gsb.stanford.edu/faculty-research/faculty/susan-athey
https://people.stanford.edu/athey/research
Guido Imbens https://www.gsb.stanford.edu/faculty-research/faculty/guido-w-imbens
https://people.stanford.edu/imbens/publications
ONLINE COURSES
Coursera has many courses https://www.coursera.org/browse/data-science/machine-learning?languages=en
SOME ECONOMICS REFERENCES
This is a very active area. The papers listed below were
published between 2011 and 2019.
Machine learning prediction in economics
Hal Varian (2014), "Big Data: New Tricks for
Econometrics", Journal of Economic Perspectives, Spring, 3-28.
Sendhil Mullainathan and J. Spiess (2017), "Machine Learning: An Applied Econometric Approach", Journal of Economic Perspectives, Spring, 87-106.
Jon Kleinberg, H. Lakkaraju, Jure Leskovec, Jens Ludwig, Sendhil
Mullainathan (2018), "Human Decisions and Machine Predictions",
Quarterly Journal of Economics, 237-293.
Surveys of causal inference in economics
Susan Athey (2018), "The Impact of Machine Learning on
Economics". http://www.nber.org/chapters/c14009.pdf
Susan Athey and Guido Imbens (2019), "Machine Learning Methods That Economists Should Know About," Annual Review of Economics, 11, 685-725.
Alex Belloni, Victor Chernozhukov and Christian Hansen (2014),
"High-dimensional methods and inference on structural and
treatment effects," Journal of Economic Perspectives, Spring,
29-50.
Causal inference in economics
Alex Belloni, Victor Chernozhukov and Christian Hansen
(2011), "Inference Methods for High-Dimensional Sparse Econometric
Models," Advances in Economics and Econometrics, ES World Congress
2010, ArXiv 2011.
Alex Belloni, D. Chen, Victor Chernozhukov and Christian Hansen
(2012), "Sparse Models and Methods for Optimal Instruments with an
Application to Eminent Domain", Econometrica, Vol. 80, 2369-2429.
Alex Belloni, Victor Chernozhukov, Ivan Fernandez-Val and
Christian Hansen (2017), "Program Evaluation and Causal Inference
with High-Dimensional Data," Econometrica, 233-299.
Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther
Duflo, Christian Hansen, Whitney Newey and James Robins (2018),
"Double/debiased machine learning for treatment and structural
parameters," The Econometrics Journal, 21, C1-C68.
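A central ingredient of double/debiased machine learning is cross-fitting: the nuisance functions E[y|x] and E[d|x] are estimated on some folds and evaluated on the held-out fold, so that overfitting by the learner does not contaminate the estimate of theta. The sketch below applies this idea to the partial linear model with random forests as the (swapped-in, illustrative) learner, 5 folds, and a pooled residual-on-residual estimator. The data-generating process (true theta = 1) and the helper crossfit_residuals are hypothetical, not code from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

# Hypothetical DGP: partial linear model with nonlinear nuisance functions, theta = 1.
rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(-1, 1, size=(n, 5))
d = np.sin(np.pi * x[:, 0]) + x[:, 1] ** 2 + rng.normal(size=n)
y = 1.0 * d + np.cos(np.pi * x[:, 0]) + x[:, 2] + rng.normal(size=n)

def crossfit_residuals(target, x, n_folds=5, seed=0):
    """Hypothetical helper: out-of-fold residuals target - E_hat[target | x],
    where E_hat is a random forest fit on the other folds."""
    resid = np.empty_like(target)
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train, test in folds.split(x):
        model = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=seed)
        model.fit(x[train], target[train])
        resid[test] = target[test] - model.predict(x[test])
    return resid

ry = crossfit_residuals(y, x)
rd = crossfit_residuals(d, x)

# Pooled partialling-out estimate and heteroskedasticity-robust standard error.
theta = (rd @ ry) / (rd @ rd)
u = ry - theta * rd
se = np.sqrt(np.sum(rd**2 * u**2)) / (rd @ rd)
print(f"cross-fit theta_hat = {theta:.3f}, se = {se:.3f}")
```

Any learner with a fit/predict interface (lasso, boosting, a neural net) could replace the random forest here; cross-fitting is what allows such flexible learners to be used for the nuisance functions.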
Max Farrell (2015), "Robust Inference on Average Treatment Effects with Possibly More Covariates than Observations", Journal of Econometrics, 189, 1-23.
Max Farrell, Tengyuan Liang and Sanjog Misra (2018), "Deep Neural
Networks for Estimation and Inference: Application to Causal
Effects and Other Semiparametric Estimands," arXiv:1809.09953v2.
Stefan Wager and Susan Athey (2018), "Estimation and Inference of
Heterogeneous Treatment Effects using Random Forests," JASA,
1228-1242.
Stata Software
Stata version 16 introduced commands for lasso, ridge, elasticnet and causal inference in the partial linear and related models with exogenous or endogenous regressors.
Python Software
In Spring 2023 I used Python for machine learning at a very introductory level.
Click here for material on getting going with Python and scikit-learn.
A. Colin Cameron / UC-Davis Economics / http://www.econ.ucdavis.edu/faculty/cameron