A. Colin Cameron: Python for Regression
These notes are for data analysis using key
Python modules, notably statsmodels for statistics, scikit-learn
for machine learning, and TensorFlow and Keras for neural nets.
Little knowledge of how to program in Python (a high-level,
general-purpose language) is required. Instead, one needs to know the
commands for using these modules.
Rather than installing Python directly, it is convenient to install
Anaconda, which also automatically installs key modules in the
base environment. For some analyses, such as neural nets, you need
to create a separate environment (see my Anaconda notes).
Anaconda
Anaconda installs not only base Python but also
many packages (collections of modules), including the key
ones for econometrics:
- NumPy. Short for Numerical Python. Array data types needed
for statistics and data analysis.
- pandas. Name derived from panel data. R-style data
frames and data analysis tools.
- matplotlib. Static and dynamic data
visualizations.
- SciPy. Scientific library (optimization,
integration, eigenvalue problems, random number generators, ...)
- statsmodels. Statistics using pandas dataframes
(computations and models including standard regression models).
- scikit-learn (Sklearn). Machine learning and data
mining using NumPy arrays.
- TensorFlow. Deep learning such as neural
networks.
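As a minimal sketch of how the first few of these modules fit together, the code below uses the conventional import aliases (np, pd), simulates a small data set with NumPy, stores it in a pandas data frame, and computes a correlation with SciPy. The variable names, sample size and seed are illustrative assumptions, not taken from these notes.

```python
# Conventional import aliases for the modules listed above
import numpy as np
import pandas as pd
from scipy import stats

# NumPy: simulate a small illustrative data set (seed fixed for reproducibility)
rng = np.random.default_rng(10101)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# pandas: collect the arrays in a data frame, similar to an R data.frame
df = pd.DataFrame({"y": y, "x": x})
print(df.describe())

# SciPy: a simple statistical computation on the same data
r, pval = stats.pearsonr(df["x"], df["y"])
print(f"correlation = {r:.3f}, p-value = {pval:.3g}")
```

statsmodels and scikit-learn then take data frames or arrays like these as inputs.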
Commands for many of these modules have a format similar to R
commands.
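One concrete instance of the resemblance to R is the statsmodels formula interface, where a regression is specified as in R's lm(y ~ x1 + x2, data = df). The sketch below assumes simulated data with hypothetical variables y, x1 and x2.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with hypothetical variables y, x1, x2
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1 + 0.5 * df["x1"] - 0.3 * df["x2"] + rng.normal(size=n)

# R-style formula; compare R's lm(y ~ x1 + x2, data = df)
results = smf.ols("y ~ x1 + x2", data=df).fit()
print(results.summary())  # regression table, similar to R's summary()
print(results.params)     # coefficients indexed by Intercept, x1, x2
```

As in R, the intercept is included automatically and variables are looked up by name in the data frame.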
Regression Examples: OLS, Random Forest, Neural Net
Python simple OLS regression example
Python within Stata
You can also run Python within Stata.
This can be particularly useful for preparing data in Stata
and then transferring it from Stata to Python.
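A different route from running Python inside Stata, sketched here as an alternative rather than the approach of these notes, is to save the data as a Stata .dta file and read it with pandas. The data frame and the file name mydata.dta below are hypothetical; to_stata stands in for what Stata's save command would produce.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical data standing in for a data set prepared in Stata
df = pd.DataFrame({"wage": [12.5, 20.0, 15.25], "educ": [12, 16, 14]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "mydata.dta")  # hypothetical file name
    df.to_stata(path, write_index=False)       # write a Stata-format .dta file
    df_from_stata = pd.read_stata(path)        # read the .dta file into a DataFrame

print(df_from_stata)
```

Integer storage types may change on the round trip (Stata has no 64-bit integer type), so compare values rather than dtypes.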
Python within Stata random forest example
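The random forest step itself is ordinary scikit-learn code, whether run in a standalone Python session or inside Stata. The sketch below uses simulated data with a nonlinear conditional mean (an assumption, not the data of the linked example) and compares test-sample mean squared error for OLS and a random forest; the tuning values n_estimators=200 and min_samples_leaf=5 are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Simulated data with a nonlinear conditional mean (illustrative assumption)
rng = np.random.default_rng(2024)
n = 1000
X = rng.uniform(-2, 2, size=(n, 2))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.3, size=n)

# Hold out 25% of observations to evaluate predictions
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

ols = LinearRegression().fit(X_train, y_train)
rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=1)
rf.fit(X_train, y_train)

mse_ols = mean_squared_error(y_test, ols.predict(X_test))
mse_rf = mean_squared_error(y_test, rf.predict(X_test))
print(f"test MSE  OLS: {mse_ols:.3f}  random forest: {mse_rf:.3f}")
print("variable importance:", rf.feature_importances_)
```

Because OLS here is linear in the regressors, it cannot capture the squared term, so the random forest should predict better out of sample in this particular design.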
References for Python
Kevin Sheppard, Introduction to Python for Econometrics, Statistics
and Numerical Analysis: Fourth+ Edition.
PDF at https://www.kevinsheppard.com/teaching/python/notes/
Wes McKinney, Python for Data Analysis: Data Wrangling with
pandas, NumPy, and Jupyter, Third Edition.
HTML web version at https://wesmckinney.com/book/
and paperback at Amazon.
References for machine learning with Python
Aurélien Géron, Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow, Third Edition (Amazon).
Much of the book is on neural nets.
The scikit-learn website has many examples.
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani,
Jonathan Taylor (2023), An Introduction to Statistical
Learning: With Applications in Python, Springer. https://www.statlearning.com/
A. Colin Cameron / UC-Davis Economics / http://www.econ.ucdavis.edu/faculty/cameron