U.C. Davis
      Department of Economics

A. Colin Cameron: Python for Regression

These notes are for data analysis using key Python modules, notably statsmodels for statistics, scikit-learn for machine learning, and TensorFlow and Keras for neural nets. Little knowledge of how to program in Python (a high-level, general-purpose language) is required. Instead one needs to know the commands to use the modules.
Rather than install Python directly, it is convenient to install Anaconda, which also automatically installs key modules in the base environment.
For some analysis such as neural nets you need to create a separate environment (see my Anaconda notes).

Anaconda

Anaconda installs not only base Python but also many packages (collections of modules), including the key ones for econometrics.
   - NumPy. Short for Numerical Python. Array data types needed for statistics and data analysis.
   - pandas. Name derived from panel data. R-style data frames and data analysis tools.
   - matplotlib. Static and dynamic data visualizations.
   - SciPy. Scientific library (optimization, integration, eigenvalue problems, random number generators, ...)
   - statsmodels. Statistics using pandas dataframes (computations and models including standard regression models).
   - scikit-learn (Sklearn). Machine learning and data mining using NumPy arrays.
   - TensorFlow. Deep learning such as neural networks.
Commands for many of these modules have similar format to R commands.
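
As a rough illustration of how these commands resemble R, the following minimal sketch builds a small pandas data frame and summarizes it. The data values and the column names price and weight are made up for illustration; they are not taken from the example datasets in these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical data: five cars with made-up prices and weights
data = pd.DataFrame({
    "price": [4099.0, 4749.0, 3799.0, 4816.0, 7827.0],
    "weight": [2930.0, 3350.0, 2640.0, 3250.0, 4080.0],
})

print(data.head())       # first rows, similar to head(data) in R
print(data.describe())   # summary statistics, similar to summary(data) in R

# NumPy functions operate directly on pandas columns
mean_price = np.mean(data["price"])

# Sample correlation between the two columns, similar to cor() in R
corr = data["price"].corr(data["weight"])
print(mean_price, corr)
```

The aliases np and pd are conventions rather than requirements, but most documentation and examples use them.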


Once Anaconda is installed you can run a Python program from within Anaconda.

For example, use the Spyder GUI (which is similar to RStudio for R).
Or use the command shell.

Install and use Anaconda

Regression Examples: OLS, Random Forest, Neural Net

Some Python coding tips

Python simple OLS regression example

carsdata.csv
carsdata.dta
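
A simple OLS regression of this kind might look like the following sketch, which uses the statsmodels formula interface. It is not the linked example itself: the data are simulated, and the variable names price and weight are assumptions rather than the actual columns of carsdata.csv.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (an assumption for illustration). With a real file one
# would instead read it in, for example with pd.read_csv("carsdata.csv").
rng = np.random.default_rng(12345)
n = 200
weight = rng.uniform(2000, 4500, size=n)
price = 1000 + 2.0 * weight + rng.normal(0, 500, size=n)
cars = pd.DataFrame({"price": price, "weight": weight})

# OLS with default (homoskedastic) standard errors; the formula syntax
# is close to lm(price ~ weight) in R
ols_fit = smf.ols("price ~ weight", data=cars).fit()
print(ols_fit.summary())

# Same coefficients with heteroskedasticity-robust (HC1) standard errors
ols_robust = smf.ols("price ~ weight", data=cars).fit(cov_type="HC1")
print(ols_robust.bse)
```

The coefficient estimates are identical across the two fits; only the standard errors and the inference based on them change.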

Python Random Forest Example using scikit-learn class RandomForestRegressor
Data for Random Forest Example
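
A random forest regression with scikit-learn's RandomForestRegressor can be sketched as follows. This is a minimal illustration on simulated data, not the linked example: the data-generating process, the sample split and the tuning values (such as n_estimators=200) are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Simulated data (an assumption): a nonlinear function of two regressors
rng = np.random.default_rng(42)
n = 500
X = rng.uniform(-2, 2, size=(n, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, size=n)

# Hold out 25% of the sample to assess out-of-sample prediction
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the forest on the training sample only
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Predict on the held-out sample and compute the test mean squared error
y_pred = forest.predict(X_test)
test_mse = mean_squared_error(y_test, y_pred)
print(test_mse)
print(forest.feature_importances_)
```

Unlike statsmodels, scikit-learn works with NumPy arrays and separates fitting (fit) from prediction (predict), which makes out-of-sample evaluation straightforward.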

Python Neural Net Example using TensorFlow Keras

Python within Stata

You can also run Python within Stata.
This can be particularly useful for setting up data in Stata and then transferring data from Stata to Python.

Link Stata to Python 

Python within Stata random forest example

References useful for econometrics with Python

Kevin Sheppard, Introduction to Python for Econometrics, Statistics and Numerical Analysis: Fourth+ Edition
pdf at  https://www.kevinsheppard.com/teaching/python/notes/ 
Wes McKinney, Python for Data Analysis: data wrangling with pandas, NumPy, and Jupyter, Third edition.
html web version at https://wesmckinney.com/book/  and paperback at Amazon

References for machine learning with Python
Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, Third Edition. Available at Amazon.
Much of the book is on neural nets. The scikit-learn website also has many examples.
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Jonathan Taylor (2023), An Introduction to Statistical Learning: With Applications in Python, Springer. https://www.statlearning.com/


A. Colin Cameron / UC-Davis Economics /  http://www.econ.ucdavis.edu/faculty/cameron