A. Colin Cameron: Python for Regression
These notes are for data analysis using key
Python modules, notably statsmodels for statistics and
scikit-learn for machine learning. This requires little
knowledge of how to program in Python (a high-level language).
Instead, one needs to know the commands for using the modules.
Rather than installing Python directly, it is convenient to install Anaconda.
Anaconda installs not only base Python but also
many packages (collections of modules), including the key
ones for econometrics:
- NumPy. Short for Numerical Python. Array data types needed for statistics and data analysis.
- pandas. Name derived from panel data. R-style data frames and data analysis tools.
- matplotlib. Static and dynamic data visualizations.
- SciPy. Scientific library (optimization, integration, eigenvalue problems, random number generators, ...)
- statsmodels. Statistics using pandas dataframes (computations and models including standard regression models).
- scikit-learn (Sklearn). Machine learning and data mining using NumPy arrays.
- TensorFlow. Deep learning such as neural networks.
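To confirm that an installation such as Anaconda actually provides the packages listed above, the following is a minimal sketch that uses only the Python standard library. It assumes nothing beyond `importlib.util.find_spec`, which locates a module without importing it. Note that the import name can differ from the package name: scikit-learn is imported as `sklearn`. Which packages show as installed depends on the environment.

```python
import importlib.util

# Map the package names used in these notes to their import names.
PACKAGES = {
    "NumPy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "SciPy": "scipy",
    "statsmodels": "statsmodels",
    "scikit-learn": "sklearn",
    "TensorFlow": "tensorflow",
}


def check_packages(packages=PACKAGES):
    """Return {package name: True/False} for whether each module can be found."""
    # find_spec returns None for a missing top-level module instead of raising,
    # and it does not import the module, so the check is fast and side-effect free.
    return {name: importlib.util.find_spec(module) is not None
            for name, module in packages.items()}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name:13s} {'installed' if found else 'missing'}")
```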
Commands for many of these modules have a format similar to that of R commands.
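As an illustration of the R-like format, statsmodels' formula interface accepts a model such as `"y ~ x"` on a pandas DataFrame, much as R's `lm(y ~ x, data = df)` does. The sketch below uses simulated data (the seed, sample size and variable names are illustrative assumptions) and fits the same OLS regression with scikit-learn, which instead takes NumPy arrays. statsmodels reports the usual regression output, including standard errors, while scikit-learn's `LinearRegression` returns only the fitted coefficients.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression

# Simulated data (illustrative only): y = 1 + 2*x + u with u ~ N(0, 1).
rng = np.random.default_rng(seed=10101)
n = 1000
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
df = pd.DataFrame({"y": y, "x": x})

# statsmodels: R-style formula on a pandas DataFrame, like R's lm(y ~ x).
ols = smf.ols("y ~ x", data=df).fit()
print(ols.summary())

# scikit-learn: the same OLS fit, but on NumPy arrays (X must be 2-D).
lr = LinearRegression().fit(df[["x"]].to_numpy(), df["y"].to_numpy())
print(lr.intercept_, lr.coef_)
```

Both approaches compute the same OLS estimates; the choice between them mainly reflects whether statistical inference (statsmodels) or prediction (scikit-learn) is the goal.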
You can also run Python within Stata.
This can be particularly useful for setting up data in Stata and then transferring data from Stata to Python.
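Running Python within Stata passes data through Stata's own `sfi` module, which can only be imported inside a Stata session. A simpler alternative, shown here as a swapped-in technique rather than the in-Stata route, is to save the data in Stata's .dta format and read it with pandas. The sketch below writes a small hypothetical dataset with `DataFrame.to_stata` (standing in for a file saved from Stata) and reads it back with `pd.read_stata`; the variable names `lwage` and `educ` are illustrative assumptions.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example data standing in for a dataset prepared in Stata.
df = pd.DataFrame({"lwage": [2.1, 2.5, 3.0, 2.8],
                   "educ": [12.0, 16.0, 18.0, 14.0]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.dta")
    # Write a Stata .dta file (in practice this would come from Stata's save command).
    df.to_stata(path, write_index=False)
    # Read the .dta file into a pandas DataFrame.
    df_back = pd.read_stata(path)

print(df_back)
```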
Topics:
- Install and use Anaconda
- Link Stata to Python
- Some Python coding tips
- Python within Stata example
References useful for econometrics with Python
Kevin Sheppard, "Introduction to Python for Econometrics, Statistics and Numerical Analysis: Fourth+ Edition".
PDF at https://www.kevinsheppard.com/teaching/python/notes/
Wes McKinney, "Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter", Third Edition.
HTML web version at https://wesmckinney.com/book/ and paperback at Amazon.
A. Colin Cameron / UC-Davis Economics / http://www.econ.ucdavis.edu/faculty/cameron