EXCEL 97: Regression with Categorical Dependent Variable

A. Colin Cameron, Dept. of Economics, Univ. of Calif. - Davis

This September 1999 help sheet gives information on


CATEGORICAL VARIABLE TAKING ONE VALUE

Suppose we wish to explain what determines whether or not a person is employed.
Then the dependent variable Y is a categorical variable that takes value 1 if employed and 0 if not employed.
The regressors X are determinants such as age and years of schooling.

A crude approach is to use Tools | Regression Analysis in the usual way.
Then Y is viewed as being a measure of the probability of working or not,
and the regression coefficients give the change in this probability when X changes by one unit.
Weaknesses of this approach are that individual predictions Y_hat can be less than 0 or more than 1,
which should not happen for a probability, and that the wrong standard errors and t-statistics are reported.
This approach is called the linear probability model, and should only be used for exploratory data analysis.

The correct method for this data is to estimate a logit or probit model.
This is really not possible in Excel (unless one goes to considerable effort).
Better is to use statistics or econometrics packages such as SAS, SPSS, Eviews, Limdep or Stata.
 

CATEGORICAL VARIABLE TAKING MORE THAN ONE VALUE

Suppose we have data on three mutually exclusive levels of work force attachment:
out-of-the-labor force, unemployed and employed.
These data are categorical and might be coded 0, 1 and 2.

A crude approach is to use Tools | Regression Analysis in the usual way.
One of many weaknesses of this is the arbitrariness of the scaling of Y which treats movement from out-of-the-labor force to unemployed as the same as movement from unemployed to employed.
This approach should only be used for exploratory data analysis.

Similar problems arise in analyzing attitudinal data coded as excellent, good, fair, poor which might be coded as 4, 3, 2 and 1.

The correct method for this data is to estimate an ordered or multinomial logit or probit model.
This is not possible in Excel.
Better is to use a quite specialized package such as Limdep.
 

For further information on how to use Excel go to
     http://www.econ.ucdavis.edu/faculty/cameron