EXCEL 97: Two-Variable Regression Using Tools | Data Analysis

A. Colin Cameron, Dept. of Economics, Univ. of Calif. - Davis

This September 1999 help sheet gives information on how to This handout is the first place to go to for Two-variable regression output.
After reading this move onto Excel: Statistical Inference for Two-Variable Regression

Other ways to do two-variable regression are discussed
in Excel: Two-way Plots in the section on Add a trendline and
in Excel: Two Variable Regression using Excel Functions.
 

INPUT DATA

The data used are in number of cars per household (Y) and household size (X).
The spreadsheet cells A1:B6 should look like:
 
cars  hh size
   1    1
   2    2
   2    3
   2    4
   3    5

RUN THE REGRESSION

The population regression model is

      y = b1 + b2*x + u

where the error term u has mean 0 and variance sigma-squared.

We wish to estimate the regression line

     y = b1 + b2*x

Do this by Tools / Data Analysis / Regression.
A relatively simple form of the command (with labels and line plot) is
 
Input Y Range a1:a6
Input X Range b1:b6
Labels Tick
Output range a8
Line Plot Tick

and then hit OK.
 

REGRESSION SUMMARY OUTPUT

There is quite a lot of regression output produced. See end for all the output.

Regression statistics
 
Explanation
Multiple R 0.894427 R = square root of R^2
R Square 0.8 R^2 = coefficient of determination
Adjusted R Square 0.733333 Adjusted R^2 used if more than one x variable
Standard Error 0.365148 This is the sample estimate of the st. dev. of the error u
Observations 5 Number of observations used in the regression (n)

Anova
 
df SS MS F Signifiance F
Regression 1 1.6 1.6 12 0.04519
Residual 3 0.4 0.133333
Total 4 2.0

The regression output of most interest is the following
 
Coefficient St. error t Stat P-value Lower 95% Upper 95%
Intercept      0.8 0.38297 2.089 0.1279 -0.4188 2.0188
hh size      0.4 0.11547 3.464 0.0405 0.0325 0.7675

INTERPRET THE REGRESSION SUMMARY OUTPUT

Let b1 denote the population coefficient of the intercept and b2 the population coefficient of hh size.
The column "Coefficient" gives the least squares estimates of b1 and b2.

Thus the fitted line is

     y = 0.8 + 0.4*x
or
    cars = 0.8 + 0.4*hhsize

This is the key output we are interested in.
 

FITTED VALUES AND RESIDUALS FROM REGRESSION LINE

Fitted values and residuals from the regression line
 
y = cars x = hh size yhat = 0.8+0.4*x e = y - yhat
   1    1     1.2    -.2
   2    2     1.6    0.4
   2    3     2.0    0.0
   2    4     2.4    -.4
   3    5     2.8    0.2

PREDICTED VALUE OF Y GIVEN X = 4

yhat = b1 + b2*x = 0.8 + 0.4*4 = 2.4.

This is also the predicted value of the conditional mean of y given x=4.
 

COMPLETE EXCEL OUTPUT FOR THIS EXERCISE

This output is discussed in much greater detail in: Excel: Statistical Inference for Two-Variable Regression
 

For further information on how to use Excel go to
     http://www.econ.ucdavis.edu/faculty/cameron