EXCEL 97: Difference in Means and Variances

EXCEL 97: Difference in Two Means

A. Colin Cameron, Dept. of Economics, Univ. of Calif. - Davis

This September 1999 help sheet gives information on how to

Test for difference between two paired means
Test for difference between two unpaired means with equal variances assumed
Test for difference between two unpaired means with unequal variances assumed
Test for difference between two unpaired means with known equal variance

The first two are the most common.
These tests can be done using descriptive statistics (for two paired means) and regression analysis (for unpaired means with equal variance).
So no special treatment of difference in means is needed. For completeness we nonetheless present the topic here.

The tests are valid for large samples from any distribution or small samples from the normal.
For small samples from nonnormal distributions, one should instead use the Wilcoxon rank sum test for differences in two medians. This is not automatically calculated by Excel, and you will neeed to see a textbook for how to do the Wilcoxon test in Excel.

THEORY

Tests for difference in two means for two related sets of measures on continuous data.

Let y1, y2, .... be one data series with mean mux.
Let x1, x2, .... be a second data series with mean muy.

Then we consider
two-sided test:   H0: muy - mux = 0 versus Ha: muy - mux not equal to 0
one-sided test:   H0: muy - mux <= 0 versus Ha: muy - mux > 0
or:                    H0: muy - mux >= 0 versus Ha: muy - mux < 0

There are several different tests:

The data are paired if the data are two measures on the same object.

The data are unpaired if the data are two sets of measures measures on the same object.

If data are not paired distinction is made between tests that where equal variances are assumed and tests where unequal variances are assumed.
If data are not paired distinction sometimes it is assumed that there are known equal variances.

These various tests can be implemented in three ways, all of which give the same answer:

Basic tests can be implemented using the TTEST commands
More detailed tests can be implemented using Tools | Data Analysis
All but the unpaired unequal variances can be implemented using a regression or descriptive statistics.

All tests assume independent samples and normally distributed data but by the central limit theorem can also be used for nonnormal data if for each of the two paired samples there are more than 30 observations.

In analysis of economics data the favored tool is regression, and would be used here.
For completeness we give all three methods as many other disciplines use non-regression methods for comparing two means.
We focus on a two-sided test that the means are the same.

INPUT DATA

The data used are house price data. For the same house we have the list price and the sale price.
This is a paired sample, so that is the only correct method for these data.
For demonstration purposes we will also use this data for unpaired samples.

The data are first list price and then sale price for 29 houses.
Sale Price
386.0 375.0
369.9 340.0
315.0 310.0
279.9 279.9
280.0 278.5
273.0 273.0
279.5 272.0
269.0 270.0
263.9 270.0
259.9 258.5
240.0 255.0
269.0 253.0
235.0 249.0
252.0 245.0
249.0 244.0
242.5 241.0
239.5 239.5
249.0 238.0
239.5 236.5
242.0 235.0
245.0 235.0
244.5 233.0
239.0 230.0
229.5 229.0
224.5 224.5
229.0 220.0
219.9 213.0
212.0 212.0
199.9 204.0

Read the labels and data into cells A1:A30.

TEST FOR DIFFERENCE BETWEEN TWO PAIRED MEANS

Using the TTEST command for paired means (type = 1) with two tails yields
TTEST(A1:A30,C1:C30,2,1) = 0.021175
[With one tail the p-value is 0.010587]

So we reject H0 at 5 percent significance level as p-value = 0.021175 is < .05.
Conclude that means are unequal.

Using Tools | Data Analysis | t-test: Paired Two Sample for Mean
allows one to test for a difference other than 0 (but here do put in 0)
and yields more complete output:
                         List Price       Sale Price
Mean                       257.8           253.9
Variance                   1669.5         1398.0
Observations                 29              29
Pearson Correlation         0.979
Hypothesized Mean Difference 0
df                          28
t Stat                    2.44206
P(T<=t) one-tail          0.010587582
t Critical one-tail       1.701130259
P(T<=t) two-tail          0.021175163
t Critical two-tail       2.048409442

Using Tools | Data Analysis | Descriptive Statistics
Create in cells C1:C30 a header and the series List - Sale (e.g. C2=A2-B2).
Then Tools | Descriptive Statistics yields
Mean = 3.913793 and Standard error 1.602659
This yields p-value = TDIST(3.913793/1.602659,28,2) = 0.021175.

Using Tools | Data Analysis | Regression is also possible.
Create a column of ones.
Regress List - Sale on this column of ones selecting the No Constant option.
This yields p-value for the regressor of 0.021175163.

TEST FOR DIFFERENCE BETWEEN UNPAIRED MEANS: EQUAL VARIANCES

Using the TTEST command for two-sample equal variance (type = 2) with two tails yields
TTEST(A1:A30,C1:C30,2,2) = 0.704987
[With one tail the p-value is 0.352493]

So we do not reject H0 at 5 percent significance level as p-value = 0.704987 which is not < .05.
We conclude the series are the same.

Using Tools | Data Analysis | t-test: Two-Sample assuming Equal Variance
allows one to test for a difference other than 0 (but here do put in 0)
and output of similar format to that given above for Paired Means.
The output includes P(T<=t) two-tail 0.704987

Using Tools | Data Analysis | Regression:
Create in cells D1:C59 a header for Price and a combined series List and Sale of 58 observations.
Create in cells E1:E59 a header for DSale and a series = 1 if price is sale price and = 0 if price is list price
Regress D1:D59 on a constant and E1:E59 and do a t-test of whether the coefficient on DSale is zero.
This yields p-value for the regressor of 0.704987.

[ASIDE: This conclusion of this test differs from the conclusion for paired test.
The reason is that in general Var(List-Sale) = Var(List) + Var(Sale) - 2*Cov(List, Sale).
This test assumes independence so that 2*Cov(List, Sale) is (erroneously) set to 0.
Then Var(List-Sale) = Var(List) + Var(Sale) = 1533.8 + 1533.8 = 3067.6
where 1533.8 is the pooled variance estimate.
The paired test uses the correct sample Var(List-Sale) = 74.48, which is much smaller.]

TEST FOR DIFFERENCE BETWEEN UNPAIRED MEANS: UNEQUAL VARIANCES

Using the TTEST command for two-sample unequal variance (type = 3) with two tails yields
TTEST(A1:A30,C1:C30,2,3) = 0.704998
[With one tail the p-value is 0.352499]

So we do not reject H0 at 5 percent significance level as p-value = 0.704987 which is not < .05.
We conclude the series are the same.

Using Tools | Data Analysis | t-test: Two-Sample assuming Unequal Variance
allows one to test for a difference other than 0 (but here do put in 0)
and output of similar format to that given above for Paired Means.
The output includes P(T<=t) two-tail 0.704987
This seems to be an error! At least in my version of Excel. It is exactly the same as for the equal variance case.
Error appears to be in construction of the pooled variance.

Using Tools | Data Analysis | Regression is not straightforward.
We would use an F-test to compare running one regression for all 58 observations with just ones as a regressor
with two separate regressions for the 29 observations each with one regressor.

The test statistics and p-value is very similar to that for equal variance as the two variances are similar.

[ASIDE: This conclusion of this test differs from the conclusion for paired test.
The reason is that in general Var(List-Sale) = Var(List) + Var(Sale) - 2*Cov(List, Sale).
This test assumes independence so that 2*Cov(List, Sale) is (erroneously) set to 0.
Then Var(List-Sale) = Var(List) + Var(Sale) = 1669.5 + 1398.1 = 3067.6.
The paired test uses the correct sample Var(List-Sale) = 74.48, which is much smaller.]

TEST FOR DIFFERENCE BETWEEN UNPAIRED MEANS: KNOWN VARIANCES

Using Tools | Data Analysis | z-test: Two-Sample for Means
allows one to test for a difference other than 0 (but here do put in 0)
and output of similar format to that given above for Paired Means.
The output includes P(T<=t) two-tail 0.700385.

So we do not reject H0 at 5 percent significance level as p-value = 0.704987 which is not < .05.
We conclude the series are the same.

For further information on how to use Excel go to
http://www.econ.ucdavis.edu/faculty/cameron