The tests are valid for large samples from any distribution or small
samples from the normal.
For small samples from nonnormal distributions, one should instead
use the Wilcoxon rank sum test for differences in two medians. This is
not automatically calculated by Excel, and you will neeed to see a textbook
for how to do the Wilcoxon test in Excel.
THEORY
Tests for difference in two means for two related sets of measures on continuous data.
Let y1, y2, .... be one data series with mean mux.
Let x1, x2, .... be a second data series with mean muy.
Then we consider
two-sided test: H0: muy - mux = 0 versus Ha:
muy - mux not equal to 0
one-sided test: H0: muy - mux <= 0 versus
Ha: muy - mux > 0
or:
H0: muy - mux >= 0 versus Ha: muy - mux < 0
There are several different tests:
In analysis of economics data the favored tool is regression, and would
be used here.
For completeness we give all three methods as many other disciplines
use non-regression methods for comparing two means.
We focus on a two-sided test that the means are the same.
INPUT DATA
The data used are house price data. For the same house we have the list
price and the sale price.
This is a paired sample, so that is the only correct method for these
data.
For demonstration purposes we will also use this data for unpaired
samples.
The data are first list price and then sale price for 29 houses.
Sale Price
386.0 375.0
369.9 340.0
315.0 310.0
279.9 279.9
280.0 278.5
273.0 273.0
279.5 272.0
269.0 270.0
263.9 270.0
259.9 258.5
240.0 255.0
269.0 253.0
235.0 249.0
252.0 245.0
249.0 244.0
242.5 241.0
239.5 239.5
249.0 238.0
239.5 236.5
242.0 235.0
245.0 235.0
244.5 233.0
239.0 230.0
229.5 229.0
224.5 224.5
229.0 220.0
219.9 213.0
212.0 212.0
199.9 204.0
Read the labels and data into cells A1:A30.
TEST FOR DIFFERENCE BETWEEN TWO PAIRED MEANS
Using the TTEST command for paired means (type = 1) with two
tails yields
TTEST(A1:A30,C1:C30,2,1) = 0.021175
[With one tail the p-value is 0.010587]
So we reject H0 at 5 percent significance level as p-value = 0.021175
is < .05.
Conclude that means are unequal.
Using Tools | Data Analysis | t-test: Paired Two Sample for Mean
allows one to test for a difference other than 0 (but here do put in
0)
and yields more complete output:
List Price Sale Price
Mean
257.8 253.9
Variance
1669.5 1398.0
Observations
29
29
Pearson Correlation
0.979
Hypothesized Mean Difference 0
df
28
t Stat
2.44206
P(T<=t) one-tail
0.010587582
t Critical one-tail 1.701130259
P(T<=t) two-tail
0.021175163
t Critical two-tail 2.048409442
Using Tools | Data Analysis | Descriptive Statistics
Create in cells C1:C30 a header and the series List - Sale (e.g. C2=A2-B2).
Then Tools | Descriptive Statistics yields
Mean = 3.913793 and Standard error 1.602659
This yields p-value = TDIST(3.913793/1.602659,28,2) = 0.021175.
Using Tools | Data Analysis | Regression is also possible.
Create a column of ones.
Regress List - Sale on this column of ones selecting the No Constant
option.
This yields p-value for the regressor of 0.021175163.
TEST FOR DIFFERENCE BETWEEN UNPAIRED MEANS: EQUAL VARIANCES
Using the TTEST command for two-sample equal variance (type =
2) with two tails yields
TTEST(A1:A30,C1:C30,2,2) = 0.704987
[With one tail the p-value is 0.352493]
So we do not reject H0 at 5 percent significance level as p-value =
0.704987 which is not < .05.
We conclude the series are the same.
Using Tools | Data Analysis | t-test: Two-Sample assuming Equal Variance
allows one to test for a difference other than 0 (but here do put in
0)
and output of similar format to that given above for Paired Means.
The output includes P(T<=t) two-tail 0.704987
Using Tools | Data Analysis | Regression:
Create in cells D1:C59 a header for Price and a combined series List
and Sale of 58 observations.
Create in cells E1:E59 a header for DSale and a series = 1 if price
is sale price and = 0 if price is list price
Regress D1:D59 on a constant and E1:E59 and do a t-test of whether
the coefficient on DSale is zero.
This yields p-value for the regressor of 0.704987.
[ASIDE: This conclusion of this test differs from the conclusion for
paired test.
The reason is that in general Var(List-Sale) = Var(List) + Var(Sale)
- 2*Cov(List, Sale).
This test assumes independence so that 2*Cov(List, Sale) is (erroneously)
set to 0.
Then Var(List-Sale) = Var(List) + Var(Sale) = 1533.8 + 1533.8 = 3067.6
where 1533.8 is the pooled variance estimate.
The paired test uses the correct sample Var(List-Sale) = 74.48, which
is much smaller.]
TEST FOR DIFFERENCE BETWEEN UNPAIRED MEANS: UNEQUAL VARIANCES
Using the TTEST command for two-sample unequal variance (type
= 3) with two tails yields
TTEST(A1:A30,C1:C30,2,3) = 0.704998
[With one tail the p-value is 0.352499]
So we do not reject H0 at 5 percent significance level as p-value =
0.704987 which is not < .05.
We conclude the series are the same.
Using Tools | Data Analysis | t-test: Two-Sample assuming Unequal
Variance
allows one to test for a difference other than 0 (but here do put in
0)
and output of similar format to that given above for Paired Means.
The output includes P(T<=t) two-tail 0.704987
This seems to be an error! At least in my version of Excel. It is exactly
the same as for the equal variance case.
Error appears to be in construction of the pooled variance.
Using Tools | Data Analysis | Regression is not straightforward.
We would use an F-test to compare running one regression for all 58
observations with just ones as a regressor
with two separate regressions for the 29 observations each with one
regressor.
The test statistics and p-value is very similar to that for equal variance as the two variances are similar.
[ASIDE: This conclusion of this test differs from the conclusion for
paired test.
The reason is that in general Var(List-Sale) = Var(List) + Var(Sale)
- 2*Cov(List, Sale).
This test assumes independence so that 2*Cov(List, Sale) is (erroneously)
set to 0.
Then Var(List-Sale) = Var(List) + Var(Sale) = 1669.5 + 1398.1
= 3067.6.
The paired test uses the correct sample Var(List-Sale) = 74.48, which
is much smaller.]
TEST FOR DIFFERENCE BETWEEN UNPAIRED MEANS: KNOWN VARIANCES
Using Tools | Data Analysis | z-test: Two-Sample for Means
allows one to test for a difference other than 0 (but here do put in
0)
and output of similar format to that given above for Paired Means.
The output includes P(T<=t) two-tail 0.700385.
So we do not reject H0 at 5 percent significance level as p-value =
0.704987 which is not < .05.
We conclude the series are the same.
For further information on how to use Excel go to
http://www.econ.ucdavis.edu/faculty/cameron