GENERAL COMMENTS
The most commonly used confidence intervals and hypothesis tests for the population mean use the t-distribution. These are the ones we present, and the ones that the latest versions of Excel produce in descriptive statistics.
The underlying theory to justify this assumes normally distributed data, i.e. that the individual observations are independent and identically distributed observations from a normal distribution with mean mu and unknown variance sigma^2.
What if the data are not normal, though are still independent and identically distributed observations with mean mu and unknown variance sigma^2?
EXAMPLE DATA
Start with EXCEL: Descriptive Statistics with the Confidence level for the mean option chosen and set to 95%.
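This yields output that includes the values used below: sample mean 119.90, standard error 1.145038, sample size n = 10, and a 95% "confidence level" of 2.59.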
CONFIDENCE INTERVALS FOR THE POPULATION MEAN
A 95% confidence interval for the population mean is the sample mean plus or minus the "confidence level" reported by Excel.
Here this yields 119.90 +/- 2.59 = (117.31, 122.49).
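This means that, at the 95% confidence level, the data are consistent with a data generating process with population mean mu in the range 117.31 to 122.49. The 95% refers to the procedure: in repeated samples, intervals constructed this way contain mu 95% of the time.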
To obtain confidence intervals at other levels of confidence, e.g. 90%, in Tools / Data Analysis / Descriptive Statistics change the confidence level for the mean from its default of 95% to e.g. 90%.
Note that since the sample is small here (n=10) this confidence interval requires the assumption that the data are normally distributed (see GENERAL COMMENTS at the top of this handout). This assumption is impossible to test with only ten observations, but could be justified if (and I say if) experience with other larger samples of gasoline price data suggests that this is a reasonable assumption.
DETAILS ON CONFIDENCE INTERVALS FOR THE POPULATION MEAN
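A symmetric confidence interval in general is of the form:
estimate +/- critical value * standard error.
For this particular problem, a 95% confidence interval for the population mean with data that are normal(mu, sigma^2), this is
ybar +/- t(.025;n-1) * s/sqrt(n),
where ybar is the sample mean, s is the sample standard deviation, and t(.025;n-1) is the value exceeded with probability .025 by a t-distributed random variable with n-1 degrees of freedom,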
which can be calculated manually in Excel using:
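As a cross-check outside Excel, the same interval can be sketched in Python. This is only a minimal sketch under stated assumptions: the helper names t_pdf, t_cdf and t_critical are hypothetical, and the t-distribution is evaluated by simple numerical integration with the standard library rather than by any Excel function. The inputs (sample mean 119.90, standard error 1.145038, n = 10) are the values used in this handout.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Pr(T <= t): Simpson's rule on [0, |t|], then symmetry about zero."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(upper_tail, df):
    """Value c with Pr(T > c) = upper_tail, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid   # tail still too large, so c lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

n = 10
ybar = 119.90    # sample mean from the handout
se = 1.145038    # standard error s/sqrt(n) from the handout

crit = t_critical(0.025, n - 1)      # t(.025; n-1)
half_width = crit * se
lower, upper = ybar - half_width, ybar + half_width
print(f"critical value {crit:.3f}, interval ({lower:.2f}, {upper:.2f})")
```

With these inputs the half-width should agree with the 2.59 reported by Excel, giving the interval (117.31, 122.49).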
HYPOTHESIS TESTS FOR THE POPULATION MEAN
Excel does not have an automatic command and output for hypothesis testing
on the population mean.
Instead one needs to manually perform the hypothesis test using output
from descriptive statistics.
We distinguish between one-sided and two-sided tests.
Let mu0 denote the hypothesized value of mu.
Two-sided test: H0: mu = mu0 against Ha: mu not equal to mu0
One-sided test: H0: mu <= mu0 against Ha: mu > mu0
or: H0: mu >= mu0 against Ha: mu < mu0
We first find the value of a test statistic.
We then use either the p-value approach or the critical value approach
to reject or not reject the null hypothesis that mu = mu0.
Test Statistic
A test statistic is often of the form:
(estimated value - hypothesized value) / standard error
where the standard error gives the precision of the estimate.
For tests of H0: mu = mu0 we use the t-test statistic:
t = (ybar - mu0) / (s/sqrt(n)).
For the example here, to test H0: mu = 118 using the above data,
t = (119.9 - 118) / 1.145038 = 1.659.
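The arithmetic can be checked with a few lines of Python. This is a minimal sketch; the function name t_statistic is hypothetical, and it takes the standard error s/sqrt(n) directly because that is the value used in the handout.

```python
def t_statistic(ybar, mu0, se):
    """t = (estimate - hypothesized value) / standard error."""
    return (ybar - mu0) / se

ybar = 119.9     # sample mean
mu0 = 118.0      # hypothesized value of mu
se = 1.145038    # standard error s/sqrt(n)

t_stat = t_statistic(ybar, mu0, se)
print(round(t_stat, 3))  # 1.659
```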
The t-statistic is t-distributed with (n-1) degrees of freedom under the null hypothesis that mu = mu0 and assuming that the individual observations are independent and identically distributed observations from a normal distribution with mean mu.
Note that since the sample is small here (n=10) this test requires the assumption that the data are normally distributed (see GENERAL COMMENTS at the top of this handout). This assumption is impossible to test with only ten observations, but could be justified if (and I say if) experience with other larger samples of gasoline price data suggests that this is a reasonable assumption.
p-value approach
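The p-value is the probability, computed assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one calculated from the data. Equivalently, it is the significance level at which the null hypothesis is just rejected.
In general we reject the null hypothesis at level alpha if the p-value < alpha
and do not reject the null hypothesis at level alpha if the p-value >= alpha.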
Let T be a t-distributed random variable and t be the calculated value of the test statistic.
The p-value for hypothesis tests on the population mean is then obtained using the Excel commands:
More generally we will have a different test statistic value than 1.659 and a different number of degrees of freedom than 9.
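For readers who want to check p-values outside Excel, the three cases can be sketched in Python. This is a minimal sketch under stated assumptions: t_pdf and t_cdf are hypothetical helper names, and Pr(T <= t) is approximated by numerical integration with the standard library. The inputs t = 1.659 and 9 degrees of freedom are the values from the example.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Pr(T <= t): Simpson's rule on [0, |t|], then symmetry about zero."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

t_stat = 1.659   # calculated test statistic
df = 9           # n - 1 degrees of freedom
alpha = 0.05

p_two_sided = 2 * (1 - t_cdf(abs(t_stat), df))  # Ha: mu not equal to mu0, Pr(|T| > |t|)
p_upper = 1 - t_cdf(t_stat, df)                 # Ha: mu > mu0, Pr(T > t)
p_lower = t_cdf(t_stat, df)                     # Ha: mu < mu0, Pr(T < t)

for label, p in [("two-sided", p_two_sided), ("upper", p_upper), ("lower", p_lower)]:
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"{label}: p = {p:.4f}, {decision} at level {alpha}")
```

For this example the two-sided p-value is about 0.13 and the upper one-sided p-value is about 0.066, so H0: mu = 118 is not rejected at level .05 in either case.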
Critical values
Critical values are used to define a critical region.
If the calculated value of the test statistic falls in the critical region then the null hypothesis is rejected. Otherwise it is not rejected.
Let c be the critical value of the test statistic and consider tests at level alpha (often alpha = .05).
The critical value for hypothesis tests on the population mean is then obtained using the Excel commands:
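The critical value approach can also be sketched in Python as a check. This is a minimal sketch, not Excel's method: the helper names t_pdf, t_cdf and t_critical are hypothetical, and the critical value is found by bisection on a numerically integrated t-distribution. The rejection rules assumed here are the usual ones: |t| > c for the two-sided test, t > c for Ha: mu > mu0, and t < -c for Ha: mu < mu0.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Pr(T <= t): Simpson's rule on [0, |t|], then symmetry about zero."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(upper_tail, df):
    """Value c with Pr(T > c) = upper_tail, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_stat, df, alpha = 1.659, 9, 0.05

c_two_sided = t_critical(alpha / 2, df)  # alpha/2 in each tail
c_one_sided = t_critical(alpha, df)      # all of alpha in one tail

reject_two_sided = abs(t_stat) > c_two_sided
reject_upper = t_stat > c_one_sided
reject_lower = t_stat < -c_one_sided
print(f"two-sided c = {c_two_sided:.3f}, reject: {reject_two_sided}")
print(f"one-sided c = {c_one_sided:.3f}, reject upper: {reject_upper}, reject lower: {reject_lower}")
```

For the example, t = 1.659 lies below both critical values (about 2.262 and 1.833), so the null hypothesis is not rejected at level .05.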
The hypothesis tests here use the t-distribution.
The underlying theory assumes normally distributed data, i.e. that the individual observations are independent and identically distributed observations from a normal distribution with mean mu and unknown variance sigma^2.
What if the data are not normal, though are still independent and identically distributed observations with mean mu and unknown variance sigma^2?
If there are more than 30 observations we can continue to use the same confidence interval. The justification is the central limit theorem.
If there are fewer than 30 observations the central limit theorem cannot be relied upon, and the same confidence interval is justified only if the data are normally distributed.
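The role of sample size can be illustrated by a small simulation. This is only a sketch under stated assumptions: the skewed exponential distribution, the sample sizes 10 and 100, the number of replications and the function name coverage are all choices made for illustration, and the critical values 2.262 and 1.984 are the standard table values of t(.025;9) and t(.025;99).

```python
import math
import random

def coverage(n, crit, reps=4000, seed=1):
    """Share of nominal 95% t-intervals that contain the true mean."""
    rng = random.Random(seed)
    true_mean = 1.0  # mean of the exponential(1) distribution (non-normal, skewed)
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        ybar = sum(sample) / n
        s2 = sum((x - ybar) ** 2 for x in sample) / (n - 1)  # sample variance
        se = math.sqrt(s2 / n)                                # s/sqrt(n)
        if abs(ybar - true_mean) <= crit * se:
            hits += 1
    return hits / reps

cov_small = coverage(10, 2.262)    # n = 10, critical value t(.025; 9)
cov_large = coverage(100, 1.984)   # n = 100, critical value t(.025; 99)
print(f"n = 10:  coverage {cov_small:.3f}")
print(f"n = 100: coverage {cov_large:.3f}")
```

With skewed data the small-sample interval typically covers mu noticeably less often than 95%, while the large-sample interval comes much closer to the nominal level.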
For further information on how to use Excel go to
http://www.econ.ucdavis.edu/faculty/cameron