DATA
Consider data on seniority of job position (level 1, level and level
3) and gender (male of female).
This data is cross classified so that we know the number of people
in each category combination, e.g. how many are job level 3 and female.
Suppose the observed data on 1,000 people can be cross classified as
Level 1 | Level 2 | Level 3 | |
Male | 300 | 200 | 100 |
Female | 200 | 150 | 50 |
THEORY
We suppose there are r rows with row index row i
We suppose there are c columns typical column j
Then there are r*c cells with typical cell being cell ij
The data are denoted O_ij for observed frequencies for cell ij.
e.g. Here O_13 = 100.
We need to calculate E_ij, the expected number of frequencies in cell
ij if data are independent.
This is the hard part. See a textbook for this.
The test is:
For each cell ij calculate [(O_ij - E_ij)^2] / E_ij
Sum this over all ij pairs to get the chisquare test statistic, say
chi.
The p-value of this test is the Pr[chi-square with (r-1)*(c-1) degrees
of freedom > chi].
Reject H0: independent against Ha: not independent at level alpha if
p-value < alpha.
APPLICATION
See a textbook for how to construct expected frequencies for independent
data.
For these data the expected frequencies are
Level 1 | Level 2 | Level 3 | |
Male | 300 | 210 | 90 |
Female | 200 | 140 | 60 |
Suppose the observed frequencies are in cells B2:D4
and the expected frequencies are in cell B8:D9.
Then give the command =CHITEST(B2:D4,B8:D9) and the p-value of 0.1375 is returned.
Since the p-value is not < .05 we do not reject the null hypothesis.
Conclude that gender and level of position are statistically independent.
For further information on how to use Excel go to
http://www.econ.ucdavis.edu/faculty/cameron