EXCEL 97: Histogram

A. Colin Cameron, Dept. of Economics, Univ. of Calif. - Davis

This September 1999 help sheet gives information on how to


DATA

We read in the following data on house sale prices in Central Davis in 1999, measured in thousands of dollars.
375, 340, 310, 279.9, 278.5, 273, 272, 270, 270, 258.5, 255, 253, 249, 245, 244, 241, 239.5, 238, 236.5, 235, 235, 233, 230, 229, 224.5, 220, 213, 212, 204.
[One way to get this data into column format is to cut the data into a text editor and manually type in enter after each observation and delete the comma].

HISTOGRAM

There are 25 observations. Interpretation of the data is made somewhat easier by the data already being ordered (if it is not ordered we can order it using Data | Sort). But it is useful to collapse the data further.

A histogram does this by counting the number of observations that fall within a certain range (a "bin") and then plotting this frequency against the bin value.

In Excel choose Tools | Data Analysis | histogram and hit the histogram key.

This gives the first histogram given in the output below. For this data Excel has chosen six intervals with boundaries 204.0, 238.2, 272.4, 306.6, 340.8.
The general rule Excel uses is equal-width intervals with the number of intervals approximately equal to the square root of the number of data points.

For these data the number of intervals seems reasonable but the cell boundaries are not nice rounded numbers. For this particular data rounding off to the nearest $50,000 seems natural, so we might choose the interval boundaries as 200, 250, 300, 350 and 400. This choice can be done by creating a bin. In, say, columns B1:B6 type, respectively Sales Price, 200, 250, 300, 350, 400. (The entry in B1 will be used to automatically label the x-axis of the histogram).

In Excel choose Tools | Data Analysis | histogram and hit the histogram key.

Now the interval boundaries are nice rounded numbers. Further improvements to the histogram are: Further details on improving the presentation of graphics is given in Excel: Charts (bar, column, pie, line).

You can move the chart within the worksheet by clicking on it, leaving the mouse depressed, and drag the mouse.
If you need to enlarge the chart size, then resize the entire chart by putting mouse icon on bottom right hand corner of chart and dragging down to appropriate size.
 

VARIATIONS ON A HISTOGRAM

Variations on a histogram are

OUTPUT

Sample output is given below. I stored the data and labels in A3:A32 as I used the first two rows for data description and I stored the bin in B24:B29.

A final point is interpretation. Excel labels the histogram by using the upper value in the interval. Thus the frequency of 17 labeled under 250 means that there were 17 observations in the sample that took values greater than 200 and less than or equal to 250. It would be clearer to label this interval using its midpoint of 225, and this is done by some other programs.

For further information on how to use Excel
   go to http://www.econ.ucdavis.edu/faculty/cameron