> >>>>> "PD" == Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk> writes:
>
> PD> "Venables, Bill (CMIS, Cleveland)" <Bill.Venables at cmis.CSIRO.AU>
> PD> writes:
> >> The fact that every elementary book on statistics does it this way
> >> does not make it correct. To be helpful, a histogram really has to
> >> be a non-parametric density estimator, period.
> >>
> >> Enough already of polemics.
>
> PD> Not quite! There is a reason for doing it the other way, namely
> PD> that the concept of a histogram generally comes before the concept
> PD> of a probability density, pedagogically. It is very easy to explain
> PD> that you chop up the axis into bins and count the number of data
> PD> points that fall in each of them. I bet that half of the MDs that I
> PD> teach never quite understand the density (hell, the author of the
> PD> textbook I use managed to plot three identical gaussian curves with
> PD> identical y axis but different x axes... and he's a
> PD> statistician). So for the basic uses of the histogram, one would be
> PD> replacing a perfectly intuitive simple unit with a substantially
> PD> more complex one.
>
> I agree 100% with Peter.
> Being a mathematician I agree with Bill that for us, a histogram is a
> (very suboptimal) density estimate; but average statistics software users
> *do* learn histograms differently.

I hope there are many of us who agree 100% with Bill. Bad practice,
as enshrined in the default behaviour of histograms, should be
discouraged. We should aim to introduce density-based histograms from
the outset, and the count-based default in many packages works against
this principle. The current default conveys a misleading and arguably
useless summary, and I don't accept the argument that we should persist
with it merely because it is simple to understand where the numbers
come from.
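
To make the difference between the two scales concrete, here is a
minimal sketch in Python (standard library only). The helper names
bin_counts and bin_densities and the toy data are illustrative
assumptions, not any package's API. It bins evenly spread data into
bins of unequal width and compares raw counts with the density scale,
count / (n * bin width).

```python
from bisect import bisect_right


def bin_counts(data, breaks):
    """Count points per bin [breaks[i], breaks[i+1]); the last bin also includes its right edge."""
    counts = [0] * (len(breaks) - 1)
    for x in data:
        if x < breaks[0] or x > breaks[-1]:
            continue  # points outside the bins are ignored
        i = min(bisect_right(breaks, x) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


def bin_densities(data, breaks):
    """Density scale: count / (n * bin width), so the bar areas sum to 1."""
    counts = bin_counts(data, breaks)
    n = sum(counts)
    return [c / (n * (hi - lo)) for c, lo, hi in zip(counts, breaks, breaks[1:])]


# Hypothetical toy data, evenly spread over (0, 8), with unequal bin widths.
data = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
breaks = [0, 1, 2, 4, 8]

counts = bin_counts(data, breaks)
densities = bin_densities(data, breaks)
areas = [d * (hi - lo) for d, lo, hi in zip(densities, breaks, breaks[1:])]

print(counts)      # [1, 1, 2, 4]
print(densities)   # [0.125, 0.125, 0.125, 0.125]
print(sum(areas))  # 1.0
```

The counts climb from 1 to 4 only because the bins get wider, which
suggests a pile-up on the right that is not in the data. The density
heights are flat, as they should be for evenly spread points, and the
bar areas sum to 1. With equal-width bins the two scales differ only by
a constant factor, so the shape is the same and only the y axis changes.
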
Cheers,
David.
