[R] Re-binning histogram data

Duncan Murdoch murdoch at stats.uwo.ca
Fri Jun 9 14:38:49 CEST 2006


On 6/8/2006 11:51 AM, Berton Gunter wrote:
> I would argue that histograms are outdated relics and that density plots
> (whatever your favorite flavor is) should **always** be used instead these
> days.

But my favourite density plot is a histogram!

I agree that computational complexity should weigh much less in the 
decision to do something than it used to.  But I'd say a histogram (with 
more bins than the R default) is a good input to my mental density 
estimator.   Adding a rug of points below it is helpful in small 
datasets.  It is very easy to see how much smoothing has been done; 
that's often hard to see in presentations of density plots produced in 
other ways.  It's also easier to recognize discrete atoms in the 
distribution:  they'll show up as isolated bars a lot higher than usual.

For example, compare these two plots:

  set.seed(123)
  par(mfrow=c(2,1))
  x <- c(rnorm(1000), rbinom(100, 3, 0.5))
  hist(x, breaks=60)
  plot(density(x))

This isn't a fair comparison, since I used the default bandwidth on the 
smoother but a non-default number of bins on the histogram (it would be 
fairer to compare to plot(density(x, bw=0.05)) ), but I think it still 
illustrates my point: even in the latter density plot, where the atoms 
are clearly visible, I still need to read the text at the bottom to know 
the sample size and bandwidth, whereas I can see those at a glance in the 
histogram.  And an untrained user could get a lot of information out of 
the histogram, whereas they'd have a lot of trouble getting anything out 
of the density plots.
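
The fairer comparison mentioned above might look something like the 
following.  This is only a sketch: bw=0.05 is the value suggested in the 
text, while drawing the histogram on the density scale (freq=FALSE) and 
adding rug() to both panels are assumptions made here so the two plots 
share a vertical scale and both show the raw points.

```r
set.seed(123)
x <- c(rnorm(1000), rbinom(100, 3, 0.5))

op <- par(mfrow = c(2, 1))

# Histogram on the density scale so its heights are comparable to the smoother
h <- hist(x, breaks = 60, freq = FALSE)
rug(x)                        # raw observations along the axis

# Kernel density estimate with a small, hand-picked bandwidth
d <- density(x, bw = 0.05)
plot(d)
rug(x)

par(op)                       # restore the previous graphics settings
```

With the narrow bandwidth the atoms at 0, 1, 2 and 3 should stand out in 
both panels, but only the histogram shows the bin width directly.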

> 
> In this vein, I would appreciate critical rejoinders (public or private) to
> the following proposition: Given modern computer power and software like R
> on multi-GHz machines, statistical and graphical relics of the pre-computer
> era (like histograms, low resolution printer-type plots, and perhaps even
> method of moments EMS calculations) should be abandoned in favor of superior
> but perhaps computation-intensive alternatives (like density plots, high
> resolution plots, and likelihood or resampling or Bayes based methods). 
> 
> NB: Please -- no pleadings that new methods would be mystifying to the
> non-cognoscenti. Following that to its logical conclusion would mean that
> we'd all have to give up our TV remotes and cell phones, and what kind of
> world would that be?! :-)

Now, if you were to suggest that the stem() function is a bizarre 
simulation of a stone-age tool on a modern computer, I might agree.

Duncan Murdoch

> 
> -- Bert Gunter
> 
>   
> 
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch 
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Petr Pikal
>> Sent: Thursday, June 08, 2006 6:17 AM
>> To: Justin Ashmall; r-help at stat.math.ethz.ch
>> Subject: Re: [R] Re-binning histogram data
>> 
>> 
>> 
>> On 8 Jun 2006 at 11:35, Justin Ashmall wrote:
>> 
>> Date sent:      	Thu, 8 Jun 2006 11:35:46 +0100 (BST)
>> From:           	Justin Ashmall <ja at space.mit.edu>
>> To:             	Petr Pikal <petr.pikal at precheza.cz>
>> Copies to:      	r-help at stat.math.ethz.ch
>> Subject:        	Re: [R] Re-binning histogram data
>> 
>> > 
>> > Thanks for the reply Petr,
>> > 
>> > It looks to me that truehist() needs a vector of data just like
>> > hist()? Whereas I have histogram-style input data? Am I missing
>> > something?
>> 
>> Well, maybe you could use barplot. Or as you suggested recreate the 
>> original vector and call hist or truehist with other bins.
>> 
>> > hhh<-hist(rnorm(1000))
>> > barplot(tapply(hhh$counts, c(rep(1:7,each=2),7), sum))
>> > tapply(hhh$mids, c(rep(1:7,each=2),7), mean)
>>     1     2     3     4     5     6     7 
>> -3.00 -2.00 -1.00  0.00  1.00  2.00  3.25 
>> > hhh1<-rep(hhh$mids,hhh$counts)
>> > plot(hhh, freq=FALSE)
>> > lines(density(hhh1))
>> >
>> 
>> HTH
>> Petr
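
The tapply() idea above can also be applied directly to the one-second 
counts from the original question, without expanding them back into a 
vector of observations.  The following is a rough sketch only: the names 
t_start, counts and new_breaks are hypothetical, the counts are simulated, 
and it assumes the new breaks coincide with edges of the fine bins.

```r
# Hypothetical input: one count per one-second bin over an hour
set.seed(1)
t_start <- 0:3599                      # left edge of each one-second bin
counts  <- rpois(length(t_start), 2)   # simulated counts, for illustration only

# New, broader bins: 10-minute (600 s) intervals; irregular breaks work too
new_breaks <- seq(0, 3600, by = 600)

# Assign each fine bin to a broad bin by its left edge, then sum the counts
grp      <- cut(t_start, breaks = new_breaks, right = FALSE)
rebinned <- tapply(counts, grp, sum)

# Wrap the result in a "histogram" object so plot() draws it like hist() output
h <- structure(
  list(breaks   = new_breaks,
       counts   = as.vector(rebinned),
       density  = as.vector(rebinned) / (sum(rebinned) * diff(new_breaks)),
       mids     = head(new_breaks, -1) + diff(new_breaks) / 2,
       xname    = "time (s)",
       equidist = TRUE),
  class = "histogram")
plot(h, main = "Counts re-binned to 10-minute bins")
```

For irregular bins, equidist would need to be FALSE so that plot() uses 
the density column rather than raw counts.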
>> 
>> > 
>> > Cheers,
>> > 
>> > Justin
>> > 
>> > 
>> > 
>> > On Thu, 8 Jun 2006, Petr Pikal wrote:
>> > 
>> > > Hi
>> > >
>> > > try truehist from MASS package and look for argument breaks or h.
>> > >
>> > > HTH
>> > > Petr
>> > >
>> > > On 8 Jun 2006 at 10:46, Justin Ashmall wrote:
>> > >
>> > > Date sent:      	Thu, 8 Jun 2006 10:46:19 +0100 (BST)
>> > > From:           	Justin Ashmall <ja at space.mit.edu>
>> > > To:             	r-help at stat.math.ethz.ch
>> > > Subject:        	[R] Re-binning histogram data
>> > >
>> > >> Hi,
>> > >>
>> > >> Short Version:
>> > >> Is there a function to re-bin a histogram to new, broader bins?
>> > >>
>> > >> Long version: I'm trying to create a histogram, however my
>> > >> input-data is itself in the form of a fine-grained histogram, i.e.
>> > >> numbers of counts in regular one-second bins. I want to produce a
>> > >> histogram of, say, 10-minute bins (though possibly irregular bins
>> > >> also).
>> > >>
>> > >> I suppose I could re-create a data set as expected by the hist()
>> > >> function (i.e. if time t=3600 has 6 counts, add six entries of 3600
>> > >> to a list) however this seems neither elegant nor efficient (though
>> > >> I'd be pleased to be mistaken!). I could then re-create a histogram
>> > >> as normal.
>> > >>
>> > >> I'm guessing there's a better solution however! Apologies if this is
>> > >> a basic question - I'm rather new to R and trying to get up to
>> > >> speed.
>> > >>
>> > >> Regards,
>> > >>
>> > >> Justin
>> > >>
>> > >> ______________________________________________
>> > >> R-help at stat.math.ethz.ch mailing list
>> > >> https://stat.ethz.ch/mailman/listinfo/r-help
>> > >> PLEASE do read the posting guide!
>> > >> http://www.R-project.org/posting-guide.html
>> > >
>> > > Petr Pikal
>> > > petr.pikal at precheza.cz
>> > >
>> > >
>> > 
>> 
>> Petr Pikal
>> petr.pikal at precheza.cz
>> 
>>
> 


