[R] nclass.scott() and nclass.FD() {Re: truehist bug?}
Martin Maechler
maechler at stat.math.ethz.ch
Tue Mar 20 10:17:02 CET 2007
[This has become entirely a topic for 'R-devel' hence, I'm
diverting to there keeping R-help once; please follow-up
only to R-devel ]
>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Tue, 20 Mar 2007 08:49:16 +0100 writes:
>>>>> "Gad" == Gad Abraham <g.abraham at ms.unimelb.edu.au>
>>>>> on Tue, 20 Mar 2007 17:02:18 +1100 writes:
Gad> Hi,
Gad> Is this a bug in truehist()?
>>> library(MASS)
>>> x <- rep(1, 10)
>>> truehist(x)
Gad> Error in pretty(data, nbins) : invalid 'n' value
MM> You get something similar though slightly more helpful
MM> from
MM> hist(x, "scott")
MM> which then uses the same method for determining the number of bins /
MM> classes for the histogram.
MM> I'd say the main "bug" is in
MM> nclass.scott() [ and also nclass.FD() ]
MM> which do not work when var(x) == 0 as in this case.
MM> One could argue that
MM> 1) truehist(x) should not use "scott" as
MM> default when var(x) == 0 {hence a buglet in truehist()}
MM> and either
MM> 2) both hist() and truehist() should produce a better error
MM> message when "scott" (or "FD") is used explicitly and var(x) == 0
MM> or, rather IMO,
MM> 3) nclass.scott(x) and nclass.FD(x) should be extended to return a
MM> non-negative integer even when var(x) == 0
after some further thought,
I'm proposing to adopt '3)' {only; '1)' becomes unneeded}
with the following new code which is back-compatible for the
case where 'h > 0' and does something reasonable for the case h == 0 :
nclass.scott <- function(x)
{
h <- 3.5 * sqrt(stats::var(x)) * length(x)^(-1/3)
if(h > 0) ceiling(diff(range(x))/h) else 1L
}
nclass.FD <- function(x)
{
h <- stats::IQR(x)
if(h == 0) h <- stats::mad(x, constant = 2) # c=2: consistent with IQR
if (h > 0) ceiling(diff(range(x))/(2 * h * length(x)^(-1/3))) else 1L
}
Martin
More information about the R-help
mailing list