[R] Unexpected behavior from hist()

David Carlson dcarlson at tamu.edu
Thu Jun 13 17:56:05 CEST 2013


Density means that the AREAS of the bars add to 1, not the HEIGHTS
of the bars. Your bins are probably narrower than 1, so the heights
must exceed 1 for the areas to sum to 1. E.g.:

> set.seed(42)
> x <- rpois(1000, 5)/100
> info <- hist(x, prob=TRUE)
> info
$breaks
 [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13

$counts
 [1]  42  88 151 177 178 131  97  70  43  14   6   2   1

$density
 [1]  4.2  8.8 15.1 17.7 17.8 13.1  9.7  7.0  4.3  1.4  0.6  0.2  0.1

$mids
 [1] 0.005 0.015 0.025 0.035 0.045 0.055 0.065 0.075 0.085 0.095 0.105 0.115
[13] 0.125

$xname
[1] "x"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
> diff(info$breaks)*info$density # Areas of each bar
 [1] 0.042 0.088 0.151 0.177 0.178 0.131 0.097 0.070 0.043 0.014 0.006 0.002
[13] 0.001
> sum(diff(info$breaks)*info$density) # Sum of the areas
[1] 1
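
The same relationship can be checked by hand. The following is only a
minimal sketch, assuming the same simulated data as above; the names
`widths` and `manual_density` are made up for illustration and are not
part of hist()'s return value.

```r
# Same simulated data as in the example above
set.seed(42)
x <- rpois(1000, 5) / 100

# Compute the histogram without drawing it
info <- hist(x, plot = FALSE)

# Bin widths (hypothetical helper name)
widths <- diff(info$breaks)

# Density = proportion of observations in the bin / bin width
manual_density <- info$counts / sum(info$counts) / widths

# Should agree with the density component returned by hist()
print(all.equal(manual_density, info$density))

# With bins narrower than 1, heights above 1 are expected
print(max(info$density) > 1)

# The areas still sum to 1
print(sum(widths * info$density))
```

Dividing by the bin width is what lets the heights grow past 1 when
the bins are narrow.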

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352


-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Sarah Goslee
Sent: Thursday, June 13, 2013 10:36 AM
To: Mohamed Badawy
Cc: r-help at r-project.org
Subject: Re: [R] Unexpected behavior from hist()

Hi,

On Thu, Jun 13, 2013 at 11:13 AM, Mohamed Badawy
<mbadawy at pm-engr.com> wrote:
> Hi... I'm still a beginner in R. While doing some curve-fitting
> with a raw data set of length 22,000, here is what I had:
>
>
>
>> hist(y,col="red")
>
> gives me the frequency histogram, 13 total rectangles, highest is
> near 5000.
>

You don't provide a reproducible example, so here's some fake data:

somedata <- runif(1000)


> Now
>
>> hist(y,prob=TRUE,col="red",ylim=c(0,1.5))
>
> gives me the density (probability?) histogram, same number of
> rectangles, but the highest rectangle is obviously higher than 1,
> how can this be?!!!

Because you misread the help. With freq=FALSE (equivalent to
prob=TRUE, which is a legacy option), you get densities, as documented:

freq: logical; if 'TRUE', the histogram graphic is a representation
          of frequencies, the 'counts' component of the result; if
          'FALSE', probability densities, component 'density', are
          plotted (so that the histogram has a total area of one).
          Defaults to 'TRUE' _if and only if_ 'breaks' are
          equidistant (and 'probability' is not specified).


It sounds like what you actually want is the proportion of
observations in each bin, so that the bar heights sum to 1:

somehist <- hist(somedata, plot=FALSE)
somehist$counts <- somehist$counts/sum(somehist$counts)
plot(somehist)
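
A slightly expanded sketch of the same idea, assuming fake uniform
data as above. The seed, the `props` variable and the axis label are
illustrative choices only. Since plot() would otherwise label the
y-axis "Frequency", the sketch passes ylab explicitly.

```r
# Fake data, as above; the seed is an arbitrary choice for repeatability
set.seed(1)
somedata <- runif(1000)

# Compute the histogram without drawing it
somehist <- hist(somedata, plot = FALSE)

# Proportion of observations falling in each bin (hypothetical name)
props <- somehist$counts / sum(somehist$counts)

# Bar heights now sum to 1, so no single bar can exceed 1
print(sum(props))

# Overwrite the counts and relabel the y-axis to match
somehist$counts <- props
plot(somehist, ylab = "Proportion", main = "Histogram of somedata")
```

Unlike the density scale, these heights do not depend on the bin
width, so they are not comparable to a fitted density curve.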

> P.S. I had to post this thread via email as it got rejected as I
> posted it from Nabble, reason was "Message rejected by filter rule
> match"

Nabble is not the R-help mailing list. Posting via email is the
correct thing to do.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org
