[R] hist function: freq=FALSE for standardised histograms

Wed Apr 5 22:16:15 CEST 2006

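Hi,
how did you compute the total area? With freq=FALSE the bar heights are
densities, so they need not sum to 1; the area is the sum of
(bin width * density) over all bins.
Here is a simple example: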

###
set.seed(100)
x <- rnorm(100)
x.h <- hist(x, plot=FALSE)  # freq only affects plotting; $density is always returned

> x.h
$breaks
 [1] -2.5 -2.0 -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5  2.0  2.5  3.0

$counts
 [1]  3  4  9 14 22 20 13  7  5  2  1

$intensities
 [1] 0.05999999 0.08000000 0.18000000 0.28000000 0.44000000 0.40000000
 [7] 0.26000000 0.14000000 0.10000000 0.04000000 0.02000000

$density
 [1] 0.05999999 0.08000000 0.18000000 0.28000000 0.44000000 0.40000000
 [7] 0.26000000 0.14000000 0.10000000 0.04000000 0.02000000

$mids
 [1] -2.25 -1.75 -1.25 -0.75 -0.25  0.25  0.75  1.25  1.75  2.25  2.75

$xname
[1] "x"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

> sum(diff(x.h$breaks)*x.h$density)
[1] 1
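
A small sketch of why the bar heights themselves can look "wrong": when the
bins are narrow, densities can be well above 1 even though the total area is
still 1. The data below are made up for illustration (normal draws with a
small standard deviation, an assumption and not Alex's data), and the names
y, y.h, widths and area are only placeholders.

```r
set.seed(100)
y <- rnorm(100, sd = 0.1)          # illustrative data on a small scale
y.h <- hist(y, plot = FALSE)       # compute the histogram without drawing it

widths <- diff(y.h$breaks)         # bin widths (much smaller than 1 here)
tallest <- max(y.h$density)        # height of the tallest bar
area <- sum(widths * y.h$density)  # total area under the histogram

tallest > 1                        # heights are densities, not proportions
all.equal(area, 1)                 # the area is what is standardised
```

So a y-axis running past 1 does not by itself mean the histogram is not
standardised.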

# Also, you can verify that bin width * density * n recovers the counts

> diff(x.h$breaks)*x.h$density*100
 [1]  2.999999  4.000000  9.000000 14.000000 22.000000 20.000000 13.000000
 [8]  7.000000  5.000000  2.000000  1.000000
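
If what is wanted is instead bars whose heights (rather than areas) sum to 1,
that is a relative-frequency histogram. A minimal sketch using the same
example follows; rel.freq and n are names introduced here only for
illustration.

```r
set.seed(100)
x <- rnorm(100)
x.h <- hist(x, plot = FALSE)

n <- sum(x.h$counts)                # number of observations
rel.freq <- x.h$counts / n          # proportion of observations in each bin

# density and relative frequency differ only by the bin width
same <- all.equal(rel.freq, x.h$density * diff(x.h$breaks))
total <- sum(rel.freq)              # proportions add up to 1

same
total
```

With equal bin widths the two pictures have the same shape; only the scale of
the y-axis differs.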

HTH
Marco

--- Alex Davies <alex at davz.net> wrote:

> Dear All,
> 
> I am an undergraduate using R for the first time. It seems like an excellent
> program and one that I look forward to using a lot over the next few years,
> but I have hit a very basic problem that I can't solve.
> 
> I want to produce a standardised histogram, i.e. one where the area under
> the graph is equal to 1. I look at the manual for the histogram function and
> find this:
> 
>     freq: logical; if 'TRUE', the histogram graphic is a representation
>           of frequencies, the 'counts' component of the result; if
>           'FALSE', probability densities, component 'density', are
>           plotted (so that the histogram has a total area of one).
>           Defaults to 'TRUE' _iff_ 'breaks' are equidistant (and
>           'probability' is not specified).
> 
> I therefore expect that the following command:
> 
> > h <- hist(StockReturns, freq=FALSE)
> 
> where StockReturns has the following data in it:
> 
> > sourcedata$StockReturns
>  [1] -0.006983  0.111565  0.053782  0.027966  0.068956  0.165424 -0.022133
>  [8] -0.001910  0.052174  0.072589 -0.023002  0.000521 -0.015688  0.148459
> [15]  0.054111  0.141044  0.096686 -0.012256 -0.030397  0.039365  0.021407
> [22] -0.175750  0.053901 -0.095730  0.129717  0.333333  0.061563  0.085052
> [29]  0.072295 -0.008500  0.100000  0.020000 -0.199763  0.081856  0.013636
> [36]  0.007812  0.038647 -0.026945  0.037965 -0.079889  0.056234 -0.083333
> [43] -0.012792  0.131711  0.015996  0.008149  0.104568  0.004046 -0.027750
> [50]  0.050802  0.045714  0.092327 -0.017857  0.022574  0.083333  0.051366
> [57]  0.004215  0.083228  0.046803  0.021335  0.023797  0.094891  0.036541
> [64]  0.016423 -0.126365  0.034219  0.098330  0.079292 -0.009901  0.021559
> [71] -0.039414  0.114286  0.101856 -0.010452  0.111111  0.097274  0.104843
> [78]  0.144439  0.021868  0.106667  0.081250  0.002097  0.073302  0.087889
> [85] -0.145165  0.014592  0.035000  0.131711 -0.126937  0.133989
> 
> would result in a graph that has an area equal to 1.000. However, it does
> not - it produces frequency densities, not standardized frequency densities.
> Can someone point me in the right direction here - I know I am being
> fantastically thick but can't find out how to do such a simple operation!
> 
> My complete set of commands looks like this:
> 
> > sourcedata <- read.table("c:/data.dat",header=T)
> > attach(sourcedata)
> > h <- hist(StockReturns, col='red', labels=TRUE,
> +           ylab="Frequency Density", probability=TRUE)
> 
> Where c:\data.dat is a file with the numbers above in it, one per line,
> and the first line containing the string "StockReturns".
> 
> Many thanks,
> 
> Alex Davies