[R] non-linear binning? power-law in R
Roger D. Peng
rpeng at jhsph.edu
Wed Jun 16 16:58:21 CEST 2004
When you specify a single number to the `br' argument of `hist' it is
only a suggestion. See the help page, ?hist. If you want to use the
same breakpoints everytime, you need to specify the locations explicity.
-roger
Dan Bolser wrote:
> On Wed, 16 Jun 2004, Roger D. Peng wrote:
>
>
>>You can use hist(x, br, plot = FALSE)$counts.
>>
>>-roger
>
>
> OK, I made this...
>
>
> y4 <- runif(100000)**4
> y4log <- -log(y4)
> y4bin20 <- hist(y4log,20,plot=FALSE)$counts
> y4bin20log <- log(y4bin20)
>
> plot(y4bin20log)
>
> for(i in 2:20){
> y4 <- runif(100000)**4
> y4log <- -log(y4)
> y4bin20 <- hist(y4log,20,plot=FALSE)$counts
> y4bin20log <- log(y4bin20)
> points(y4bin20log,col=i)
> }
>
>
> I am seeing something very strange about 1 loop in every five or ten of
> the above...
>
> Can someone try this and let me know if every fifth run or so they see a
> totally different trend? Basically looks like only 10 bins are being used
> at random during the run!?
>
> Hopefully you will also see what I am trying to do, which is to highlight
> the increased variance in the tail...
>
> How would I estimate the slope of y4bin20log? (sorry for the names).
>
> Thanks for the help,
> Dan.
>
>
>>Dan Bolser wrote:
>>
>>>First, thanks to everyone who helped me get to grips with R in (x)emacs
>>>(I get confused easily). Special thanks to Stephen Eglen for continued
>>>support.
>>>
>>>My question is about non-linear binning, or density functions over
>>>distributions governed by a power law ...
>>>
>>>y ~ mu*x**lambda # In one of its forms
>>> # (can't find Pareto in the online help)
>>>
>>>Looking at the following should show my problem....
>>>
>>>x3 <- runif(10000)**3 # Probably a better (correct) way to do this
>>>
>>>plot( density(x3,cut=0,bw=0.1))
>>>plot( density(x3,cut=0,bw=0.01))
>>>plot( density(x3,cut=0,bw=0.001))
>>>
>>>plot(density(x3,cut=0,bw=0.1), log='xy')
>>>plot(density(x3,cut=0,bw=0.01), log='xy')
>>>plot(density(x3,cut=0,bw=0.001),log='xy')
>>>
>>>The upper three plots show that the bw has a big effect on the appearance
>>>of the graph by rescaling based on the initial density at low values of x,
>>>which is very high.
>>>
>>>The lower plots show (I think) an error in the use of linear bins to view
>>>a non linear trend. I would expect this curve to be linear on log-log
>>>scales (from experience), and you can see the expected behavior in the
>>>tails of these plots.
>>>
>>>If you play with drawing these curves on top of each other they look OK
>>>apart from at the beginning. However, changing the band width to 0.0001 has
>>>a radical effect on these plots, and they begin to show a different trend
>>>(look like they are being governed by a different power).
>>>
>>>Hmmm....
>>>
>>>x3log <- -log(x3)
>>>
>>>plot( density(x3log,cut=0,bw=0.5), log='y',col=1)
>>>
>>>lines(density(x3log,cut=0,bw=0.2), log='y',col=2)
>>>lines(density(x3log,cut=0,bw=0.1), log='y',col=3)
>>>lines(density(x3log,cut=0,bw=0.01), log='y',col=4)
>>>
>>>Sorry...
>>>
>>>
>>>'Real' data of this form is usually discrete, with the value of 1 being
>>>the most frequent (minimum) event, and higher values occurring less
>>>frequently according to a power (power-law). This data can be easily
>>>grouped into discrete bins, and frequency plotted on log scales. The
>>>continuous data generated above requires some form of density estimation
>>>or rescaling into discreet values (make the smallest value equal to 1 and
>>>round everything else into an integer).
>>>
>>>I see the aggregate function, but which function lets me simply count the
>>>number of values in a class (integer bin)?
>>>
>>>The analysis of even the discretized data is made more accurate by the use
>>>of exponentially growing bins. This way you don't need to plot the data on
>>>log scales, and the increasing variance associated with lower probability
>>>events is handled by the increasing bin size (giving good accuracy of
>>>power fitting). How can I easily (ignorantly) implement exponentially
>>>increasing bin sizes?
>>>
>>>Thanks for any feedback,
>>>
>>>Dan.
>>>
>>>______________________________________________
>>>R-help at stat.math.ethz.ch mailing list
>>>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>>>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>>>
>>
>>
>
>
--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng
More information about the R-help
mailing list