[R] cut.POSIXt misconception/feature/bug?
Brian Diggs
diggsb at ohsu.edu
Wed Mar 10 20:47:49 CET 2010
On 3/10/2010 1:01 AM, Petr PIKAL wrote:
> Dear all
> recently I tried to split vector of dates according to some particular
> date to 2 (more) chunks, but I was not able to perform correct setting.
>
> When I want split to 3 chunks it partially works however from help page I
> supposed to get result without NA.
>
> Details:
>
> Using both ‘right = TRUE’ and ‘include.lowest = TRUE’ will
> include both ends of the range of dates.
>
> dat <- seq(c(ISOdate(2000,3,20)), by = "day", length.out = 60)
> br<-dat[c(23, 42)]
> head(cut(dat, breaks=br, right=T, include.lowest=T))
>
> [1] <NA> <NA> <NA> <NA> <NA> <NA>
> Levels: 2000-04-11 14:00:00
>
> which apparently is not output I would like to have.
The breaks argument does not work the way you think it does. To get n groups, you need n+1 breaks, and any data outside the range of your breakpoints will be set to NA. To make sure all the data are included, your breaks must include the extreme values of what you are cutting.
br <- dat[c(1,23,42,60)]
cut(dat, breaks=br, right=TRUE, include.lowest=TRUE)
# [1] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
# [4] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
# [7] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[10] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[13] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[16] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[19] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[22] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-04-11 05:00:00
#[25] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
#[28] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
#[31] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
#[34] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
#[37] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
#[40] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
#[43] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[46] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[49] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[52] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[55] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[58] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#Levels: 2000-03-20 04:00:00 2000-04-11 05:00:00 2000-04-30 05:00:00
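As a quick check of the point above (a sketch that rebuilds dat from your example; the names inner and full are only for illustration), counting the NAs shows the difference between the two sets of breaks:

```r
dat <- seq(ISOdate(2000, 3, 20), by = "day", length.out = 60)

# Breaks that stop short of the extremes: everything outside them becomes NA.
inner <- cut(dat, breaks = dat[c(23, 42)], right = TRUE, include.lowest = TRUE)
sum(is.na(inner))        # 40 (only elements 23 to 42 fall inside)

# Breaks that include the extremes: nothing is lost, and 4 breaks give 3 groups.
full <- cut(dat, breaks = dat[c(1, 23, 42, 60)], right = TRUE, include.lowest = TRUE)
sum(is.na(full))         # 0
as.vector(table(full))   # 23 19 18
```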
> When trying to split to 2 chunks there is a strange error
>
> br<-dat[42]
> cut(dat, breaks=br, right=T, include.lowest=T)
> Error in cut.default(unclass(x), unclass(breaks), labels = labels, right =
> right, : cannot allocate vector of length 955454401
To get 2 chunks, you need 3 breaks:
br <- dat[c(1,42,60)]
cut(dat, breaks=br, right=TRUE, include.lowest=TRUE)
# [1] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
# [4] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
# [7] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[10] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[13] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[16] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[19] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[22] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[25] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[28] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[31] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[34] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[37] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[40] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[43] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[46] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[49] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[52] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[55] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[58] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#Levels: 2000-03-20 04:00:00 2000-04-30 05:00:00
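If you would rather give only the date(s) to split at, one option is a small wrapper that adds the extremes for you. This is only a sketch: split_at() is a made-up helper, not something in base R, and it assumes that both x and at are POSIXct.

```r
# split_at() is a hypothetical helper, not part of base R.
split_at <- function(x, at) {
  # Pad the requested split points with the extremes of x so nothing becomes NA.
  br <- sort(unique(c(min(x), at, max(x))))
  cut(x, breaks = br, right = TRUE, include.lowest = TRUE)
}

dat <- seq(ISOdate(2000, 3, 20), by = "day", length.out = 60)
two <- split_at(dat, dat[42])
as.vector(table(two))    # 42 18
```

With right = TRUE the split date itself (element 42) falls in the first chunk.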
> I traced it back to
>
> Browse[5]> nb
> [1] 955454401
> ^^^^^^^^^^^^^^^^^^^^^^
> Browse[5]>
> debug: NULL
> Browse[5]>
> debug: breaks <- seq.int(rx[1L] - dx/1000, rx[2L] + dx/1000, length.out =
> nb)
> Browse[5]>
> Error in cut.default(unclass(x), unclass(breaks), labels = labels, right =
> right, :
> cannot allocate vector of length 955454401
>
> which is probably not correct.
If you give breaks a single number, it is interpreted as the "number giving the number of intervals which x is to be cut into." Since you need one more break than groups, a breaks vector of length 1 is not meaningful as a set of breakpoints, so it is overloaded to mean the number of groups wanted in the end. Your single date was passed along as its underlying number (seconds since 1970), so, as you saw, nb was 955454401. cut therefore assumed you wanted roughly that many evenly spaced groups, and the resulting vector was too large to allocate, which gave the error you saw.
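A small illustration of both halves of that explanation (a sketch; x and two_groups are just illustrative names): a single number given as breaks is taken as a count, and a POSIXct value is stored as a large number of seconds. For instance, dat[23] is stored as 955454400, one less than the traced nb.

```r
# A single number for breaks is read as the number of intervals.
x <- 1:10
two_groups <- cut(x, breaks = 2)
nlevels(two_groups)      # 2

# A POSIXct date-time is stored as seconds since 1970-01-01 00:00:00 UTC,
# so a single date looks like a request for about a billion intervals.
dat <- seq(ISOdate(2000, 3, 20), by = "day", length.out = 60)
as.numeric(dat[23])      # 955454400
as.numeric(dat[42])      # 957096000
```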
> Can somebody help me to the right track?
>
>
>> version
> _
> platform i386-pc-mingw32
> arch i386
> os mingw32
> system i386, mingw32
> status Under development (unstable)
> major 2
> minor 11.0
> year 2010
> month 03
> day 09
> svn rev 51229
> language R
> version.string R version 2.11.0 Under development (unstable) (2010-03-09
> r51229)
>
> Regards
> Petr
--
Brian Diggs, Ph.D.
Senior Research Associate, Department of Surgery, Oregon Health & Science University