[R] cut.POSIXt misconception/feature/bug?

Petr PIKAL petr.pikal at precheza.cz
Thu Mar 11 08:39:58 CET 2010


Thanks

You are the second person to respond. In my previous mail I suggested a
slight modification to the cut.POSIXt help page, so that those who do not
use it often can avoid this trap.

Regards
Petr


Brian Diggs <diggsb at ohsu.edu> wrote on 10.03.2010 20:47:49:

> On 3/10/2010 1:01 AM, Petr PIKAL wrote:
> > Dear all
> > recently I tried to split a vector of dates at some particular date
> > into 2 (or more) chunks, but I was not able to find the correct setting.
> > 
> > When I split into 3 chunks it partially works; however, from the help
> > page I expected to get a result without NA.
> > 
> > Details:
> > 
> >      Using both ‘right = TRUE’ and ‘include.lowest = TRUE’ will
> >      include both ends of the range of dates.
> > 
> > dat <- seq(c(ISOdate(2000,3,20)), by = "day", length.out = 60)
> > br<-dat[c(23, 42)]
> > head(cut(dat, breaks=br, right=T, include.lowest=T))
> > 
> > [1] <NA> <NA> <NA> <NA> <NA> <NA>
> > Levels: 2000-04-11 14:00:00
> > 
> > which is apparently not the output I would like to have.
> 
> The breaks argument does not work the way you think it does.  To get n
> groups, you need n+1 breaks.  That is, any data outside the range of your
> breakpoints will be set to NA.  To make sure all the data is included, your
> breaks must include the extreme values of what you are cutting.
> 
> br <- dat[c(1,23,42,60)]
> cut(dat, breaks=br, right=TRUE, include.lowest=TRUE)
> # [1] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> # [4] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> # [7] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[10] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[13] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[16] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[19] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[22] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-04-11 05:00:00
> #[25] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
> #[28] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
> #[31] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
> #[34] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
> #[37] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
> #[40] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
> #[43] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[46] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[49] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[52] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[55] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[58] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #Levels: 2000-03-20 04:00:00 2000-04-11 05:00:00 2000-04-30 05:00:00
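
The same idea can be written so that the extremes are added automatically. This is only a sketch: the name `inner` and the explicit `tz = "UTC"` are assumptions made for the illustration, not part of the original example.

```r
# Rebuild the example data from the thread: 60 consecutive days.
dat <- seq(ISOdate(2000, 3, 20, tz = "UTC"), by = "day", length.out = 60)

# Interior cut points chosen by the user (elements 23 and 42, as above).
inner <- dat[c(23, 42)]

# Add the extremes of the data so that every observation falls into
# some interval; unique() guards against an interior point that
# already is an extreme.
br <- sort(unique(c(min(dat), inner, max(dat))))

grp <- cut(dat, breaks = br, right = TRUE, include.lowest = TRUE)

# n + 1 breaks give n groups, and nothing is left as NA.
stopifnot(length(br) == 4, nlevels(grp) == 3, !anyNA(grp))
table(grp)
```

Padding with `min()` and `max()` rather than fixed indices keeps the breaks valid if the data are not sorted.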
> 
> > When trying to split into 2 chunks there is a strange error
> > 
> > br<-dat[42]
> > cut(dat, breaks=br, right=T, include.lowest=T)
> > Error in cut.default(unclass(x), unclass(breaks), labels = labels, right = right,  :
> >   cannot allocate vector of length 955454401
> 
> To get 2 chunks, you need 3 breaks
> 
> br <- dat[c(1,42,60)]
> cut(dat, breaks=br, right=TRUE, include.lowest=TRUE)
> # [1] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> # [4] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> # [7] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[10] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[13] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[16] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[19] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[22] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[25] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[28] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[31] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[34] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[37] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[40] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[43] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[46] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[49] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[52] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[55] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[58] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #Levels: 2000-03-20 04:00:00 2000-04-30 05:00:00
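
For the two-chunk case, one alternative that avoids the breaks argument entirely is a direct comparison against the cut date. The sketch below is not taken from the thread; the level labels are made up for the illustration, and `tz = "UTC"` is an assumption.

```r
# Same example data as in the thread: 60 consecutive days.
dat <- seq(ISOdate(2000, 3, 20, tz = "UTC"), by = "day", length.out = 60)

# Single date at which to split (element 42, as in the example above).
cutpoint <- dat[42]

# FALSE = up to and including the cut date, TRUE = strictly after it.
# This mirrors right = TRUE: the cut date belongs to the first chunk.
grp2 <- factor(dat > cutpoint,
               levels = c(FALSE, TRUE),
               labels = c("up to cut date", "after cut date"))

table(grp2)

# split() then returns the two chunks as a list of date vectors.
chunks <- split(dat, grp2)
lengths(chunks)
```

The group sizes agree with the cut() result above: the first 42 dates in one chunk, the remaining 18 in the other.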
> 
> > I traced it back to 
> > 
> > Browse[5]> nb
> > [1] 955454401
> > ^^^^^^^^^^^^^^^^^^^^^^
> > Browse[5]> 
> > debug: NULL
> > Browse[5]> 
> > debug: breaks <- seq.int(rx[1L] - dx/1000, rx[2L] + dx/1000, length.out = nb)
> > Browse[5]> 
> > Error in cut.default(unclass(x), unclass(breaks), labels = labels, right = right,  :
> >   cannot allocate vector of length 955454401
> > 
> > which is probably not correct.
> 
> If you give breaks a single number, it is interpreted as the "number giving
> the number of intervals which x is to be cut into."  Since you need one more
> break than groups, a breaks argument of length 1 is not meaningful, so it was
> overloaded to mean the number of groups wanted in the end.  As you saw, nb as
> an integer was 955454401, so cut.POSIXt assumed you wanted 955454401 evenly
> spaced groups, and that was too large to allocate, which gave the error you
> saw.
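
Where the large number comes from can be checked directly: a POSIXct value is stored as seconds since 1970-01-01 UTC, so once the class is removed a single date is just a very large number. The sketch below assumes `tz = "UTC"`. Element 23 is used because its value plus one equals the nb shown in the trace; the quoted code uses dat[42], so the trace may have come from a different run, which is only a guess.

```r
# Same example data as in the thread: 60 consecutive days at 12:00 UTC.
dat <- seq(ISOdate(2000, 3, 20, tz = "UTC"), by = "day", length.out = 60)

# A single date with its class removed is a count of seconds.
secs <- as.numeric(dat[23])
secs
# 955454400

# On plain numbers, a length-one 'breaks' is a count of intervals,
# which is the intended use of a single number:
grp <- cut(as.numeric(dat), breaks = 3)
nlevels(grp)
# 3
```

Passing a small integer such as `breaks = 3` is therefore fine; passing a single date is what produces the request for roughly a billion intervals.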
> 
> > Can somebody help me to the right track?
> > 
> > 
> >> version
> >                _ 
> > platform       i386-pc-mingw32 
> > arch           i386 
> > os             mingw32 
> > system         i386, mingw32 
> > status         Under development (unstable) 
> > major          2 
> > minor          11.0 
> > year           2010 
> > month          03 
> > day            09 
> > svn rev        51229 
> > language       R 
> > version.string R version 2.11.0 Under development (unstable) (2010-03-09 r51229)
> > 
> > Regards
> > Petr
> 
> 
> --
> Brian Diggs, Ph.D.
> Senior Research Associate, Department of Surgery, Oregon Health & Science University
> 
> 
> 
> 


