[R] cut.POSIXt misconception/feature/bug?
Petr PIKAL
petr.pikal at precheza.cz
Thu Mar 11 08:39:58 CET 2010
Thanks
You are the second person to respond. In my previous mail I suggested a
slight modification to the cut.POSIXt help page, so that those who do not
use it often can avoid this trap.
Regards
Petr
Brian Diggs <diggsb at ohsu.edu> wrote on 10.03.2010 20:47:49:
> On 3/10/2010 1:01 AM, Petr PIKAL wrote:
> > Dear all
> > recently I tried to split a vector of dates at some particular
> > dates into 2 (or more) chunks, but I was not able to find the correct
> > setting.
> >
> > When I want to split into 3 chunks it partially works; however, from the
> > help page I expected to get a result without NA.
> >
> > Details:
> >
> > Using both ‘right = TRUE’ and ‘include.lowest = TRUE’ will
> > include both ends of the range of dates.
> >
> > dat <- seq(c(ISOdate(2000,3,20)), by = "day", length.out = 60)
> > br<-dat[c(23, 42)]
> > head(cut(dat, breaks=br, right=T, include.lowest=T))
> >
> > [1] <NA> <NA> <NA> <NA> <NA> <NA>
> > Levels: 2000-04-11 14:00:00
> >
> > which is apparently not the output I would like to have.
>
> The breaks argument does not work the way you think it does. To get n
> groups, you need n+1 breaks. That is, any data outside the range of your
> breakpoints will be set to NA. To make sure all the data is included, your
> breaks must include the extreme values of what you are cutting.
>
> br <- dat[c(1,23,42,60)]
> cut(dat, breaks=br, right=TRUE, include.lowest=TRUE)
> # [1] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> # [4] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> # [7] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[10] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[13] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[16] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[19] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[22] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-04-11 05:00:00
> #[25] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
> #[28] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
> #[31] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
> #[34] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
> #[37] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
> #[40] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
> #[43] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[46] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[49] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[52] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[55] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[58] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #Levels: 2000-03-20 04:00:00 2000-04-11 05:00:00 2000-04-30 05:00:00
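Instead of listing the end points by hand, the breaks can also be built from the range of the data. This is only a sketch: the names inner, br and grp are made up for illustration, and it assumes the interior break dates lie inside the range of dat.

```r
# Same 60 daily date-times as in the original example.
dat <- seq(ISOdate(2000, 3, 20), by = "day", length.out = 60)

# Interior split points only (hypothetical name).
inner <- dat[c(23, 42)]

# Add the extremes so that n groups get n + 1 breaks; unique() guards
# against an interior point that coincides with an end point.
br <- sort(unique(c(min(dat), inner, max(dat))))

grp <- cut(dat, breaks = br, right = TRUE, include.lowest = TRUE)

# Counts per group; an NA column would appear here if any value fell
# outside the breaks.
table(grp, useNA = "ifany")
```

With right = TRUE and include.lowest = TRUE the first interval is closed at both ends, so the earliest date is kept rather than turned into NA.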
>
> > When trying to split to 2 chunks there is a strange error
> >
> > br<-dat[42]
> > cut(dat, breaks=br, right=T, include.lowest=T)
> > Error in cut.default(unclass(x), unclass(breaks), labels = labels, right =
> > right, : cannot allocate vector of length 955454401
>
> To get 2 chunks, you need 3 breaks:
>
> br <- dat[c(1,42,60)]
> cut(dat, breaks=br, right=TRUE, include.lowest=TRUE)
> # [1] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> # [4] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> # [7] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[10] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[13] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[16] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[19] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[22] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[25] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[28] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[31] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[34] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[37] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[40] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
> #[43] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[46] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[49] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[52] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[55] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #[58] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
> #Levels: 2000-03-20 04:00:00 2000-04-30 05:00:00
>
> > I traced it back to
> >
> > Browse[5]> nb
> > [1] 955454401
> > ^^^^^^^^^^^^^^^^^^^^^^
> > Browse[5]>
> > debug: NULL
> > Browse[5]>
> > debug: breaks <- seq.int(rx[1L] - dx/1000, rx[2L] + dx/1000, length.out =
> > nb)
> > Browse[5]>
> > Error in cut.default(unclass(x), unclass(breaks), labels = labels, right =
> > right, :
> > cannot allocate vector of length 955454401
> >
> > which is probably not correct.
>
> If you give breaks a single number, it is interpreted as the "number giving
> the number of intervals which x is to be cut into." Since you need one more
> break than groups, a break of length 1 is not meaningful, so it was
> overloaded to mean the number of groups wanted in the end. Your single date
> was unclassed to a number of seconds, and as you saw, nb as an integer was
> 955454401, so cut.default tried to build that many evenly spaced break
> points, which was too large to allocate and gave the error you saw.
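To illustrate why a single date behaves this way, the sketch below shows that a length-one POSIXct value is just a large number of seconds once converted, and gives two possible alternatives: a plain comparison for a two-way split, and a small integer when equal-width groups are really wanted. The names split_at, n_secs, two and three are hypothetical.

```r
dat <- seq(ISOdate(2000, 3, 20), by = "day", length.out = 60)
split_at <- dat[42]

# A POSIXct value is stored as seconds since 1970-01-01 UTC, so a single
# date passed as 'breaks' looks like a request for a huge number of
# intervals.
n_secs <- as.numeric(split_at)
print(n_secs)

# Two-way split without cut(): compare against the split date directly.
two <- factor(dat <= split_at,
              levels = c(TRUE, FALSE),
              labels = c("up to split", "after split"))
print(table(two))

# A small number as 'breaks' asks for that many equal-width groups.
three <- cut(dat, breaks = 3)
print(table(three))
```

The comparison sidesteps the n + 1 breaks rule entirely, at the cost of handling only one split point.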
>
> > Can somebody help me to the right track?
> >
> >
> >> version
> > _
> > platform i386-pc-mingw32
> > arch i386
> > os mingw32
> > system i386, mingw32
> > status Under development (unstable)
> > major 2
> > minor 11.0
> > year 2010
> > month 03
> > day 09
> > svn rev 51229
> > language R
> > version.string R version 2.11.0 Under development (unstable) (2010-03-09
> > r51229)
> >
> > Regards
> > Petr
>
>
> --
> Brian Diggs, Ph.D.
> Senior Research Associate, Department of Surgery, Oregon Health & Science
> University
>
>
>
>