[R] Data frame reordering to time series

Gabor Grothendieck ggrothendieck at gmail.com
Sun Aug 8 05:45:07 CEST 2010


On Sat, Aug 7, 2010 at 9:18 PM, steven mosher <moshersteven at gmail.com> wrote:
> Very Slick.
> Gabor this is a Huge speed up for me. Thanks. ha, Now I want to rewrite a
> bunch of working code.
>
>
>
> Id<-c(rep(67543,4),rep(12345,3),rep(89765,5))
>  Years<-c(seq(1989,1992,by =1),1991,1993,1994,seq(1991,1995,by=1))
> Values2<-c(12,NA,34,21,NA,65,23,NA,13,NA,13,14)
>  Values<-c(12,14,34,21,54,65,23,12,13,13,13,14)
>  Data<-data.frame(Index=Id,Year=Years,Jan=Values,Feb=Values/2,Mar=Values2,Apr=Values2,Jun=Values)
>  Data
>    Index Year Jan  Feb Mar Apr Jun
> 1  67543 1989  12  6.0  12  12  12
> 2  67543 1990  14  7.0  NA  NA  14
> 3  67543 1991  34 17.0  34  34  34
> 4  67543 1992  21 10.5  21  21  21
> 5  12345 1991  54 27.0  NA  NA  54
> 6  12345 1993  65 32.5  65  65  65
> 7  12345 1994  23 11.5  23  23  23
> 8  89765 1991  12  6.0  NA  NA  12
> 9  89765 1992  13  6.5  13  13  13
> 10 89765 1993  13  6.5  NA  NA  13
> 11 89765 1994  13  6.5  13  13  13
> 12 89765 1995  14  7.0  14  14  14
> #  Gabor's solution
>  f <- function(x) ts(c(t(x[-(1:2)])), freq = 12, start = x$Year[1])
>  do.call(cbind, by(Data, Data$Index, f))
>              12345 67543 89765


The original data had consecutive months in each series (actually
one series was missing 1992, but I assumed that was an inadvertent
omission and that the actual data was complete).  Here, however, each
year has only 5 monthly columns, so 7 months per year are missing as
well.  That makes the series non-consecutive.  To solve that we could
either pad every row out to 12 months (after putting the missing 1992
year back in):

Data <- cbind(Data, NA, NA, NA, NA, NA, NA, NA)

or we could use a time series class that can handle irregularly spaced data:

library(zoo)
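f <- function(x) {
	# drop the Index and Year columns, leaving one column per month
	dat <- x[-(1:2)]
	# years x months matrix of times: year + (0, 1, 2, ...)/12
	tim <- as.yearmon(outer(x$Year, seq(0, length = ncol(dat))/12, "+"))
	# both are stored column by column, so value i pairs with time i
	zoo(c(as.matrix(dat)), tim)
}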
do.call(cbind, by(Data, Data$Index, f))

The last line is unchanged from before.  This code also handles the
original situation correctly even if 1992 is truly missing from the
data.
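
To see why the values and times in f pair up, here is a small base R
sketch of the index construction only (no zoo needed).  The two-year,
three-month slice and the names years, dat and aligned are made up for
illustration; they are not from the original data frame:

```r
# Hypothetical slice: two non-adjacent years, three monthly columns
years <- c(1991, 1993)
dat <- data.frame(Jan = c(54, 65), Feb = c(27, 32.5), Mar = c(NA, 65))

# rows = years, columns = month offsets 0/12, 1/12, 2/12
tim <- outer(years, seq(0, length = ncol(dat))/12, "+")

# c() flattens both matrices column by column, so element i of the
# values is paired with element i of the times
aligned <- data.frame(time = c(tim), value = c(as.matrix(dat)))

# sorting by time puts the pairs into chronological order
aligned <- aligned[order(aligned$time), ]
print(aligned)
```

Because the pairing is by position, the values do not need to be
transposed into row order first, as they were for ts().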


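For the first option (padding so that ts() sees 12 values per year),
the following is a rough sketch rather than a tested recipe.  pad_series
is a hypothetical helper that is not part of the suggestion above; it
assumes, as f does, that the month columns are consecutive starting in
January, and it also inserts an all-NA row for any year that is absent:

```r
# Sample data from the quoted message
Id <- c(rep(67543, 4), rep(12345, 3), rep(89765, 5))
Years <- c(1989:1992, 1991, 1993, 1994, 1991:1995)
Values2 <- c(12, NA, 34, 21, NA, 65, 23, NA, 13, NA, 13, 14)
Values <- c(12, 14, 34, 21, 54, 65, 23, 12, 13, 13, 13, 14)
Data <- data.frame(Index = Id, Year = Years, Jan = Values, Feb = Values/2,
                   Mar = Values2, Apr = Values2, Jun = Values)

# pad_series: hypothetical helper, assumes columns 3.. are consecutive
# months starting in January
pad_series <- function(x, nmonths = 12) {
  yrs <- seq(min(x$Year), max(x$Year))      # every year in the span
  vals <- as.matrix(x[-(1:2)])              # drop Index and Year
  out <- matrix(NA_real_, nrow = length(yrs), ncol = nmonths)
  # copy the observed values in; missing years and months stay NA
  out[match(x$Year, yrs), seq_len(ncol(vals))] <- vals
  # t() puts the values in chronological (row by row) order
  ts(c(t(out)), frequency = 12, start = min(yrs))
}

res <- do.call(cbind, by(Data, Data$Index, pad_series))
```

The result is a monthly multivariate ts with one column per Index,
with NA wherever a month or a whole year was not supplied.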
