[Fwd: Re: [R] vectorization of a data-aggregation loop]
Christoph Lehmann
christoph.lehmann at gmx.ch
Wed Feb 2 13:49:25 CET 2005
great! many thanks, Phil
Cheers
christoph
Phil Spector wrote:
> Christoph -
> I think reshape is the function you're looking for:
>
>> tt <- data.frame(cbind(c(1,1,1,1,1,2,2,2,3,3,3,3),
> + c(10,12,8,33,34,3,27,77,34,45,4,39), c('a', 'b', 'b', 'a', 'c', 'c', 'c',
> + 'a', 'b', 'a', 'b', 'c')))
>> reshape(aggregate(as.numeric(tt$iwv),list(id=tt$id,type=tt$type),sum),idvar="id",timevar="type",direction="wide")
>>
> id x.a x.b x.c
> 1 1 6 13 6
> 2 2 10 NA 7
> 3 3 9 14 7
>
> - Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spector at stat.berkeley.edu
>
>
> On Tue, 1 Feb 2005, Christoph Lehmann wrote:
>
>> Hi
>> I have a simple question:
>>
>> the following data.frame
>>
>> id iwv type
>> 1 1 1 a
>> 2 1 2 b
>> 3 1 11 b
>> 4 1 5 a
>> 5 1 6 c
>> 6 2 4 c
>> 7 2 3 c
>> 8 2 10 a
>> 9 3 6 b
>> 10 3 9 a
>> 11 3 8 b
>> 12 3 7 c
>>
>> shall be aggregated into the form:
>>
>> id t.a t.b t.c
>> 1 1 6 13 6
>> 6 2 10 0 7
>> 9 3 9 14 7
>>
>> means for each 'type' (a, b, c) a new column is introduced which
>> gets the sum of iwv for the respective observations 'id'
>>
>> of course I can do this transformation/aggregation in a loop (see
>> below), but is there a way to do this more efficiently, eg. in using
>> tapply (or something similar)- since I have lot many rows?
>>
>> thanks for a hint
>>
>> christoph
>>
>> #------------------------------------------------------------------------------
>>
>> # the loop-way
>> t <- data.frame(cbind(c(1,1,1,1,1,2,2,2,3,3,3,3),
>> c(10,12,8,33,34,3,27,77,34,45,4,39), c('a', 'b', 'b', 'a', 'c', 'c',
>> 'c', 'a', 'b', 'a', 'b', 'c')))
>> names(t) <- c("id", "iwv", "type")
>> t$iwv <- as.numeric(t$iwv)
>> t
>>
>> # define the additional columns (type.a, type.b, type.c)
>> tt <- rep(0, nrow(t) * length(levels(t$type)))
>> dim(tt) <- c(nrow(t), length(levels(t$type)))
>> tt <- data.frame(tt)
>> dimnames(tt)[[2]] <- paste("t.", levels(t$type), sep = "")
>> t <- cbind(t, tt)
>> t
>>
>> obs <- 0
>> obs.previous <- 0
>> row.elim <- rep(FALSE, nrow(t))
>> ta <- which((names(t) == "t.a")) #number of column which codes the
>> first type
>> r.ctr <- 0
>> for (i in 1:nrow(t)){
>> obs <- t[i,]$id
>> if (obs == obs.previous) {
>> row.elim[i] <- TRUE
>> r.ctr <- r.ctr + 1 #increment
>> type.col <- as.numeric(t[i,]$type)
>> t[i - r.ctr, ta - 1 + type.col] <- t[i - r.ctr, ta - 1 +
>> type.col] + t[i,]$iwv
>> }
>> else {
>> r.ctr <- 0 #record counter
>> type.col <- as.numeric(t[i,]$type)
>> t[i, ta - 1 + type.col] <- t[i,]$iwv
>> }
>> obs.previous <- obs
>> }
>>
>> t <- t[!row.elim,]
>> t <- subset(t, select = -c(iwv, type))
>> t
>>
>> #------------------------------------------------------------------------------
>>
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
>>
>
>
More information about the R-help
mailing list