[R] Best way to compute the difference between two levels of a factor ?

wphantomfr wphantomfr at gmail.com
Wed Mar 21 12:19:13 CET 2012


> Okay, try this:
>
>  result <- with(data,
>                 aggregate(data[,-(1:2)], by=list(ID), FUN=diff)) 

That's it! I didn't know about the "diff" function. Your solution works
perfectly.
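
For anyone finding this thread later, here is a small self-contained sketch of the approach as I understand it. The seed and the simulated values are only assumptions for illustration, and the cross-check at the end is my own addition, not part of Peter's answer.

```r
# Illustrative data with the same layout as in the thread; values are simulated.
set.seed(1)  # assumption: arbitrary seed, only for reproducibility
data <- data.frame(
  ID   = rep(c("AA", "BB", "CC"), each = 2),
  TIME = rep(c("T1", "T2"), times = 3),
  X    = rnorm(6, 10, 2.3),
  Y    = rnorm(6, 12, 1.9)
)

# Sort so that T1 precedes T2 within each ID; diff() then returns T2 - T1.
data <- data[order(data$ID, data$TIME), ]

# Apply diff() to every column except ID and TIME, one group per ID.
result <- aggregate(data[, -(1:2)], by = list(ID = data$ID), FUN = diff)

# Cross-check one variable against the explicit subtraction.
t1 <- data[data$TIME == "T1", ]
t2 <- data[data$TIME == "T2", ]
stopifnot(isTRUE(all.equal(result$X, t2$X - t1$X)))

print(result)
```

The sort step matters: diff() subtracts in row order, so with T2 listed before T1 the sign of every difference would flip.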


Thanks for this, Peter!

Sylvain


On 21/03/12 12:01, Peter Ehlers wrote:
> On 2012-03-21 03:37, wphantomfr wrote:
>> Thanks, Peter, for your fast answer.
>>
>>
>> Your solution is really nice, but if I have, say, 20 variables, I have to
>> write 20 statements like "DIF.X = X[TIME=="T2"] - X[TIME=="T1"]".
>>
>> Does someone have a trick to avoid this? It may not be easily possible.
>
> Okay, try this:
>
>  result <- with(data,
>                 aggregate(data[,-(1:2)], by=list(ID), FUN=diff))
>
> This assumes that the dataframe is sorted as in your example. If
> that's not the case, then use order to arrange it first:
>
>  data <- with(data, data[order(ID, TIME), ])
>
>
> Peter Ehlers
>
>>
>> On 21/03/12 11:03, Peter Ehlers wrote:
>>> On 2012-03-21 01:48, wphantomfr wrote:
>>>> Dear R-help Members,
>>>>
>>>>
>>>> I am wondering if anyone can think of the optimal way of computing, for
>>>> several numeric variables, the difference between 2 levels of a factor.
>>>>
>>>>
>>>> To be clear, let's generate a simple data frame with 2 numeric variables
>>>> collected for different subjects (ID) and 2 levels of a TIME factor
>>>> (time of evaluation):
>>>>
>>>> data <- data.frame(ID = c("AA","AA","BB","BB","CC","CC"),
>>>>                    TIME = c("T1","T2","T1","T2","T1","T2"),
>>>>                    X = rnorm(6, 10, 2.3),
>>>>                    Y = rnorm(6, 12, 1.9))
>>>>
>>>>
>>>>
>>>>      ID TIME         X         Y
>>>> 1 AA   T1  9.959540 11.140529
>>>> 2 AA   T2 12.949522  9.896559
>>>> 3 BB   T1  9.039486 13.469104
>>>> 4 BB   T2 10.056392 14.632169
>>>> 5 CC   T1  8.706590 14.939197
>>>> 6 CC   T2 10.799296 10.747609
>>>>
>>>> I want to compute for each subject and each variable (X, Y, ...) the
>>>> difference between T2 and T1.
>>>>
>>>> Until now I have done it by reshaping my data frame to the wide format
>>>> (the columns are then ID, X.T1, X.T2, Y.T1, Y.T2) and then computing the
>>>> difference between successive columns one by one:
>>>> data$Xdiff=data$X.T2-data$X.T1
>>>> data$Ydiff=data$Y.T2-data$Y.T1
>>>> ...
>>>>
>>>> but this way is probably not optimal if the difference has to be
>>>> computed for a large number of variables.
>>>>
>>>> How would you handle it?
>>>
>>> One way is to use the plyr package:
>>>
>>>   library(plyr)
>>>   result<- ddply(data, "ID", summarize,
>>>               DIF.X = X[TIME=="T2"] - X[TIME=="T1"],
>>>               DIF.Y = Y[TIME=="T2"] - Y[TIME=="T1"])
>>>
>>> Peter Ehlers
>>>
>
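
The wide-format route from the original question can also be written without one statement per variable. This is only a rough sketch under assumptions: the data are simulated, the helper names (vars, wide, diffs, wide_diff) are made up for illustration, and it relies on base R's reshape() naming the wide columns as variable.level (X.T1, X.T2, ...).

```r
# Illustrative data with the same layout as in the thread; values are simulated.
set.seed(2)  # assumption: arbitrary seed, only for reproducibility
data <- data.frame(
  ID   = rep(c("AA", "BB", "CC"), each = 2),
  TIME = rep(c("T1", "T2"), times = 3),
  X    = rnorm(6, 10, 2.3),
  Y    = rnorm(6, 12, 1.9)
)

# All measured variables, i.e. everything except the two key columns.
vars <- setdiff(names(data), c("ID", "TIME"))

# One row per ID, with columns X.T1, Y.T1, X.T2, Y.T2.
wide <- reshape(data, idvar = "ID", timevar = "TIME", direction = "wide")

# Subtract the T1 block from the T2 block for all variables at once.
diffs <- wide[paste0(vars, ".T2")] - wide[paste0(vars, ".T1")]
names(diffs) <- paste0(vars, "diff")

wide_diff <- cbind(wide["ID"], diffs)
print(wide_diff)
```

Unlike the diff() approach, this selects columns by the T1/T2 label, so it does not depend on the row order of the long data frame.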


