[R] Normalizing grouped data in a data frame
Duncan Murdoch
murdoch at stats.uwo.ca
Fri Nov 9 12:35:26 CET 2007
Sandy Small wrote:
> Hi
> I am a newbie to R but have tried a number of ways in R to do this and
> can't find a good solution. (I could do it out of R in perl or awk but
> would like to know how to do this in R).
>
> I have a large data frame (49 variables and 7000 observations); however,
> for simplicity I can express it in the following data frame
>
> Base, Image, LVEF, ES_Time
> A, 1, 4.32, 0.89
> A, 2, 4.98, 0.67
> A, 3, 3.7, 0.5
> A, 3, 4.1, 0.8
> B, 1, 7.4, 0.7
> B, 3, 7.2, 0.8
> B, 4, 7.8, 0.6
> C, 1, 5.6, 1.1
> C, 4, 5.2, 1.3
> C, 5, 5.9, 1.2
> C, 6, 6.1, 1.2
> C, 7, 3.2, 1.1
>
> For each value of LVEF and ES_Time I would like to normalise the value
> to the maximum for that factor grouped by Base or Image number, adding
> an extra column to the data frame with the normalised value in it.
>
> So for the Base = B group in the data frame (the data frame should have
> the same length I'm just showing the B part) I would get a modified data
> frame as follows.
>
> Base, Image, LVEF, ES_Time, Norm_LVEF, Norm_ES_Time
> ...
> B, 1, 7.4, 0.7, 7.4/7.8, 0.7/0.8
> B, 3, 7.2, 0.8, 7.2/7.8, 0.8/0.8
> B, 4, 7.8, 0.6, 7.8/7.8, 0.6/0.8
> ...
>
> Where the results of the division would replace the division shown here.
> I hope this makes sense.
> If anyone can help I would be very grateful.
>
You want to look at the by(), tapply() or sparseby() functions (the
latter in the reshape package, the others are in base R).
For example, I think this untested code does what you want:
newdf <- sparseby(olddf, c("Base", "Image"),
                  function(subset)
                      within(subset, {
                          Norm_LVEF <- LVEF/max(LVEF)
                          Norm_ES_Time <- ES_Time/max(ES_Time)
                      }))
where olddf is the existing data frame and newdf is the newly created one.
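
If the reshape package is not at hand, a base R alternative is ave(), which computes a per-group summary and returns it aligned with the original rows. The following is only a minimal sketch, assuming the grouping is by Base alone (as in the B example in the question); the helper name norm_by_max is made up for illustration and is not part of any package.

```r
# Example data as given in the question.
olddf <- data.frame(
  Base    = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"),
  Image   = c(1, 2, 3, 3, 1, 3, 4, 1, 4, 5, 6, 7),
  LVEF    = c(4.32, 4.98, 3.7, 4.1, 7.4, 7.2, 7.8, 5.6, 5.2, 5.9, 6.1, 3.2),
  ES_Time = c(0.89, 0.67, 0.5, 0.8, 0.7, 0.8, 0.6, 1.1, 1.3, 1.2, 1.2, 1.1)
)

# Hypothetical helper: divide each value by the maximum of its group.
# ave() applies FUN within each level of `group` and returns a vector
# of the same length as x, so the row order of the data frame is kept.
norm_by_max <- function(x, group) x / ave(x, group, FUN = max)

newdf <- olddf
newdf$Norm_LVEF    <- norm_by_max(newdf$LVEF, newdf$Base)
newdf$Norm_ES_Time <- norm_by_max(newdf$ES_Time, newdf$Base)

# Inspect the Base == "B" rows shown in the question.
newdf[newdf$Base == "B", ]
```

To normalise within Image instead, pass newdf$Image as the group; to group by both, pass both columns to ave() inside the helper.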
Duncan Murdoch