[R] Normalizing grouped data in a data frame
Duncan Murdoch
murdoch at stats.uwo.ca
Fri Nov 9 16:41:09 CET 2007
On 11/9/2007 10:22 AM, Sandy Small wrote:
> Thank you very much.
> That works nicely.
> The trick I particularly needed was "within", which I didn't know about.
within() is new in R 2.6.0; it's a nice addition. There's also
transform(), which could be used in this situation, replacing

    within(subset, { Norm_LVEF <- LVEF/max(LVEF)
                     Norm_ES_Time <- ES_Time/max(ES_Time) })

by (what I think is) the equivalent

    transform(subset, Norm_LVEF = LVEF/max(LVEF),
              Norm_ES_Time = ES_Time/max(ES_Time))
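
A quick way to check that equivalence is to run both calls on one group and compare the results. This is only a sketch: grp is a made-up stand-in for the subset that the grouping function would receive (its values are the Base = B rows from the original question), and w and tr are names chosen here for illustration.

```r
# Hypothetical single group, using the Base = B rows from the question
grp <- data.frame(Base = "B", Image = c(1, 3, 4),
                  LVEF = c(7.4, 7.2, 7.8), ES_Time = c(0.7, 0.8, 0.6))

w <- within(grp, { Norm_LVEF <- LVEF/max(LVEF)
                   Norm_ES_Time <- ES_Time/max(ES_Time) })

tr <- transform(grp, Norm_LVEF = LVEF/max(LVEF),
                Norm_ES_Time = ES_Time/max(ES_Time))

# The two results need not list the new columns in the same order,
# so line the columns up by name before comparing.
same <- isTRUE(all.equal(w[names(tr)], tr))
print(same)
```

Comparing by column name rather than by position avoids a false mismatch if the two functions append the new columns in a different order.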
The within() function is somewhat more flexible, because you can execute
any block of code at all, rather than just simple assignments to columns.
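
As a rough illustration of that extra flexibility, here is a sketch in which the block uses a temporary variable and a column that depends on another column created in the same block. The data frame grp and the names peak and Low are invented for this example. transform() evaluates its arguments against the original columns, so it could not refer to Norm_LVEF within the same call.

```r
# Hypothetical group; values taken from the Base = B rows of the question
grp <- data.frame(LVEF = c(7.4, 7.2, 7.8), ES_Time = c(0.7, 0.8, 0.6))

res <- within(grp, {
  peak <- max(LVEF)            # temporary helper (illustrative name)
  Norm_LVEF <- LVEF / peak
  Low <- Norm_LVEF < 0.95      # uses a column created earlier in this block
  rm(peak)                     # remove the helper so it is not kept as a column
})

print(res)
```

Anything still defined when the block finishes is written back to the data frame, which is why the helper is removed with rm() at the end.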
If anyone with more experience with these functions (e.g. the author)
notices any errors above, please correct me!
Duncan Murdoch
> Also nice to get a data frame out with "sparseby" instead of just a
> multi-array with "by".
> Sandy
>
> Duncan Murdoch wrote:
>> Sandy Small wrote:
>>> Hi
>>> I am a newbie to R but have tried a number of ways in R to do this
>>> and can't find a good solution. (I could do it out of R in perl or
>>> awk but would like to know how to do this in R).
>>>
>>> I have a large data frame (49 variables and 7000 observations); however,
>>> for simplicity I can express it in the following data frame
>>>
>>> Base, Image, LVEF, ES_Time
>>> A, 1, 4.32, 0.89
>>> A, 2, 4.98, 0.67
>>> A, 3, 3.7, 0.5
>>> A, 3, 4.1, 0.8
>>> B, 1, 7.4, 0.7
>>> B, 3, 7.2, 0.8
>>> B, 4, 7.8, 0.6
>>> C, 1, 5.6, 1.1
>>> C, 4, 5.2, 1.3
>>> C, 5, 5.9, 1.2
>>> C, 6, 6.1, 1.2
>>> C, 7, 3.2, 1.1
>>>
>>> For each value of LVEF and ES_Time I would like to normalise the
>>> value to the maximum for that factor grouped by Base or Image number,
>>> adding an extra column to the data frame with the normalised value in
>>> it.
>>>
>>> So for the Base = B group in the data frame (the data frame should
>>> have the same length; I'm just showing the B part) I would get a
>>> modified data frame as follows.
>>>
>>> Base, Image, LVEF, ES_Time, Norm_LVEF, Norm_ES_Time
>>> ...
>>> B, 1, 7.4, 0.7, 7.4/7.8, 0.7/0.8
>>> B, 3, 7.2, 0.8, 7.2/7.8, 0.8/0.8
>>> B, 4, 7.8, 0.6, 7.8/7.8, 0.6/0.8
>>> ...
>>>
>>> Where the results of the division would replace the division shown here.
>>> I hope this makes sense.
>>> If anyone can help I would be very grateful.
>>>
>> You want to look at the by(), tapply() or sparseby() functions (the
>> last is in the reshape package; the others are in base R).
>>
>> For example, I think this untested code does what you want:
>>
>> newdf <- sparseby(olddf, c("Base", "Image"),
>>                   function(subset)
>>                     within(subset,
>>                            { Norm_LVEF <- LVEF/max(LVEF)
>>                              Norm_ES_Time <- ES_Time/max(ES_Time)
>>                            }))
>>
>> where olddf is the old data frame, and newdf is the newly created one.
>>
>> Duncan Murdoch
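
For readers without the reshape package, the same per-group idea can be sketched in base R with split(), lapply() and rbind(). This is an illustration, not the method used in the thread. It makes two assumptions: the example data are retyped from the question with the "3." and "7." entries read as "3," and "7,", and the grouping is by Base alone, because that matches the Base = B output shown in the question (the sparseby() call above groups by both Base and Image).

```r
# Example data retyped from the question (assumption: "3." and "7." meant "3," and "7,")
olddf <- data.frame(
  Base    = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"),
  Image   = c(1, 2, 3, 3, 1, 3, 4, 1, 4, 5, 6, 7),
  LVEF    = c(4.32, 4.98, 3.7, 4.1, 7.4, 7.2, 7.8, 5.6, 5.2, 5.9, 6.1, 3.2),
  ES_Time = c(0.89, 0.67, 0.5, 0.8, 0.7, 0.8, 0.6, 1.1, 1.3, 1.2, 1.2, 1.1)
)

# Split into one data frame per Base, normalise each to its own maximum,
# then stack the pieces back together.
pieces <- lapply(split(olddf, olddf$Base), function(subset)
  transform(subset, Norm_LVEF = LVEF/max(LVEF),
            Norm_ES_Time = ES_Time/max(ES_Time)))
newdf <- do.call(rbind, pieces)
rownames(newdf) <- NULL   # drop the "A.1", "B.5", ... row names added by rbind

print(newdf[newdf$Base == "B", ])
```

The result has the same number of rows as olddf, with the two normalised columns added at the end.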