[R] Function on columns of a dataframe
David Winsemius
dwinsemius at comcast.net
Fri Jul 9 16:47:35 CEST 2010
On Jul 9, 2010, at 10:26 AM, Eik Vettorazzi wrote:
> you are right. But maybe "aggregate" is close to the desired result?
>
> aggregate(bla, list(bla$cat), max)
Right. I couldn't get it to work until I removed the first two columns:
aggregate(bla[,-(1:2)], list(bla$cat), max)
Then I got pretty much the same dataframe as I would have with:
as.data.frame(lapply( bla[, -(1:2)], function(x) tapply(x, bla$cat, max) ))
            v1        v2        v3        v4
cat1 0.4634519 0.5274645 0.6051479 0.7586322
cat2 0.4062700 0.4282639 0.4443707 0.8419526
cat3 0.4816403 0.4996033 0.3538144 0.9456385
cat4 0.6354560 0.3558259 0.3646292 0.1907295
cat5 0.6663811 0.2154201 0.5059900 0.7573575
cat6 0.5260832 0.3934063 0.3545962 0.6412563
Except that the aggregate version returns it with a "Group.1" column of
"cat"s, while the other version returns it with the "cat" names in the
rownames. A matter of taste?
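
For anyone who wants to try both forms side by side, here is a minimal
sketch rather than a definitive recipe. It assumes "cat" is a factor
(data.frame() converted strings to factors by default before R 4.0.0) and
uses an arbitrary seed, so the numbers will not match the output above. The
names factor_max, agg, tap and tap2 are made up for illustration. It shows
one plausible reason the call on the full data frame fails (max() is not
defined for factors), and that naming the list element passed to
aggregate() replaces the "Group.1" label:

```r
# Rebuild a data frame shaped like the one in the thread. The seed is an
# arbitrary choice, so the values differ from those quoted in this message.
set.seed(42)
bla <- data.frame(
  x   = 1:12,
  cat = factor(c("cat1", "cat5", "cat2", "cat2", "cat1", "cat5",
                 "cat3", "cat4", "cat5", "cat2", "cat3", "cat6")),
  v1  = rnorm(12, 0.5, 0.1),
  v2  = rnorm(12, 0.3, 0.2),
  v3  = rnorm(12, 0.4, 0.1),
  v4  = rnorm(12, 0.6, 0.3)
)

# max() raises an error for unordered factors; capture it instead of stopping.
factor_max <- tryCatch(max(bla$cat), error = function(e) e)

# Naming the list element labels the grouping column "cat", not "Group.1".
agg <- aggregate(bla[, -(1:2)], list(cat = bla$cat), max)

# The lapply/tapply version keeps the categories in the rownames.
tap <- as.data.frame(lapply(bla[, -(1:2)], function(x) tapply(x, bla$cat, max)))

# Copy the rownames into a "cat" column so the two results line up.
tap2 <- cbind(cat = rownames(tap), tap)
rownames(tap2) <- NULL

print(agg)
print(tap2)
```

Which shape is preferable probably depends on the next step: a real "cat"
column is easier to merge on, while rownames keep the result purely numeric.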
--
David.
>
> Am 09.07.2010 16:01, schrieb David Winsemius:
>>
>> On Jul 9, 2010, at 9:46 AM, Eik Vettorazzi wrote:
>>
>>> Hi Nils,
>>> have a look at
>>> ?tapply
>>> hth.
>>
>> Perhaps this will be part way there (I couldn't really figure out the
>> desired structure of the final object):
>>> lapply( bla[, -(1:2)], function(x) tapply(x, bla$cat, max) )
>> $v1
>> cat1 cat2 cat3 cat4 cat5 cat6
>> 0.4634519 0.4062700 0.4816403 0.6354560 0.6663811 0.5260832
>>
>> $v2
>> cat1 cat2 cat3 cat4 cat5 cat6
>> 0.5274645 0.4282639 0.4996033 0.3558259 0.2154201 0.3934063
>>
>> $v3
>> cat1 cat2 cat3 cat4 cat5 cat6
>> 0.6051479 0.4443707 0.3538144 0.3646292 0.5059900 0.3545962
>>
>> $v4
>> cat1 cat2 cat3 cat4 cat5 cat6
>> 0.7586322 0.8419526 0.9456385 0.1907295 0.7573575 0.6412563
>>
>>
>>>
>>> Am 09.07.2010 15:37, schrieb LogLord:
>>>> Hi,
>>>>
>>>> I would like to assign the largest value of a column to a specific
>>>> category and repeat this for each column (v1 - v4).
>>>>
>>>>
>>>>> x=c(1:12)
>>>>> cat=c("cat1","cat5","cat2","cat2","cat1","cat5","cat3","cat4","cat5","cat2","cat3","cat6")
>>>>>
>>>>> v1=rnorm(12,0.5,0.1)
>>>>> v2=rnorm(12,0.3,0.2)
>>>>> v3=rnorm(12,0.4,0.1)
>>>>> v4=rnorm(12,0.6,0.3)
>>>>> bla=data.frame(x,cat,v1,v2,v3,v4)
>>>>> bla
>>>>>
>>>>     x  cat        v1         v2        v3         v4
>>>> 1   1 cat1 0.4013144 0.54839317 0.3946393  0.8679266
>>>> 2   2 cat5 0.4595873 0.45788906 0.4030078  0.5919596
>>>> 3   3 cat2 0.4542865 0.21516928 0.2777649  0.6112099
>>>> 4   4 cat2 0.4787950 0.06252512 0.5095611  0.6450795
>>>> 5   5 cat1 0.4910746 0.56591049 0.5151813  0.8465181
>>>> 6   6 cat5 0.4194397 0.16592579 0.4361643  0.6415192
>>>> 7   7 cat3 0.6148564 0.32240342 0.2690108  0.7114133
>>>> 8   8 cat4 0.6174652 0.28076152 0.4577064 -0.2567284
>>>> 9   9 cat5 0.4775395 0.28611768 0.4660210  0.4634120
>>>> 10 10 cat2 0.4802962 0.03715569 0.4506361  1.0063235
>>>> 11 11 cat3 0.6495094 0.33303172 0.3352933  1.4390324
>>>> 12 12 cat6 0.4891481 0.45355589 0.3880739  0.7831656
>>>>
>>>>>
>>>> I can assign this by the sqldf() command for each column but I would
>>>> like to automate this as I have many columns.
>>>>
>>>>
>>>>> select=sqldf("select cat, max(v1) FROM bla GROUP BY cat")
>>>>> select
>>>>>
>>>>    cat   max(v1)
>>>> 1 cat1 0.4910746
>>>> 2 cat2 0.4802962
>>>> 3 cat3 0.6495094
>>>> 4 cat4 0.6174652
>>>> 5 cat5 0.4775395
>>>> 6 cat6 0.4891481
>>>>
>>>>>
>>>> Finally, I would like to have a dataframe where the cat is followed
>>>> by each column maximum.
>>>>
>>>> Thanks for your help!
>>>>
>>>
>>> --
>>> Eik Vettorazzi
>>> Institut für Medizinische Biometrie und Epidemiologie
>>> Universitätsklinikum Hamburg-Eppendorf
>>>
>>> Martinistr. 52
>>> 20246 Hamburg
>>>
>>> T ++49/40/7410-58243
>>> F ++49/40/7410-57790
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
David Winsemius, MD
West Hartford, CT