[R] aggregating using 'with' function
AC Del Re
delre at wisc.edu
Sun Feb 21 03:55:30 CET 2010
OK, this is great, Jim. Last question: What if I want the one copy
of each id to be selected randomly instead of taking the first one?
Thank you,
AC
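
One possible way to approach the random-selection question above, offered as a hedged sketch rather than anything proposed in this thread: shuffle the rows of x with sample() and then apply the same duplicated() filter quoted below, so that the "first" row kept for each id is a random one. The names shuffled and one_per_id, and the set.seed() call, are illustrative assumptions and do not appear in the original messages.

```r
# Rebuild the sample data and the aggregated frame from the quoted messages.
id    <- c(1, 1, 1, 4:12)
r     <- c(.98, .56, .03, .64, .49, -.04, .49, .33, .58, .18, .6, .21)
mod1  <- factor(c(1, 2, 2, rep(c(1, 2, 3), 3)))
datas <- data.frame(id, r, mod1)
x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))

set.seed(1)  # assumption: fixed seed only so the random draw is repeatable

# Shuffle the row order, then keep the first occurrence of each id.
# After shuffling, the first occurrence is a random row within each id.
shuffled   <- x[sample(nrow(x)), ]
one_per_id <- shuffled[!duplicated(shuffled$id), ]

# Restore ordering by id for readability (optional).
one_per_id <- one_per_id[order(one_per_id$id), ]
one_per_id
```

Dropping the set.seed() line gives a different draw on each run. Ids that occur only once in x (all but id 1 here) are unaffected by the shuffle.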
> On Sat, Feb 20, 2010 at 8:37 PM, jim holtman <jholtman at gmail.com> wrote:
>> I am not sure what you mean by eliminating a row. Now if you want only one
>> copy of each 'id', and it is the first one, then you can use 'duplicated':
>>
>>> x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 =
>>> mod1),mean))
>>> x
>>    id mod1      r
>> 1   1    1  0.980
>> 2   4    1  0.640
>> 3   7    1  0.490
>> 4  10    1  0.180
>> 5   1    2  0.295
>> 6   5    2  0.490
>> 7   8    2  0.330
>> 8  11    2  0.600
>> 9   6    3 -0.040
>> 10  9    3  0.580
>> 11 12    3  0.210
>>> subset(x, !duplicated(id))
>>    id mod1     r
>> 1   1    1  0.98
>> 2   4    1  0.64
>> 3   7    1  0.49
>> 4  10    1  0.18
>> 6   5    2  0.49
>> 7   8    2  0.33
>> 8  11    2  0.60
>> 9   6    3 -0.04
>> 10  9    3  0.58
>> 11 12    3  0.21
>>
>>
>> On Sat, Feb 20, 2010 at 8:07 PM, AC Del Re <delre at wisc.edu> wrote:
>>>
>>> Perfect! Thanks Jim.
>>>
>>> Do you know how I could then reduce the data even further,
>>> specifically to 1 row per id? In this dataset, id 1 would
>>> have one row eliminated.
>>> Assume the data is much larger, so rows cannot be removed by visual
>>> inspection one at a time.
>>>
>>>
>>> Thank you,
>>>
>>> AC
>>>
>>> On Sat, Feb 20, 2010 at 6:26 PM, jim holtman <jholtman at gmail.com> wrote:
>>> > This seems to work fine (notice the missing 'c(...)'; why did you think
>>> > you needed it?):
>>> >
>>> >> with(datas, aggregate(list(r = r), by = list(id = id, mod1 =
>>> >> mod1),mean))
>>> >    id mod1      r
>>> > 1   1    1  0.980
>>> > 2   4    1  0.640
>>> > 3   7    1  0.490
>>> > 4  10    1  0.180
>>> > 5   1    2  0.295
>>> > 6   5    2  0.490
>>> > 7   8    2  0.330
>>> > 8  11    2  0.600
>>> > 9   6    3 -0.040
>>> > 10  9    3  0.580
>>> > 11 12    3  0.210
>>> >>
>>> >
>>> >
>>> > On Sat, Feb 20, 2010 at 6:54 PM, AC Del Re <delre at wisc.edu> wrote:
>>> >>
>>> >> Hi All,
>>> >>
>>> >> I am interested in aggregating a data frame on 2 categories: the
>>> >> mean effect size (r) for each combination of 'id' and 'mod1'. The
>>> >> 'with' function works well when aggregating on one category (e.g.,
>>> >> on 'id' below) but doesn't work when I try 2 categories. How can
>>> >> this be accomplished?
>>> >>
>>> >> # sample data
>>> >>
>>> >> id<-c(1,1,1,rep(4:12))
>>> >> n<-c(10,20,13,22,28,12,12,36,19,12, 15,8)
>>> >> r<-c(.98,.56,.03,.64,.49,-.04,.49,.33,.58,.18, .6,.21)
>>> >> mod1<-factor(c(1,2,2, rep(c(1,2,3),3)))
>>> >> mod2<-c(1,2,15,rep(3,9))
>>> >> datas<-data.frame(id,n,r,mod1,mod2)
>>> >>
>>> >> # one category works perfectly:
>>> >>
>>> >> with(datas, aggregate(list(r = r), by = list(id = id),mean))
>>> >>
>>> >>    id          r
>>> >> 1   1  0.5233333
>>> >> 2   4  0.6400000
>>> >> 3   5  0.4900000
>>> >> 4   6 -0.0400000
>>> >> 5   7  0.4900000
>>> >> 6   8  0.3300000
>>> >> 7   9  0.5800000
>>> >> 8  10  0.1800000
>>> >> 9  11  0.6000000
>>> >> 10 12  0.2100000
>>> >>
>>> >> # trying with 2 categories:
>>> >>
>>> >> with(datas, aggregate(list(r = r), by = list(c(id = id, mod1 =
>>> >> mod1)),mean))
>>> >>
>>> >> Error in FUN(X[[1L]], ...) : arguments must have same length
>>> >>
>>> >> Thank you,
>>> >>
>>> >> AC
>>> >>
>>> >> ______________________________________________
>>> >> R-help at r-project.org mailing list
>>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >> PLEASE do read the posting guide
>>> >> http://www.R-project.org/posting-guide.html
>>> >> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
>