[R] group bunch of lines in a data.frame, an additional requirement

Marc Schwartz MSchwartz at mn.rr.com
Fri Sep 15 02:11:38 CEST 2006


Emmanuel,

I wouldn't be surprised if Gabor comes up with something, but since
aggregate() can only return scalars, you can't do it in one step here.
There are possibilities using other functions such as split(), tapply()
or by(), but each has its own limitations, requiring more than
one step or post-consolidation reformatting.
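
As a rough illustration of the reformatting these alternatives need, here
is a sketch using split() and tapply(). The data frame name DF and the
column names V1 to V4 are assumed from Gabor's example quoted below, and
the values are the ones Emmanuel posted; res_split and the other
intermediate names are made up for this sketch.

```r
# Example data as posted in the thread (column names V1..V4 are assumed).
DF <- data.frame(
  V1 = c("A", "A", "A", "B", "B", "C"),
  V2 = c(1.0, 3.0, 2.0, 0.5, 0.9, 5.0),
  V3 = c(200, 800, 200, 20, 50, 70),
  V4 = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID1"),
  stringsAsFactors = FALSE
)

# Step 1: per-letter column means. sapply() returns the letters as columns,
# so the result has to be transposed to get one row per letter.
means <- t(sapply(split(DF[c("V2", "V3")], DF$V1), colMeans))

# Step 2: per-letter counts and IDs, each as a separate named vector.
n   <- tapply(DF$V2, DF$V1, length)
ids <- tapply(DF$V4, DF$V1, function(x) x[1])

# Step 3: reassemble the pieces into one data frame, aligned by letter.
lv <- rownames(means)
res_split <- data.frame(V1 = lv, means,
                        V4 = as.vector(ids[lv]),
                        n  = as.vector(n[lv]),
                        row.names = NULL, stringsAsFactors = FALSE)
res_split
```

Three separate results have to be aligned by hand here, which is the kind
of post-processing meant above.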

You could certainly write a unified wrapper function that would do this
in a single call, but unless you plan on doing this sort of operation a
lot, it is probably easier with multiple calls.
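
For what it's worth, such a wrapper might look roughly like the sketch
below. The function name aggregate_with_n and its arguments are
hypothetical (nothing like it exists in base R), and it assumes the
grouping columns are given by name and that DF has columns V1 to V4 as in
Gabor's example.

```r
# Hypothetical helper, not part of base R: summarise the value columns per
# group and append the number of rows collapsed into each group as "n".
aggregate_with_n <- function(data, value_cols, group_cols, FUN = mean) {
  stats  <- aggregate(data[value_cols], data[group_cols], FUN)
  # length() of any one value column gives the group size.
  counts <- aggregate(data[value_cols[1]], data[group_cols], length)
  names(counts)[ncol(counts)] <- "n"
  merge(stats, counts, by = group_cols)
}

# Example data as posted in the thread (column names V1..V4 are assumed).
DF <- data.frame(
  V1 = c("A", "A", "A", "B", "B", "C"),
  V2 = c(1.0, 3.0, 2.0, 0.5, 0.9, 5.0),
  V3 = c(200, 800, 200, 20, 50, 70),
  V4 = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID1"),
  stringsAsFactors = FALSE
)

res_wrap <- aggregate_with_n(DF, c("V2", "V3"), c("V1", "V4"))
res_wrap
```

It still makes several calls internally; it only hides them behind one
function.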

I suspect the easiest route is to combine Gabor's approach below with my
earlier suggestion: call aggregate() twice (once for the means, once for
the counts), then merge() the two results.
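
A minimal sketch of that, assuming the data frame is called DF with
columns V1 to V4 as in Gabor's example (the object names means, counts and
res, and the count column name "n", are made up here):

```r
# Example data as posted in the thread (column names V1..V4 are assumed).
DF <- data.frame(
  V1 = c("A", "A", "A", "B", "B", "C"),
  V2 = c(1.0, 3.0, 2.0, 0.5, 0.9, 5.0),
  V3 = c(200, 800, 200, 20, 50, 70),
  V4 = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID1"),
  stringsAsFactors = FALSE
)

# First aggregate() call: per-group means, as in Gabor's quoted message.
means <- aggregate(DF[2:3], DF[c(1, 4)], mean)

# Second aggregate() call: rows per group, using length() on one column.
counts <- aggregate(DF[2], DF[c(1, 4)], length)
names(counts)[3] <- "n"

# merge() joins on the columns the two results share (V1 and V4).
res <- merge(means, counts)
res
# A should come out with n = 3, B with n = 2, C with n = 1.
```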

HTH,

Marc

On Thu, 2006-09-14 at 21:35 +0100, Emmanuel Levy wrote:
> Thanks Gabor, that is much faster than using a loop!
> 
> I've got a last question:
> 
> Can you think of a fast way of keeping track of the number of
> observations collapsed for each entry?
> 
> i.e. I'd like to end up with:
> 
> A 2.0 400 ID1 3 (3obs in the first matrix)
> B 0.7 35 ID2 2 (2obs in the first matrix)
> C 5.0 70 ID1 1 (1obs in the first matrix)
> 
> Or is it necessary to use a temporary matrix that is merged later (as
> exemplified by Marc in a previous email)?
> 
> Thanks a lot for your help,
> 
>   Emmanuel
> 
> On 9/13/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> > See below.
> >
> > On 9/13/06, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:
> > > Thanks for pointing me to "aggregate"; that works fine!
> > >
> > > There is one complication though: I have mixed types (numerical and character),
> > >
> > > So the matrix is of the form:
> > >
> > > A 1.0 200 ID1
> > > A 3.0 800 ID1
> > > A 2.0 200 ID1
> > > B 0.5 20   ID2
> > > B 0.9 50   ID2
> > > C 5.0 70   ID1
> > >
> > > One letter always has the same ID but one ID can be shared by many
> > > letters (like ID1)
> > >
> > > I just want to keep track of the ID, and get a matrix like:
> > >
> > > A 2.0 400 ID1
> > > B 0.7 35 ID2
> > > C 5.0 70 ID1
> > >
> > > Any idea on how to do that without a loop?
> >
> > If V4 is a function of V1 then you can aggregate by it too and it will
> > appear but have no effect on the classification:
> >
> > > aggregate(DF[2:3], DF[c(1,4)], mean)
> >   V1  V4  V2  V3
> > 1  A ID1 2.0 400
> > 2  C ID1 5.0  70
> > 3  B ID2 0.7  35
> >
> >
> > >
> > >  Many thanks,
> > >
> > >     Emmanuel
> > >
> > > On 9/12/06, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:
> > > > Hello,
> > > >
> > > > I'd like to group the lines of a matrix so that:
> > > > A 1.0 200
> > > > A 3.0 800
> > > > A 2.0 200
> > > > B 0.5 20
> > > > B 0.9 50
> > > > C 5.0 70
> > > >
> > > > Would give:
> > > > A 2.0 400
> > > > B 0.7 35
> > > > C 5.0 70
> > > >
> > > > So all lines corresponding to a letter (level), become a single line
> > > > where all the values of each column are averaged.
> > > >
> > > > I've done that with a loop, but it doesn't seem right (it is very
> > > > slow). I imagine there is some sort of "apply" shortcut, but I can't
> > > > figure it out.
> > > >
> > > > Please note that it is not exactly a matrix I'm using: the function
> > > > "typeof" tells me it's a list, although I access it as if it were a
> > > > matrix.
> > > >
> > > > Could someone help me with the right function to use, a help topic or
> > > > a piece of code?
> > > >
> > > > Thanks,
> > > >
> > > >   Emmanuel
> > > >
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> 


