[R] dataframe indexing by number of cases per group
Johannes Radinger
JRadinger at gmx.at
Thu Nov 24 16:01:54 CET 2011
Hi,
thank you for your suggestions.
I think I'll stay with Dennis' approach,
as it selects the rows directly by indexing:
df[ave(seq_along(df$group), df$group, FUN = length) > 4, ]
I'll try that out now.
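To show what this indexing expression does, here is a small sketch. It assumes R >= 4.0.0, where data.frame() keeps `group` as a character column, and adds set.seed() only so the example is reproducible; the variable names `n_per_row` and `keep` are illustrative. ave() returns, for every row, the value of FUN applied to that row's group, so with FUN = length each row carries the size of its group, and comparing that to 5 gives a logical row index. Counting over seq_along(df$group) works whether `group` is a factor or a character column, whereas as.numeric() on a character column yields only NAs.

```r
set.seed(1)  # only for a reproducible example
group <- c(rep("A", 5), rep("B", 6), rep("C", 4))
x <- c(runif(5, 1, 5), runif(6, 1, 10), runif(4, 2, 15))
df <- data.frame(group, x)

# One entry per row: the number of rows in that row's group
n_per_row <- ave(seq_along(df$group), df$group, FUN = length)
n_per_row
# [1] 5 5 5 5 5 6 6 6 6 6 6 4 4 4 4

# Keep only rows whose group has at least 5 cases (groups A and B)
keep <- df[n_per_row >= 5, ]
table(keep$group)
```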
best regards
/Johannes
-------- Original Message --------
> Date: Thu, 24 Nov 2011 09:12:57 -0500
> From: Gabor Grothendieck <ggrothendieck at gmail.com>
> To: Johannes Radinger <JRadinger at gmx.at>
> CC: r-help at r-project.org
> Subject: Re: [R] dataframe indexing by number of cases per group
> On Thu, Nov 24, 2011 at 7:02 AM, Johannes Radinger <JRadinger at gmx.at> wrote:
> > Hello,
> >
> > Assume we have the following data frame:
> >
> > group <-c(rep("A",5),rep("B",6),rep("C",4))
> > x <- c(runif(5,1,5),runif(6,1,10),runif(4,2,15))
> > df <- data.frame(group,x)
> >
> > Now I want to select all cases (rows) of those groups
> > which have 5 or more cases (so I want to select
> > all cases of groups A and B).
> > How can I use indexing for this kind of question?
> >
> > df[??]... I think it is probably quite easy but I really
> > don't know how to do that at the moment.
> >
> > Maybe someone can help me...
> >
>
> Here are three approaches:
>
> subset(merge(df, xtabs(~ group, df)), Freq >= 5)
>
> subset(transform(df, len = ave(x, group, FUN = length)), len >= 5)
>
> library(sqldf)
> sqldf('select a.*
>   from df a join (select "group", count(*) "count" from df group by "group")
>   using ("group")
>   where "count" >= 5')
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
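One more base-R variant, which is not among the approaches proposed in this thread: count the cases per group once with table(), keep the names of the groups with at least 5 cases, and select rows with %in%. This is a sketch assuming the example data from the original question (set.seed() added only for reproducibility); `counts`, `big_groups` and `sel` are illustrative names. The counts are the same ones that xtabs(~ group, df) feeds into the merge() approach.

```r
set.seed(1)  # only for a reproducible example
group <- c(rep("A", 5), rep("B", 6), rep("C", 4))
x <- c(runif(5, 1, 5), runif(6, 1, 10), runif(4, 2, 15))
df <- data.frame(group, x)

# Cases per group: A = 5, B = 6, C = 4
counts <- table(df$group)

# Names of the groups with 5 or more cases
big_groups <- names(which(counts >= 5))

# Logical row index: TRUE where the row's group is one of those groups
sel <- df[df$group %in% big_groups, ]
```

Because the group sizes are computed once per group rather than once per row, this avoids the per-row vector that ave() builds, at the cost of a separate lookup step.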