[R] dataframe indexing by number of cases per group
Gabor Grothendieck
ggrothendieck at gmail.com
Thu Nov 24 15:12:57 CET 2011
On Thu, Nov 24, 2011 at 7:02 AM, Johannes Radinger <JRadinger at gmx.at> wrote:
> Hello,
>
> assume we have the following dataframe:
>
> group <-c(rep("A",5),rep("B",6),rep("C",4))
> x <- c(runif(5,1,5),runif(6,1,10),runif(4,2,15))
> df <- data.frame(group,x)
>
> Now I want to select all cases (rows) for those groups
> which have 5 or more cases (so I want to select
> all cases of groups A and B).
> How can I use the indexing for such questions?
>
> df[??]... I think it is probably quite easy but I really
> don't know how to do that at the moment.
>
> maybe someone can help me...
>
Here are three approaches:
subset(merge(df, xtabs(~ group, df)), Freq >= 5)
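To see what this first one-liner does, here is a minimal sketch of its intermediate steps. It rebuilds the example data from the question; the `set.seed(1)` call is an addition made only for reproducibility, and the names `counts`, `merged` and `result1` are illustrative. `xtabs(~ group, df)` tabulates rows per group, and `merge()` coerces that table to a data frame with a `Freq` column before joining it to `df` on the shared `group` column.

```r
# Rebuild the example data; set.seed() is added only for reproducibility
set.seed(1)
group <- c(rep("A", 5), rep("B", 6), rep("C", 4))
x <- c(runif(5, 1, 5), runif(6, 1, 10), runif(4, 2, 15))
df <- data.frame(group, x)

# xtabs() counts rows per group; as.data.frame() turns the table into
# a data frame with columns "group" and "Freq"
counts <- as.data.frame(xtabs(~ group, df))
print(counts)

# merge() joins on the common column "group", attaching Freq to every row
merged <- merge(df, counts)

# Keep only rows whose group has at least 5 cases
result1 <- subset(merged, Freq >= 5)
print(result1)
```

Note that the result carries the extra `Freq` column alongside `group` and `x`.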
subset(transform(df, len = ave(x, group, FUN = length)), len >= 5)
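The `ave()` version works because `ave()` applies `FUN` within each group and spreads the result back over every row, giving a vector as long as `x` that holds the group size on each row. A sketch under the same assumptions (added `set.seed(1)`, illustrative names `len` and `result2`) shows that this vector can also go straight into the `df[...]` bracket indexing the question asked about:

```r
# Rebuild the example data; set.seed() is added only for reproducibility
set.seed(1)
group <- c(rep("A", 5), rep("B", 6), rep("C", 4))
x <- c(runif(5, 1, 5), runif(6, 1, 10), runif(4, 2, 15))
df <- data.frame(group, x)

# One entry per row of df: the size of the group that row belongs to
len <- ave(df$x, df$group, FUN = length)
print(len)

# A logical row index built from len, used directly inside df[ , ]
result2 <- df[len >= 5, ]
print(result2)
```

Because `len` is kept outside `df` here, `result2` has only the original `group` and `x` columns, unlike the `transform()` form, which adds a `len` column.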
library(sqldf)
sqldf('select a.*
from df a join (select "group", count(*) "count" from df group by "group")
using ("group")
where "count" >= 5')
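For comparison, one more base-R variant that is not among the three approaches above: count rows with `table()` and select them with `%in%`. This is an editorial sketch rather than part of the original reply; the names `counts`, `big_groups` and `result3` are illustrative, and `set.seed(1)` is added for reproducibility.

```r
# Rebuild the example data; set.seed() is added only for reproducibility
set.seed(1)
group <- c(rep("A", 5), rep("B", 6), rep("C", 4))
x <- c(runif(5, 1, 5), runif(6, 1, 10), runif(4, 2, 15))
df <- data.frame(group, x)

# table() counts rows per group; keep the names of groups with >= 5 rows
counts <- table(df$group)
big_groups <- names(counts)[counts >= 5]

# %in% yields one TRUE/FALSE per row, usable directly as a row index
result3 <- df[df$group %in% big_groups, ]
print(result3)
```

This form needs no helper column and no extra package, so the result keeps exactly the columns of `df`.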
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
More information about the R-help mailing list