[R] Removing & generating data by category
jim holtman
jholtman at gmail.com
Thu Oct 29 03:03:41 CET 2009
Here is one way of doing it:
> a <-
+ data.frame(id=c(c("A1","A2","A3","A4","A5"),
+ c("A3","A2","A3","A4","A5")),loc=c("B1","B2","B3","B4","B5"),
+ clm=c(rep(("General"),6),rep("Life",4)))
> # split the indices based on 'id' & 'loc'
> a.indx <- split(seq(nrow(a)), paste(a$id, a$loc))
> # now take each group and see if 'clm' differs (don't know what you want to
> # do if more than 2 are in the group)
> result <- lapply(a.indx, function(.indx){
+ if (length(.indx) == 1) return(.indx)
+ if (any(a$clm[.indx[1]] != a$clm[.indx])) return(NULL)
+ .indx
+ })
> # output the matches
> a[unlist(result),,drop=FALSE]
id loc clm
1 A1 B1 General
6 A3 B1 General
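With ~800,000 rows, calling an R function once per group (as the lapply() above does) can itself become the bottleneck. A fully vectorized base-R sketch, assuming the same data frame `a` and the same rule (drop every `id`/`loc` group whose `clm` values are not all identical), can flag such groups with duplicated() instead. This variant is not from the original thread; the "\r" key separator is an assumption (any string that never occurs in the data works).

```r
# Same example data as above; stringsAsFactors = FALSE keeps clm character in any R version
a <- data.frame(id  = c("A1","A2","A3","A4","A5","A3","A2","A3","A4","A5"),
                loc = c("B1","B2","B3","B4","B5"),   # recycled to 10 rows
                clm = c(rep("General", 6), rep("Life", 4)),
                stringsAsFactors = FALSE)

# One key per (id, loc) combination; "\r" is assumed never to appear in the data,
# so two different (id, loc) pairs cannot collapse into the same key
key <- paste(a$id, a$loc, sep = "\r")

# TRUE for the first row of each distinct (key, clm) combination
first_kc <- !duplicated(paste(key, a$clm, sep = "\r"))

# A key that shows up more than once among those rows has more than one clm value
mixed <- unique(key[first_kc][duplicated(key[first_kc])])

# Keep only rows whose (id, loc) group has a single clm value
out <- a[!(key %in% mixed), , drop = FALSE]
out
```

Unlike the lapply() version, this keeps the surviving rows in their original order rather than grouping them by key, and it never calls an R function per group; duplicated() and %in% are hash-based, so the cost should grow roughly linearly with the number of rows.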
On Wed, Oct 28, 2009 at 9:30 PM, Steven Kang <stochastickang at gmail.com> wrote:
> Dear R users,
>
>
> Basically, from the following arbitrary data set:
>
> a <-
> data.frame(id=c(c("A1","A2","A3","A4","A5"),c("A3","A2","A3","A4","A5")),loc=c("B1","B2","B3","B4","B5"),clm=c(rep(("General"),6),rep("Life",4)))
>
>> a
> id loc clm
> 1 A1 B1 General
> 2 A2 B2 General
> 3 A3 B3 General
> 4 A4 B4 General
> 5 A5 B5 General
> 6 A3 B1 General
> 7 A2 B2 Life
> 8 A3 B3 Life
> 9 A4 B4 Life
> 10 A5 B5 Life
>
> I want to remove the records (rows 2-5 and 7-10 above) that have identical
> values in both fields ("id" and "loc") but different values of "clm" (i.e.
> that fall into more than one category),
> i.e.
>> categ <- table(a$id,a$clm)
>> categ
>
> General Life
> A1 1 0
> A2 1 1
> A3 2 1
> A4 1 1
> A5 1 1
>
> The desired output is
>
> id loc clm
> 1 A1 B1 General
> 6 A3 B1 General
>
> Because the data set I am working on is quite large (~800,000 x 20), with
> most of the field values being long strings, looping over individual rows to
> compare them turned out to be very inefficient.
>
> Are there any more efficient alternative methods for solving this problem?
>
> I would greatly appreciate your expertise.
>
>
>
> Steven
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
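The table(a$id, a$clm) attempt in the question counts categories per `id` only, so the A3/B1 and A3/B3 records are pooled (hence the count of 2 under General for A3). Tabulating on the combined (id, loc) key instead shows directly which groups span more than one category. A small sketch, not from the original thread, reusing the question's data and assuming the goal is to keep groups that appear under exactly one `clm` value:

```r
a <- data.frame(id  = c("A1","A2","A3","A4","A5","A3","A2","A3","A4","A5"),
                loc = c("B1","B2","B3","B4","B5"),   # recycled to 10 rows
                clm = c(rep("General", 6), rep("Life", 4)),
                stringsAsFactors = FALSE)

# Combined (id, loc) key, built the same way as in the split() call above
key <- paste(a$id, a$loc)

# Rows = (id, loc) groups, columns = clm categories, cells = record counts
categ <- table(key, a$clm)
categ

# Number of distinct categories each (id, loc) group appears in
n_cat <- rowSums(categ > 0)

# Keep records whose group appears in exactly one category
out <- a[n_cat[key] == 1, , drop = FALSE]
out
```

With only a few `clm` categories the table stays small (number of groups times number of categories), though for very many distinct keys the duplicated()-based approach avoids building it at all.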
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?