[R] Assigning cases to groupings based on the values of several variables
Duncan Murdoch
murdoch.duncan at gmail.com
Fri Dec 7 13:54:16 CET 2012
On 12-12-07 7:27 AM, Dimitri Liakhovitski wrote:
> Dear R-ers,
>
> my task is simple: to assign cases to desired groupings based on the
> combined values of 2 variables. I can think of 3 methods of doing it.
> Method 1 seems pretty R-like to me, but it requires a lot of lines of code,
> which is onerous.
Since your groups are so regular, you can compute the group numbers directly.
Convert each column to a factor (this might have happened automatically,
depending on your data and options), then use as.integer to convert each
factor to its numeric level code.
So a simple solution would be
mydata$mygroup.m4 <- with(mydata,
    4*(2 - as.integer(factor(sex))) +
    as.integer(factor(age)))
It would be a little simpler if you wanted the sex levels in alphabetical
order (f before m); then you wouldn't need to subtract from 2.
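A variant of the same idea, offered as a sketch rather than as part of the reply above: give factor() the level order explicitly, so that "m" maps to 1 and "f" to 2 and the group number no longer depends on alphabetical order. This assumes the only sex codes are "m" and "f" and that the ages are exactly 1 to 4; any other value would become NA. The variable names sex.code and age.code are illustrative.

```r
# Example data from the original post
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))

# Fix the level order explicitly (assumed codes: "m" before "f", ages 1..4),
# so as.integer() gives m = 1, f = 2 and age codes 1..4 directly
sex.code <- as.integer(factor(mydata$sex, levels = c("m", "f")))
age.code <- as.integer(factor(mydata$age, levels = 1:4))

# One block of 4 groups per sex, offset by age: m -> 1..4, f -> 5..8
mydata$mygroup.m4 <- 4L * (sex.code - 1L) + age.code
mydata
```

Using integer constants (4L, 1L) keeps the result an integer vector, like the mygroup column of "groupings".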
If your real data weren't so regular, another approach would be to set up
a matrix, indexed by sex and age, that gives the desired group number.
That is somewhat like your "groupings" solution; I'm not sure it would
be preferable to what you did.
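The lookup-matrix approach might look roughly like the following sketch. The matrix name lookup and its layout (rows named by sex code, columns named by age) are illustrative assumptions. It relies on the fact that R can index a matrix with a two-column character matrix, matching the first column against the row names and the second against the column names, so every row of mydata is looked up in a single vectorised step.

```r
# Example data from the original post
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))

# Hypothetical lookup table: rows indexed by sex, columns by age,
# each cell holding the desired group number
lookup <- matrix(1:8, nrow = 2, byrow = TRUE,
                 dimnames = list(c("m", "f"), as.character(1:4)))

# Two-column character index: column 1 is matched against rownames,
# column 2 against colnames
idx <- cbind(as.character(mydata$sex), as.character(mydata$age))
mydata$mygroup.m5 <- lookup[idx]
mydata
```

Because the cells can hold arbitrary numbers, the same code works when the group numbers follow no regular pattern; only the contents of lookup change.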
Duncan Murdoch
> Method 2 is a loop, so not very good, as it loops through all rows of
> mydata.
> Method 3 is also a loop, but it loops through fewer rows, so it seems more
> efficient to me.
> Can you please tell me:
> 1. Which of my methods is more efficient?
> 2. Is there maybe an even more efficient r-like way of doing it?
> Imagine that "mydata" is actually a very tall data frame.
> Thanks a lot!
> Dimitri
>
> ### My Data:
> mydata<-data.frame(sex=rep(c(rep("m",4),rep("f",4)),2),age=rep(c(1:4,1:4),2))
> (mydata)
>
> ### My desired assignments (in column "mygroup")
> groupings<-data.frame(sex=c(rep("m",4),rep("f",4)),age=c(1:4,1:4),mygroup=1:8)
> (groupings)
>
> # No, I don't need a solution where the last column of "groupings" is
> # stacked twice and bound to "mydata"
>
> # Method 1 of assigning to groups - requires a lot of lines of code:
> mydata$mygroup.m1<-NA
> mydata[(mydata$sex %in% "m")&(mydata$age %in% 1),"mygroup.m1"]<-1
> mydata[(mydata$sex %in% "m")&(mydata$age %in% 2),"mygroup.m1"]<-2
> mydata[(mydata$sex %in% "m")&(mydata$age %in% 3),"mygroup.m1"]<-3
> mydata[(mydata$sex %in% "m")&(mydata$age %in% 4),"mygroup.m1"]<-4
> mydata[(mydata$sex %in% "f")&(mydata$age %in% 1),"mygroup.m1"]<-5
> mydata[(mydata$sex %in% "f")&(mydata$age %in% 2),"mygroup.m1"]<-6
> mydata[(mydata$sex %in% "f")&(mydata$age %in% 3),"mygroup.m1"]<-7
> mydata[(mydata$sex %in% "f")&(mydata$age %in% 4),"mygroup.m1"]<-8
> (mydata)
>
> # Method 2 of assigning to groups - very "loopy":
> mydata$mygroup.m2<-NA
> for(i in 1:nrow(mydata)){ # i<-1
> mysex<-mydata[i,"sex"]
> myage<-mydata[i,"age"]
> mydata[i,"mygroup.m2"]<-groupings[(groupings$sex %in%
> mysex)&(groupings$age %in% myage),"mygroup"]
> }
> (mydata)
>
> # Method 3 of assigning to groups - also "loopy", but less than Method 2:
> mydata$mygroup.m3<-NA
> for(i in 1:nrow(groupings)){ # i<-1
> mysex<-groupings[i,"sex"]
> myage<-groupings[i,"age"]
> mydata[(mydata$sex %in% mysex)&(mydata$age %in%
> myage),"mygroup.m3"]<-groupings[i,"mygroup"]
> }
> (mydata)
>
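On question 2, one more vectorised option (not from the reply above) is to build a single string key per row and look it up in "groupings" with match(). This replaces the R-level loops of Methods 2 and 3 with one call that processes all rows at once. The "_" separator is an arbitrary choice and assumes it never occurs in the sex codes; the 100,000-row copy named tall exists only to give question 1 something to time, and the timings will vary by machine.

```r
# Example data and lookup table from the original post
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))
groupings <- data.frame(sex = c(rep("m", 4), rep("f", 4)),
                        age = c(1:4, 1:4), mygroup = 1:8)

# One string key per row, e.g. "m_1" ("_" assumed absent from the data)
data.key  <- paste(mydata$sex, mydata$age, sep = "_")
group.key <- paste(groupings$sex, groupings$age, sep = "_")

# match() gives, for each row, the position of its key in groupings
# (NA if that combination is not listed there)
mydata$mygroup.m6 <- groupings$mygroup[match(data.key, group.key)]
mydata

# Rough timing on a taller copy (16 rows x 6250 = 100,000 rows)
tall <- mydata[rep(seq_len(nrow(mydata)), 6250), c("sex", "age")]
system.time(
  tall$mygroup <- groupings$mygroup[match(paste(tall$sex, tall$age, sep = "_"),
                                          group.key)]
)
```

The same system.time() wrapper can be put around Methods 1 to 3 on the tall copy to compare them directly.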
More information about the R-help mailing list