[R] better way of recoding factors in data frame?

mohinder_datta at yahoo.com mohinder_datta at yahoo.com
Thu Apr 9 15:48:57 CEST 2009


Hi all,

I apologize in advance for the length of this post, but I wanted to make sure I was clear.

I am trying to merge two data frames that share a number of rows (though some rows are unique to each data frame). Each row represents a subject in a study. The problem is that sex is coded differently in the two data frames, including the way missing values are represented.
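The merge call itself is not shown in this post. As a rough sketch only, a full outer merge of this kind could look like the following, where `frameA`, `frameB` and their contents are hypothetical stand-ins for the two source data frames:

```r
# Hypothetical source frames; names and contents are illustrative only
frameA <- data.frame(SubjCode = c("sub1", "sub2", "sub3"),
                     SubjSex  = c("M", "F", "M"),
                     stringsAsFactors = FALSE)
frameB <- data.frame(SubjCode = c("sub3", "sub9"),
                     Sex      = c("Male", "Not Recorded"),
                     stringsAsFactors = FALSE)

# all = TRUE keeps subjects that appear in only one of the two frames;
# their missing column is filled with NA
merged <- merge(frameA, frameB, by = "SubjCode", all = TRUE)
```

With `all = TRUE`, subjects found in only one frame are kept rather than dropped, which is why the two sex columns end up side by side with gaps.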

Here is an example of the merged data frame:

> myFrame2
   SubjCode SubjSex          Sex
1      sub1       M         <NA>
2      sub2       F         <NA>
3      sub3       M         Male
4      sub4       M         <NA>
5      sub5       F         <NA>
6      sub6       F       Female
7      sub7                 <NA>
8      sub8                 <NA>
9      sub9         Not Recorded
10    sub10         Not Recorded

I then apply the following:

> myFrame2$SubjSex <- factor(myFrame2$SubjSex, levels = c('M','F'))
> myFrame2$SubjSex <- factor(myFrame2$SubjSex, labels = c('Male','Female'))
> myFrame2 <- transform(myFrame2, newSex = ifelse(is.na(SubjSex), Sex, SubjSex))

...and get this:
> myFrame2
   SubjCode SubjSex          Sex newSex
1      sub1    Male         <NA>      1
2      sub2  Female         <NA>      2
3      sub3    Male         Male      1
4      sub4    Male         <NA>      1
5      sub5  Female         <NA>      2
6      sub6  Female       Female      2
7      sub7    <NA>         <NA>     NA
8      sub8    <NA>         <NA>     NA
9      sub9    <NA> Not Recorded      3
10    sub10    <NA> Not Recorded      3
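The integers in newSex appear because ifelse() drops factor attributes and returns the underlying integer codes. The sketch below rebuilds the example to illustrate this. It assumes that Sex is a factor with the default (alphabetical) level order, which the output above is consistent with but does not confirm; the names `subj_sex`, `sex`, `codes` and `labels` are made up for the illustration.

```r
# Rebuild the example; factors are created explicitly below
myFrame2 <- data.frame(
  SubjCode = paste0("sub", 1:10),
  SubjSex  = c("M", "F", "M", "M", "F", "F", "", "", "", ""),
  Sex      = c(NA, NA, "Male", NA, NA, "Female", NA, NA,
               "Not Recorded", "Not Recorded"),
  stringsAsFactors = FALSE
)

subj_sex <- factor(myFrame2$SubjSex, levels = c("M", "F"),
                   labels = c("Male", "Female"))
sex <- factor(myFrame2$Sex)  # assumption: default alphabetical levels

levels(subj_sex)  # "Male" "Female"
levels(sex)       # "Female" "Male" "Not Recorded"

# ifelse() returns the integer codes of each factor, not the labels
codes <- ifelse(is.na(subj_sex), sex, subj_sex)

# Comparing the labels as character strings avoids mixing the two codings
labels <- ifelse(is.na(subj_sex), as.character(sex), as.character(subj_sex))
```

Under that assumption the two factors do not share a coding: code 1 means Male in SubjSex but Female in Sex. The example data happen not to expose this, because every row that falls back on Sex is "Not Recorded" or missing, but a row with a missing SubjSex and a Sex of "Male" would come out as 2.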

I need that last column to have just 1 (Male), 2 (Female) or 0 (Missing), and the only way I've come up with seems very kludgy:

> myFrame2$newSex[is.na(myFrame2$newSex)] <- 0
> myFrame2$newSex <- ifelse(myFrame2$newSex == 3, 0, myFrame2$newSex)

That gives me the right values for "newSex", but I'd like to positively select the values I want to keep, rather than negatively select the ones to change. I tried this:

> myFrame2$newSex <- ifelse(myFrame2$newSex ==1 || myFrame2$newSex == 2, myFrame2$newSex, 0)

But I just get 1 for every row in newSex. Does anyone know of a way to do this by positively selecting the values 1 and 2?
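A likely explanation is that `||` is not vectorised: in the R versions of the time it looked only at the first element of each side, so the test collapsed to a single TRUE and ifelse() returned one value that was then recycled down the column (recent versions of R signal an error for this instead). A minimal sketch of positive selection with the vectorised `%in%` operator, starting from the integer codes shown above (`recoded` and `recoded2` are illustrative names):

```r
# Integer codes as they appear in the newSex column above
newSex <- c(1L, 2L, 1L, 1L, 2L, 2L, NA, NA, 3L, 3L)

# %in% is element-wise and never returns NA, so missing values
# fall through to 0 along with every other unwanted code
recoded <- ifelse(newSex %in% c(1, 2), newSex, 0)

# The element-wise | operator also works, but NA has to be handled explicitly
recoded2 <- ifelse(!is.na(newSex) & (newSex == 1 | newSex == 2), newSex, 0)
```

Both versions keep 1 and 2 and map everything else, including NA, to 0 in a single step.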


Thanks,
Mohinder