[R] Recoding Multiple Variables in a Data Frame in One Step

David Winsemius dwinsemius at comcast.net
Tue Jul 26 03:58:51 CEST 2011


On Jul 25, 2011, at 6:48 PM, William Dunlap wrote:

>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org 
>> ] On Behalf Of David Winsemius
>> Sent: Monday, July 25, 2011 3:39 PM
>> To: Anthony Damico
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Recoding Multiple Variables in a Data Frame in One  
>> Step
>>
>>
>> On Jul 21, 2011, at 8:06 PM, Anthony Damico wrote:
>>
>>> Hi, I can't for the life of me find how to do this in base R, but
>>> I'd be
>>> surprised if it's not possible.
>>>
>>> I'm just trying to replace multiple columns at once in a data frame.
>>>
>>> #load example data (the api datasets ship with the survey package)
>>> library(survey)
>>> data(api)
>>>
>>> #this displays the three columns and eight rows i'd like to replace
>>> apiclus1[ apiclus1$meals > 98 , c( "pcttest" , "api00" ,
>>> "sch.wide" ) ]
>>>
>>>
>>> #the goal is to replace pcttest with 100, api00 with NA, and
>>> #sch.wide with "Maybe"
>>>
>>> #this doesn't work--
>>> apiclus1[ apiclus1$meals > 98 , c( "pcttest" , "api00" ,
>>> "sch.wide" ) ] <-
>>> c( 100 , NA , "Maybe" )
>
> Try list(pcttest=100, api00=NA, sch.wide="Maybe") instead
> of c(100, NA, "Maybe") as the new value.
>
> Here is a self-contained example
>> df <- data.frame(Size=sin(1:10), Name=state.name[11:20],  
>> Value=11:20, stringsAsFactors=FALSE)
>> df[df$Size<0, c("Name", "Value")] <- list(Name="JUNK", Value=-99)
>> df

Anthony, notice that Bill's solution, besides being error-free (which
mine wasn't), also lets you mix data classes in your replacements.
Mine would not have allowed that, since the replacement was a single
vector.
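
A quick way to see why a single vector cannot carry mixed classes: c()
returns an atomic vector, so combining a number, NA, and a string coerces
every element to character, while list() keeps each element's own class.
A minimal sketch using the three replacement values from the original
question (the names vals and vals_list are only for illustration):

```r
# c() builds an atomic vector, so mixed inputs are coerced to one type
vals <- c(100, NA, "Maybe")
class(vals)     # "character"
is.na(vals)     # FALSE  TRUE FALSE  (the NA becomes NA_character_)

# list() keeps each element's own class
vals_list <- list(pcttest = 100, api00 = NA, sch.wide = "Maybe")
sapply(vals_list, class)
```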

# not exactly what I posted, but the same idea
df[df$Size<0, c("Name", "Value")] <-  rep( c( "JUNK" , -99  ),
                                        each =  sum(df$Size < 0)
                                     )
df
          Size      Name Value
1   0.8414710    Hawaii    11
2   0.9092974     Idaho    12
3   0.1411200  Illinois    13
4  -0.7568025      JUNK   -99
5  -0.9589243      JUNK   -99
6  -0.2794155      JUNK   -99
7   0.6569866  Kentucky    17
8   0.9893582 Louisiana    18
9   0.4121185     Maine    19
10 -0.5440211      JUNK   -99
  str(df$Value )
# chr [1:10] "11" "12" "13" "-99" "-99" "-99" "17" "18" "19" "-99"

What appeared to be a solution only worked by coercing "df$Value" to a
character vector.
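
For comparison, here is a sketch of the list-based replacement run on a
fresh copy of Bill's example data frame, followed by a check of the
column types. The helper name neg is only for illustration. Note that
-99 is a double, so Value ends up numeric rather than integer; writing
-99L would be one way to keep integer storage.

```r
# Rebuild Bill's example data frame
df <- data.frame(Size = sin(1:10), Name = state.name[11:20],
                 Value = 11:20, stringsAsFactors = FALSE)

neg <- df$Size < 0   # logical index of the rows to recode

# A list supplies one replacement per column, each with its own class
df[neg, c("Name", "Value")] <- list(Name = "JUNK", Value = -99)

str(df$Value)    # still numeric, not character
class(df$Name)   # "character"
```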

-- 
David.

>

>        Size      Name Value
> 1   0.8414710    Hawaii    11
> 2   0.9092974     Idaho    12
> 3   0.1411200  Illinois    13
> 4  -0.7568025      JUNK   -99
> 5  -0.9589243      JUNK   -99
> 6  -0.2794155      JUNK   -99
> 7   0.6569866  Kentucky    17
> 8   0.9893582 Louisiana    18
> 9   0.4121185     Maine    19
> 10 -0.5440211      JUNK   -99
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>>>
>>> #the results replace downward instead of across
>>> apiclus1[ apiclus1$meals > 98 , c( "pcttest" , "api00" ,
>>> "sch.wide" ) ]
>>
>> If I had noted that I would have tried this:
>>
>> apiclus1[ apiclus1$meals > 98 , rep( c( "pcttest" , "api00" ,
>> "sch.wide" ),
>>                                        each =  sum(apiclus1$meals >  
>> 98)
>>                                     ) ]
>>
>> Should be pretty easy to test, but since _you_ are the one  
>> responsible
>> for providing examples for testing when posting to rhelp,  I am going
>> to throw an untested theory back at you.
>>
>>
>>>
>>> I know I can do this with a few more steps (like one variable at a
>>> time, or by counting the number of rows to replace and then using
>>> rep()), but I'm hoping there's a quicker way?
>>>
>>>
>>> Thanks!!
>>>
>>> Anthony Damico
>>
>>

David Winsemius, MD
West Hartford, CT


