[R] Factors? I think?
peter dalgaard
pdalgd at gmail.com
Fri Sep 9 11:57:19 CEST 2011
On Sep 9, 2011, at 09:13 , Petr PIKAL wrote:
> Hi
>
> Isn't this what merge is designed for?
Sort of. (You'd need to think carefully about what happens with non-matched codes.)
Wouldn't this do the trick as well?
# 'in' is a reserved word in R, so use other names; the code column is 'Depts'
from <- as.character(DeptCodes$Depts)
to <- as.character(DeptCodes$DeptNames)
Doctors <- within(Doctors, DeptNames <- factor(DocDepts, levels=from, labels=to))
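
To make the non-matched-code caveat concrete, here is a minimal sketch. It assumes the toy DeptCodes table from the message quoted further down, stored as character rather than factor, plus a hypothetical doctor ("Tom Wilson", code 1234) whose code has no entry in the table. The names m_inner and m_left are made up for illustration.

```r
# Toy lookup table, as in the quoted example (kept as character, not factor)
DeptCodes <- data.frame(
  Depts = as.character(1:9 * 1111),
  DeptNames = c("Heart", "Kidney", "Feet", "Teeth", "Brain",
                "Digestive", "Diagnostic", "Surgery", "Anesthesia"),
  stringsAsFactors = FALSE
)

# Hypothetical doctors; code "1234" is deliberately absent from DeptCodes
Doctors <- data.frame(
  Docs = c("Greg Jones", "Bob Smith", "Tom Wilson"),
  DocDepts = c("9999", "5555", "1234"),
  stringsAsFactors = FALSE
)

# merge() drops the unmatched doctor by default ...
m_inner <- merge(Doctors, DeptCodes, by.x = "DocDepts", by.y = "Depts")

# ... but keeps him, with NA for the department name, when all.x = TRUE
m_left <- merge(Doctors, DeptCodes, by.x = "DocDepts", by.y = "Depts",
                all.x = TRUE)

# factor() with explicit levels/labels keeps every row in the original
# order; values that are not among the levels become NA
Doctors$DeptNames <- factor(Doctors$DocDepts,
                            levels = DeptCodes$Depts,
                            labels = DeptCodes$DeptNames)

print(m_inner)
print(m_left)
print(Doctors)
```

Which behaviour is wanted depends on whether an unknown code should be dropped, flagged as NA, or treated as an error.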
>
> > merge(Doctors, DeptCodes, by.x="DocDepts", by.y="Depts")
>   DocDepts                    Docs  DeptNames
> 1     1111 Christian\nChristianson      Heart
> 2     5555               Bob Smith      Brain
> 3     9999              Greg Jones Anesthesia
> 4     9999             Al Franklin Anesthesia
>
> It is easy to get rid of the first column.
>
> Regards
> Petr
>
>
>> Re: [R] Factors? I think?
>>
>> It's probably easiest to think of this as a compound map (doctor -> dept
>> code -> factor -> character -> integer -> dept code -> dept name as
>> character) and to treat the code as such: if you already have R objects
>> with the codes in them, it shouldn't be hard to do the transformation.
>>
>> Consider the following toy set up
>>
>> Docs = factor(c("Greg Jones","Bob Smith","Al Franklin",
>>                 "Christian Christianson"))
>> DocDepts = factor(c("9999","5555","9999","1111"))
>> Doctors = data.frame(Docs, DocDepts)
>>
>> Depts = factor(1:9 * 1111)
>> DeptNames = factor(c("Heart","Kidney","Feet","Teeth","Brain",
>>                      "Digestive","Diagnostic","Surgery","Anesthesia"))
>> DeptCodes = data.frame(Depts,DeptNames)
>> # Everything in our data frames is now some sort of factor, so we can't
>> # match things up in the "normal" ways
>>
>> # Now you have to write some unpleasantly long but pretty straightforward
>> # code to convert the factors in a way that makes the match work properly:
>>
>> ## Extract the "9999" as a real 9999, rather than the internal factor code
>> Doctors$numbers <- as.numeric(as.character(Doctors[,2]))
>> DeptCodes$values <- as.numeric(as.character(DeptCodes[,1]))
>>
>> ## Map the department numbers onto the correct rows of the DeptCodes df
>> match(Doctors$numbers, DeptCodes$values)
>>
>> # Now we get the correct names using those row numbers
>> DeptAssignments = as.character(DeptCodes[match(Doctors$numbers,
>> DeptCodes$values),2])
>>
>> # Combine with doctor names to finish
>> NamesandTitles = cbind(as.character(Doctors[,1]),DeptAssignments)
>>
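
The reason for the as.numeric(as.character(...)) detour above can be seen in a short sketch (f and the codes_* names are illustrative only): as.numeric() applied directly to a factor returns the internal level codes, not the numbers shown as labels.

```r
f <- factor(c("9999", "5555", "9999", "1111"))

# Levels are sorted: "1111" "5555" "9999"
codes_internal <- as.numeric(f)                 # 3 2 3 1 (positions in levels)
codes_real     <- as.numeric(as.character(f))   # 9999 5555 9999 1111
codes_alt      <- as.numeric(levels(f))[f]      # same values, via the levels

print(levels(f))
print(codes_internal)
print(codes_real)
```

The third form converts each level once and then indexes by the factor, which gives the same values as the second.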
>> It's not the most elegant way of doing it, but hopefully it gives some
>> insight into how to work with factors. If you can send a little more
>> information about how your data is currently stored, we can optimize this
>> into something easily repeatable, but without specifics I have to work in
>> generalities.
>>
>> Hope this helps,
>>
>> Michael Weylandt
>>
>> On Thu, Sep 8, 2011 at 6:36 PM, Totally Inept <kramer877 at gmail.com> wrote:
>>
>>> First of all, let me apologize, as this is probably an absurdly basic
>>> question. I did search before asking, but perhaps my ineptitude didn't
>>> allow me to apply what I read to what I'm doing. Totally new to R, and
>>> haven't done any code in any language in a long time.
>>>
>>> Basically I've got categories. They're department codes for doctors (say,
>>> 9999 for radiology or 5555 for endocrinology), which of course means that
>>> there are a good number of them, i.e. it's not practical for me to write
>>> them all out as I usually see in examples of categorical variables
>>> (factors).
>>>
>>> And then I've got a list of doctors that I'm actually interested in. I have
>>> the department codes associated with each, but I need to map the department
>>> name to the doctor name. So I might have Greg Jones, Bob Smith, Tom Wilson,
>>> etc... to go with 1234, 9999, 2222, etc.
>>>
>>> I need to turn Greg Jones, Bob Smith, ... and 1234, 9999, ... into Greg
>>> Jones, Bob Smith, ... Cardiology, Radiology, ....
>>>
>>> Obviously I could just search and replace within the csv files but I need
>>> something durable that I can run things through repeatedly.
>>>
>>> Anyhow, thanks to anyone willing to humor me with an answer.
>>>
>>> --
>>> View this message in context:
>>> http://r.789695.n4.nabble.com/Factors-I-think-tp3800413p3800413.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com