[R] Eliminate level information

Sarah Goslee sarah.goslee at gmail.com
Sat Jul 9 23:10:44 CEST 2011


There's something you're not telling us. How did you get your data into R?
What does str(train) show? (See inline.)

On Sat, Jul 9, 2011 at 3:38 PM, darrelkj <darrelkj at mail.uc.edu> wrote:
> Hi, I hope this formatting is correct as it is my first time.
>
> I am trying to do comparisons of values in a data frame that has some factor
> variables.
> One instance is
>
>> train$sex[2]
> [1]  Male
> Levels:  Female  Male
>
> So the value is Male but a comparison like "Male" == train$sex[2]
> will always return FALSE because of the level information included.

Nope. The R developers are smarter than that.
> train <- data.frame(ind=1:5, sex=c("Male", "Male", "Female", "Female", "Male"), workclass=c("Private", "Local-gov", "Private", NA, "Private"), stringsAsFactors=TRUE)
> train$sex[2]
[1] Male
Levels: Female Male
> train$sex[2] == "Male"
[1] TRUE
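
For what it's worth, here is a minimal sketch of why that works: == on a factor compares the level labels as strings, not the integer codes stored underneath. The leading-space labels in the second half are made up for illustration; they are only a guess at what your data might contain, based on the extra spaces in your printed output.

```r
# A factor stores integer codes plus a vector of level labels.
sex <- factor(c("Male", "Male", "Female"))
as.integer(sex)             # the underlying codes
levels(sex)                 # the labels those codes point to
sex[2] == "Male"            # TRUE: the label is compared, not the code

# Hypothetical labels with a leading space (an assumption, not your data).
sex_ws <- factor(c(" Male", " Male", " Female"))
sex_ws[2] == "Male"         # FALSE: " Male" is a different string
sex_ws[2] == " Male"        # TRUE
```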


> Another problem this creates is
>
>> factor(train$workclass[25:30])
> [1]  Private    Local-gov  Private    NA         Private
> [6]  Private
> Levels:  Local-gov  NA  Private
>
>> is.na(train$workclass[25:30])
> [1] FALSE FALSE FALSE FALSE FALSE FALSE
>
> Which they are all false because of the levels data in the comparison.  This
> would seem to be bug because I thought that NA was a protected keyword but
> it is being used here as a level.  Which will make it fail the missing value
> criteria for two reasons now because it is a level.

They're also smarter than that:
> train$workclass
[1] Private   Local-gov Private   <NA>      Private
Levels: Local-gov Private
> is.na(train$workclass)
[1] FALSE FALSE FALSE  TRUE FALSE
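
A small sketch of the difference between a real missing value and a level that is merely spelled like NA. The " NA" string with a leading space is a hypothetical input, chosen only to show how such a level could arise.

```r
# A real missing value: NA is not a level, and is.na() finds it.
wc <- factor(c("Private", NA, "Local-gov"))
is.na(wc)                   # FALSE TRUE FALSE

# A string that only looks like NA (hypothetical) becomes an ordinary level.
wc_str <- factor(c("Private", " NA", "Local-gov"))
" NA" %in% levels(wc_str)   # TRUE
is.na(wc_str)               # FALSE FALSE FALSE

# Setting that level to NA turns the matching entries into real missing values.
levels(wc_str)[levels(wc_str) == " NA"] <- NA
is.na(wc_str)               # FALSE TRUE FALSE
```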

Since both of the things you object to actually work the way they
should, and not the way you report, you need to give the list a
*reproducible* example so that we can help you.
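
One way to put such an example together, as a rough sketch: inspect the object with str() and levels(), then paste the output of dput() into your message so others can recreate it exactly. The small data frame here is invented for illustration; with your data you would run the same calls on train.

```r
# Invented stand-in for the real data (leading spaces are an assumption).
train <- data.frame(ind = 1:3,
                    sex = c(" Male", " Female", " Male"),
                    stringsAsFactors = TRUE)

str(train)                  # column types and the first few values
levels(train$sex)           # exact labels, including any stray whitespace
dput(head(train))           # text that recreates the object when pasted into R
```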

> I tried a conversion using data.matrix() but that gets rid of all factor
> information and makes things worse.  Is there a way to suppress 'Levels:
> Female  Male'.

You can convert the factor to character using as.character():
> train$sex <- as.character(train$sex)
> train$sex[2]
[1] "Male"
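
If the labels do turn out to carry stray whitespace (an assumption on my part, since we haven't seen the data), a sketch of cleaning them during the conversion might look like this. trimws() is in base R from version 3.2.0; the values are invented.

```r
# Invented factor whose labels carry a leading space.
sex <- factor(c(" Male", " Female", " Male"))

# Convert to character, then strip leading and trailing whitespace.
sex_chr <- trimws(as.character(sex))
sex_chr[1] == "Male"        # TRUE

# If a factor is still wanted, rebuild it from the cleaned strings.
sex_clean <- factor(sex_chr)
levels(sex_clean)           # "Female" "Male"
```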

Sarah
-- 
Sarah Goslee
http://www.functionaldiversity.org


