[R] Eliminate level information
Sarah Goslee
sarah.goslee at gmail.com
Sat Jul 9 23:10:44 CEST 2011
There's something you're not telling us. How did you get your data into R?
What does str(train) show? (See my comments inline.)
On Sat, Jul 9, 2011 at 3:38 PM, darrelkj <darrelkj at mail.uc.edu> wrote:
> Hi, I hope this formatting is correct as it is my first time.
>
> I am trying to do comparisons of values in a data frame that has some factor
> variables.
> One instance is
>
>> train$sex[2]
> [1] Male
> Levels: Female Male
>
> So the value is Male but a comparison like "Male" == train$sex[2]
> will always return FALSE because of the level information included.
Nope. The R developers are smarter than that.
> train <- data.frame(ind=1:5, sex=c("Male", "Male", "Female", "Female", "Male"), workclass=c("Private", "Local-gov", "Private", NA, "Private"), stringsAsFactors=TRUE)
> train$sex[2]
[1] Male
Levels: Female Male
> train$sex[2] == "Male"
[1] TRUE
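
A comparison like this is made on the factor's labels, so it can only fail if the label is not exactly "Male". The sketch below shows one way that could happen. The leading space in the labels is purely an assumption for illustration, since the original data were not posted, and the vector `sex` is hypothetical.

```r
# Hypothetical data: the leading space in " Male" is an assumption,
# not something shown in the original post.
sex <- factor(c(" Male", " Female", " Male"))

# The comparison uses the label, which is " Male" rather than "Male".
before <- sex[1] == "Male"
print(before)

# levels() prints the labels in quotes, which exposes stray whitespace.
print(levels(sex))

# Strip the whitespace from the level labels (trimws() needs R >= 3.2.0).
levels(sex) <- trimws(levels(sex))
after <- sex[1] == "Male"
print(after)
```

Checking levels() or str() first is usually quicker than guessing, because the default printing of a factor does not quote its values.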
> Another problem this creates is
>
>> factor(train$workclass[25:30])
> [1] Private Local-gov Private NA Private
> [6] Private
> Levels: Local-gov NA Private
>
>> is.na(train$workclass[25:30])
> [1] FALSE FALSE FALSE FALSE FALSE FALSE
>
> They are all FALSE because of the levels data in the comparison. This
> would seem to be a bug, because I thought that NA was a protected keyword, but
> it is being used here as a level. That makes it fail the missing-value
> criterion for two reasons now, because it is a level.
They're also smarter than that:
> train$workclass
[1] Private Local-gov Private <NA> Private
Levels: Local-gov Private
> is.na(train$workclass)
[1] FALSE FALSE FALSE TRUE FALSE
Since both of the things you object to actually work the way they
should, and not the way you report, you need to give the list a
*reproducible* example so that we can help you.
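
For reference, output like the one reported (a level printed as NA rather than <NA>) is what R shows when the data contain the literal string "NA" instead of a true missing value. The following is a minimal sketch under that assumption. The vector `workclass` is hypothetical and is not the poster's data.

```r
# Hypothetical data: the literal string "NA" is an assumption about
# what the poster's data might contain.
workclass <- factor(c("Private", "Local-gov", "NA", "Private"))

# The string "NA" is an ordinary level, so nothing is missing yet.
print(levels(workclass))
na_before <- is.na(workclass)
print(na_before)

# Turn the string into a real missing value, then drop the unused level.
workclass[workclass == "NA"] <- NA
workclass <- droplevels(workclass)

print(levels(workclass))
na_after <- is.na(workclass)
print(na_after)
```

If the data were read with read.table() or read.csv(), the na.strings argument is the usual place to declare such markers at import time.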
> I tried a conversion using data.matrix() but that gets rid of all factor
> information and makes things worse. Is there a way to suppress 'Levels:
> Female Male'?
You can convert the factor to character using as.character():
> train$sex <- as.character(train$sex)
> train$sex[2]
[1] "Male"
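
If several columns are factors, the same conversion can be applied to all of them at once. This is only a sketch: it rebuilds a small example data frame, and the helper name `is_fac` is made up for illustration.

```r
# Example data; stringsAsFactors = TRUE makes the columns factors
# regardless of the R version's default.
train <- data.frame(ind = 1:5,
                    sex = c("Male", "Male", "Female", "Female", "Male"),
                    workclass = c("Private", "Local-gov", "Private", NA, "Private"),
                    stringsAsFactors = TRUE)

# Find the factor columns, then convert only those to character.
is_fac <- vapply(train, is.factor, logical(1))
train[is_fac] <- lapply(train[is_fac], as.character)

str(train)
```

Non-factor columns such as ind are left untouched, and missing values stay NA after the conversion.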
Sarah
--
Sarah Goslee
http://www.functionaldiversity.org