[R] question about categorical variables in R
Jim Lemon
drjimlemon at gmail.com
Sat Sep 12 07:12:36 CEST 2015
Hi Lida,
Given that this is such a common question and the R FAQ doesn't really
answer it, perhaps a brief explanation will help. In R a factor stores two
things: the distinct literal values of the data (the levels), sorted
alphabetically by default, and a vector of integer codes beginning at 1
that index into those levels. For example, suppose you read in what you
think is a set of numbers like this:
x<-read.table(text="1 2 3
4 5 6
7 . 9")
x
V1 V2 V3
1 1 2 3
2 4 5 6
3 7 . 9
Now look at the classes of the columns:
sapply(x,class)
V1 V2 V3
"integer" "factor" "integer"
Somehow that second column has become a factor. This is because "." cannot
be represented as a number and I didn't tell R that it should be regarded
as a missing value (na.strings="."). R has taken the literal values in that
column
levels(x$V2)
[1] "." "2" "5"
and numbered those values in their alphabetic order, so each entry is
stored as the position of its level:
as.numeric(x$V2)
[1] 2 3 1
You can get the original numbers back like this:
as.numeric(as.character(x$V2))
[1] 2 5 NA
Warning message:
NAs introduced by coercion
and R helpfully tells you that it couldn't coerce "." to a number.
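A minimal sketch of the na.strings fix mentioned above, using the same toy data (the name x2 is just for illustration). Note that since R 4.0.0 read.table defaults to stringsAsFactors=FALSE, so in a current R the original example would report "character" rather than "factor" for V2 unless you pass stringsAsFactors=TRUE. The remedy is the same either way:

```r
# Same data as above, but "." is declared as the missing-value marker
x2 <- read.table(text = "1 2 3
4 5 6
7 . 9", na.strings = ".")

# With "." treated as NA, every column can be read as a number
col_classes <- sapply(x2, class)
print(col_classes)

# The second column is now an integer vector with a missing value
print(x2$V2)
```

Reading the column as numeric in the first place avoids the as.numeric(as.character(...)) round trip altogether.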
In your example, the factor is created for you
mf<-factor(c("male","female"))
mf
[1] male female
Levels: female male
but as you can see, the default order of the levels may not be what you
expect:
as.numeric(mf)
[1] 2 1
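If the order matters to you, one option is to state it yourself rather than rely on the alphabetical default. A small sketch (mf2 and mf3 are illustrative names, not from your data):

```r
# Give the level order explicitly instead of accepting alphabetical order
mf2 <- factor(c("male", "female"), levels = c("male", "female"))
print(levels(mf2))
print(as.numeric(mf2))

# relevel() moves one level to the front of an existing factor
mf3 <- relevel(factor(c("male", "female")), ref = "male")
print(levels(mf3))
```

The first level is the one R treats as the reference category in most modelling functions, which is usually why the order is worth controlling.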
For a more complete account of factors, see "An Introduction to R" section
4 "Ordered and unordered factors".
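On the regression part of the question: model functions that take a formula, such as lm() and glm(), build the 0/1 indicator columns from a factor themselves, so you do not have to recode by hand. A rough sketch with made-up data (the data frame d and its values are purely hypothetical), assuming the default contrast settings:

```r
# Hypothetical toy data, invented only for this illustration
d <- data.frame(gender = factor(c("male", "female", "female", "male")),
                y = c(5.1, 4.2, 4.0, 5.5))

# The design matrix that lm(y ~ gender, data = d) would use
mm <- model.matrix(y ~ gender, data = d)
print(mm)

# The first level ("female") is the reference; "male" gets the indicator
print(colnames(mm))
```

Here the column gendermale is 1 for males and 0 for females, i.e. the reverse of the male=0, female=1 coding you had in mind only in which group is the baseline. I can't say how the seqMeta functions handle factors, so it is worth checking their help pages before relying on this.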
Jim
On Sat, Sep 12, 2015 at 12:45 AM, Lida Zeighami <lid.zigh at gmail.com> wrote:
> Hi dear experts,
> I have a general question about categorical variables in R, such as
> Gender (Male or Female).
> If I have this column in my data and want to fit a regression model or feed
> the data to the seqmeta package (singlesnp, skat meta), would you please let
> me know whether I should code them first (male=0 and female=1), or will R do
> it for me?
> When I didn't code them, R still ran the analysis without any
> error, but I'm not sure whether the result is correct.
> Thanks