[R] Factor function
peter dalgaard
pdalgd at gmail.com
Tue Apr 26 19:59:22 CEST 2011
On Apr 26, 2011, at 18:52 , Petr Savicky wrote:
> On Tue, Apr 26, 2011 at 10:51:33AM +0200, Petr PIKAL wrote:
>> Hi
>>
>>
>> d<-data.frame(matrix(c("ww","ww","xx","yy","ww","yy","xx","yy","NA"),
>> ncol=3, byrow=TRUE))
>>
>> Change character value "NA" to missing value <NA>
>> d[d[,3]=="NA",3]<-NA
>>
>> If you want to drop any unused levels of a factor, just use
>>
>> factor(d[,3])
>> [1] xx yy <NA>
>> Levels: xx yy
>
> An explicit NA is a good idea. If the NA is introduced before
> creating the data frame, then the data frame will not
> contain the unwanted level either.
>
> a<-matrix(c("ww","ww","xx","yy","ww","yy","xx","yy","NA"),
> ncol=3, byrow=TRUE)
> a[a[,3]=="NA",3]<-NA
> d<-data.frame(a)
> d[,3]
>
> [1] xx yy <NA>
> Levels: xx yy
>
> If the replacement should be done in the whole matrix, then
>
> a[a=="NA"]<-NA
>
> may be used.
>
> Petr Savicky.
I think there's a buglet in here. According to the docs, "If exclude is used it should also be a factor with the same level set as x or a set of codes for the levels to be excluded". However, that plainly doesn't work:
> cc <- c("x","y","NA")
> ff <- factor(cc)
> factor(ff,exclude=1)
[1] x y NA
Levels: NA x y
> factor(ff,exclude=ff[3])
[1] x y NA
Levels: NA x y
> factor(ff,exclude=ff[2])
[1] x y NA
Levels: NA x y
In these cases, the internal logic converts exclude to integer and then uses match(levels, exclude), where levels is unique(x), i.e., a factor. This cannot work, because match() compares the _character_ representation of a factor, not its integer codes.
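The mismatch can be seen in isolation, without going through factor() at all. The following is only a sketch of the comparison described above, not the actual code inside factor(); the names lev, codes, by_label and by_code are made up for illustration.

```r
cc <- c("x", "y", "NA")
ff <- factor(cc)               # levels are sorted: "NA" "x" "y"

lev   <- unique(ff)            # still a factor: x y NA
codes <- as.integer(lev)       # the underlying integer codes: 2 3 1

# match() converts a factor to character first, so the labels
# "x", "y", "NA" are compared with "2" and nothing matches.
by_label <- match(lev, 2L)

# Matching on the integer codes finds the level whose code is 2 ("x").
by_code <- match(codes, 2L)

print(by_label)
print(by_code)
```

Only the second call identifies a level, which is why an integer exclude has no effect in the transcript above.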
The cleanest version that I can think of for the original problem is
> factor(ff, levels=setdiff(levels(ff), "NA"))
[1] x y <NA>
Levels: x y
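Applied to the data frame from the start of the thread, the same idiom might look like the sketch below. It assumes the third column is a factor; stringsAsFactors = TRUE is spelled out because R versions from 4.0.0 on no longer convert strings to factors by default.

```r
d <- data.frame(matrix(c("ww","ww","xx","yy","ww","yy","xx","yy","NA"),
                       ncol = 3, byrow = TRUE),
                stringsAsFactors = TRUE)  # assumption: column 3 must be a factor

# Rebuild the column with every level except the literal string "NA";
# values that are not among the remaining levels become missing (<NA>).
d[, 3] <- factor(d[, 3], levels = setdiff(levels(d[, 3]), "NA"))

print(levels(d[, 3]))
print(is.na(d[, 3]))
```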
--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com