[R] problem with factor levels
PIKAL Petr
petr.pikal at precheza.cz
Tue Dec 4 10:28:42 CET 2012
Hi
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Jeremy.Shearman
> Sent: Tuesday, December 04, 2012 9:35 AM
> To: r-help at r-project.org
> Subject: [R] problem with factor levels
>
> Hi
> I have a data.frame with 371,718 obs. of 12 variables (see below
> for an str). My problem is with V1, a Factor w/ 93144 levels, there
> should actually be 93994 levels. Each entry looks like:
> comp[number]_c[number]_seq[number]
> for example
> comp215489_c0_seq40
> R is grouping as though the last number is a decimal for some reason,
> in other words comp215489_c0_seq40 and comp215489_c0_seq4 are
> considered to be the same. My problem is that they are not the same so
> when I group by this factor I am losing 800 levels.
>
Hm. How did you construct those factors?
> factor(c("comp215489_c0_seq40", "comp215489_c0_seq4") )
[1] comp215489_c0_seq40 comp215489_c0_seq4
Levels: comp215489_c0_seq4 comp215489_c0_seq40
gives me 2 levels, as expected. I also doubt that R would strip characters like this when reading the data from a file.
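One way to narrow this down is to compare the number of distinct IDs in the raw file with the number of levels R reports after import. The following is only a minimal sketch: the temporary file, the tab separator, and the names `ids`, `raw_ids`, and `dat` are assumptions made for illustration, not details taken from the original data.

```r
# IDs that differ only by a trailing digit remain distinct factor levels.
ids <- c("comp215489_c0_seq40", "comp215489_c0_seq4", "comp215489_c0_seq4")
f <- factor(ids)
nlevels(f)            # count of distinct levels
length(unique(ids))   # count of distinct strings; should agree with nlevels(f)

# Hypothetical stand-in for the real input: a small tab-separated file.
tmp <- tempfile()
writeLines(paste(ids, c(95.5, 90.2, 98.7), sep = "\t"), tmp)

# Distinct IDs as they appear in the raw text (everything before the first tab).
raw_ids <- sub("\t.*$", "", readLines(tmp))

# The same file imported as a data.frame, with strings converted to factors.
dat <- read.table(tmp, sep = "\t", stringsAsFactors = TRUE)

# If these two counts differ on the real file, levels were lost during import;
# if they agree, the expected level count itself is worth re-checking.
length(unique(raw_ids))
nlevels(dat$V1)
```

If the two counts agree on the real file, the factor is faithful to the input and the discrepancy more likely lies in how the expected 93994 was obtained.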
Regards
Petr
> Here is an str
>
> 'data.frame': 371718 obs. of 12 variables:
> $ V1 : Factor w/ 93144 levels "comp100000_c0_seq1",..: 92271 91685 29 30 1564 1564 1623 91700 91701 91848 ...
> $ V2 : Factor w/ 17162 levels "gi|345842331|ref|NM_001244016.1|",..: 10119 10779 13210 13210 11522 8115 13079 14493 14493 15858 ...
> $ V3 : num 95.5 90.2 98.7 99.2 81.4 ...
> $ V4 : int 335 153 237 122 258 127 306 258 120 177 ...
> $ V5 : int 15 15 3 1 38 19 20 23 5 9 ...
> $ V6 : int 0 0 0 0 4 2 0 0 0 0 ...
> $ V7 : int 1 45 1 43 1 129 1 54 1 70 ...
> $ V8 : int 335 197 237 164 254 254 306 311 120 246 ...
> $ V9 : int 6866 18 3172 3438 67 122 3927 42 346 195 ...
> $ V10: int 7200 170 3408 3559 318 247 4232 299 465 19 ...
> $ V11: num 7e-155 2e-46 4e-125 2e-61 3e-24 ...
> $ V12: num 545 184 446 234 111 69.9 448 329 198 280 ...
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/problem-with-factor-levels-tp4652006.html
> Sent from the R help mailing list archive at Nabble.com.
>