[R] Newbie struggling with "factors"

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Fri Mar 29 17:02:02 CET 2002

Jonathan Baron <baron at cattell.psych.upenn.edu> writes:

> >These are encoded as numbers. For example, if the
> >survey has a question:
> >Which operating systems have you used? (Check all that
> >apply)
> >[ ]Windows
> >[ ]Macintosh
> >[ ]Unix
> >
> >...then the data exported for three different
> >responses might look like
> >;1;
> >;1,3;
> >;1,2,3;
> >...where ";" is the field delimiter. 
> I assume you have told read.table() that ";" is the delimiter,
> with the appropriate option setting.  But this would mean, I
> think, that you would have some fields that look like "1,2,3".
> This is probably why you are getting factors: "1,2,3" is not a
> number, so R assumes it is part of a factor.  To extract the
> numbers from things like this, you might have to use things like
> substr() and strsplit().
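A minimal sketch of that route, assuming the export looks exactly like the three lines quoted above (the sample lines are made up). Setting colClasses = "character" is one way to stop read.table() from turning the column into a factor; as.is = TRUE would work too.

```r
# Three hypothetical response lines in the export format quoted above
lines <- c(";1;", ";1,3;", ";1,2,3;")
# sep = ";" splits each line into three fields (empty, codes, empty);
# colClasses = "character" keeps every column as plain strings, not factors
survey <- read.table(text = lines, sep = ";", colClasses = "character")
survey$V2                    # "1" "1,3" "1,2,3"
# strsplit() then breaks each field into its individual codes
codes <- strsplit(survey$V2, ",")
as.numeric(codes[[3]])       # 1 2 3
```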

In other words, you (Tom) are recording response patterns, not the use
of individual systems. A more conventional coding would use three
variables, one yes/no variable per OS. With the coding you have, all
you can really do is table(x), which gives you counts of people with
the patterns "1,2", "3", "1,2,3", etc., but counting the total number
of, say, Windows users is inconvenient.
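To make the point concrete, a small sketch with made-up responses (the vector x is hypothetical and assumed to be character already): table() counts whole patterns, so getting the number of Windows users means picking out by hand every pattern that happens to contain a 1.

```r
# Hypothetical responses, already pulled out of the ";"-delimited export
x <- c("1", "1,3", "1,2,3", "3", "1,3")
# One count per distinct pattern, not per operating system
pattern.counts <- table(x)
# Windows users (code 1): every pattern containing a 1 must be listed manually
n.win <- sum(pattern.counts[c("1", "1,3", "1,2,3")])
n.win   # 4
```

This breaks as soon as a new pattern (say "1,2") shows up in the data, which is why a per-OS coding is preferable.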

However, you can recover the conventional coding using slightly sneaky
code like

# as.character() is needed if read.table() gave you x as a factor
win.use <- sapply(strsplit(as.character(x), ","), "%in%", x=1)
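A quick check of what this gives, on a made-up factor like the one read.table() produces by default (the data are hypothetical). strsplit() refuses factor input, hence the as.character(). Passing x = 1 by name to "%in%" makes each split-up response the table argument, so every element answers "is code 1 among this person's codes?".

```r
# Hypothetical data, as read.table() would hand it over: a factor
x <- factor(c("1", "1,3", "1,2,3", "", "2"))
# strsplit() needs a character vector, so convert the factor first
win.use <- sapply(strsplit(as.character(x), ","), "%in%", x = 1)
win.use        # TRUE TRUE TRUE FALSE FALSE
sum(win.use)   # 3 respondents checked Windows
```

Note that an empty field (nothing checked) comes out as FALSE.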

or, sneakier yet

lapply(c(win=1, mac=2, unix=3),
       function(i) sapply(strsplit(as.character(x), ","), "%in%", x=i))
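The list that lapply() returns has one logical vector per OS, so it drops straight into a data frame with the conventional yes/no coding. A sketch with made-up responses (x is hypothetical); colSums() then gives the per-OS totals.

```r
# Hypothetical responses; "" means nothing was checked
x <- c("1", "1,3", "1,2,3", "", "2,3")
codes <- strsplit(as.character(x), ",")
os.use <- lapply(c(win = 1, mac = 2, unix = 3),
                 function(i) sapply(codes, "%in%", x = i))
# One logical (yes/no) column per operating system
os.df <- as.data.frame(os.use)
colSums(os.df)   # win 3, mac 2, unix 3
```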



   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
