FW: [R] Newbie struggling with "factors"
Warnes, Gregory R
gregory_r_warnes at groton.pfizer.com
Fri Mar 29 17:56:38 CET 2002
Hint #1: To do any useful transformations on your variables, you will
probably need to convert them temporarily into character variables (aka
strings). Do that with
as.character(n$OSUSE)
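As a small illustration of why this conversion matters (only a sketch; the vector 'f' is a made-up stand-in for n$OSUSE): the integers stored inside a factor are positions in its list of levels, not the text that was read from the file, so it is safer to go through as.character() before doing anything else.

```r
# 'f' is a hypothetical stand-in for n$OSUSE as read.table() might create it
f <- factor(c("1", "1,3", "1,2,3"))

# The internal integer codes index into levels(f); they are not the survey codes
codes <- as.integer(f)

# The labels exactly as they appeared in the data
labels <- as.character(f)

print(levels(f))
print(codes)
print(labels)
```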
You will probably want to convert each of the variables that are in this
format into a set of indicator (TRUE/FALSE) variables. Something like this:
n <- data.frame(OSUSE = c("1","1,3","1,2,3"))
# as.character() is needed because strsplit() does not accept factors
boxes <- strsplit(as.character(n$OSUSE), ",")
n$OSUSE.Windows   <- sapply(boxes, function(X) "1" %in% X)
n$OSUSE.Macintosh <- sapply(boxes, function(X) "2" %in% X)
n$OSUSE.Unix      <- sapply(boxes, function(X) "3" %in% X)
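If a question has many boxes, the near-identical lines above could be generated in a loop instead. This is only a sketch: the 'boxnames' vector is an assumption about how the codes 1, 2, 3 map onto the boxes in the survey.

```r
n <- data.frame(OSUSE = c("1", "1,3", "1,2,3"))

# Assumed mapping: code 1 = Windows, 2 = Macintosh, 3 = Unix
boxnames <- c("Windows", "Macintosh", "Unix")

# Split each response into the codes that were checked
boxes <- strsplit(as.character(n$OSUSE), ",")

# Add one TRUE/FALSE column per box, named OSUSE.<boxname>
for (i in seq_along(boxnames)) {
    n[[paste("OSUSE", boxnames[i], sep = ".")]] <-
        sapply(boxes, function(X) as.character(i) %in% X)
}
print(n)
```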
Alternatively, if you often have variables like this, you might consider
creating a new object type that extends factor and that includes the
operations that you need.
Something like:
### Start Sample Code ###
checklist <- function(X, boxnames)
{
    # Store the data as a factor so that the "factor" class is valid
    X <- as.factor(X)
    attr(X, "boxnames") <- boxnames
    class(X) <- c("checklist", "factor")
    return(X)
}

contains <- function(X, name)
{
    # Translate a (possibly abbreviated) box name into its numeric code
    if (is.character(name))
        name <- pmatch(name, attr(X, "boxnames"))
    # strsplit() needs character input, not a factor
    retval <- sapply(strsplit(as.character(X), ","), function(X) name %in% X)
    return(retval)
}

numchecked <- function(X)
{
    # Number of comma-separated codes in each response
    retval <- sapply(strsplit(as.character(X), ","), length)
    return(retval)
}

summary.checklist <- function(x, ...)
{
    # Column totals and proportions of the TRUE/FALSE matrix
    m <- as.matrix(x)
    return(rbind(sum = colSums(m), mean = colMeans(m)))
}

as.matrix.checklist <- function(x, ...)
{
    # One logical column per box name
    sapply(attr(x, "boxnames"), function(YY) contains(x, YY))
}
### End Sample Code ###
Here are some examples of using these functions:
> n <- data.frame(OSUSE = c("1","1,3","1,2,3"))
>
> n$OSUSE <- checklist(n$OSUSE, c("Windows","Macintosh","Unix"))
#
# Check if OSUSE includes a specific OS
#
> contains( n$OSUSE, "Windows")
[1] TRUE TRUE TRUE
> contains( n$OSUSE, "Macintosh")
[1] FALSE FALSE TRUE
> contains( n$OSUSE, "Unix")
[1] FALSE TRUE TRUE
>
#
# Compute the average number of checked items
#
> numchecked(n$OSUSE)
[1] 1 2 3
> mean(numchecked(n$OSUSE))
[1] 2
>
#
# Create a matrix showing whether each box was checked or not
#
> as.matrix(n$OSUSE)
     Windows Macintosh  Unix
[1,]    TRUE     FALSE FALSE
[2,]    TRUE     FALSE  TRUE
[3,]    TRUE      TRUE  TRUE
>
#
# Show some summary info
#
> summary(n$OSUSE)
     Windows Macintosh      Unix
sum        3 1.0000000 2.0000000
mean       1 0.3333333 0.6666667
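For the other question in the original message (which item is checked most often), one possible approach is to pool all the codes and tabulate them. This is a sketch on the same toy data, without the checklist class; the names 'allcodes', 'counts' and 'mostcommon' are purely illustrative.

```r
n <- data.frame(OSUSE = c("1", "1,3", "1,2,3"))

# Pool every checked code from every response into one vector
allcodes <- unlist(strsplit(as.character(n$OSUSE), ","))

# Count how often each code occurs
counts <- table(allcodes)
print(counts)

# Code(s) with the highest count; a tie would return more than one
mostcommon <- names(counts)[as.vector(counts == max(counts))]
print(mostcommon)
```

On this toy data, code 1 (Windows) is the most frequent, which agrees with the "sum" row of the summary above.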
Of course, you'll want to modify these classes to suit your needs. A little
time up front can help a lot.
If you like, I'll include these classes and any enhancements that you make
in my 'gregmisc' library.
-Greg
> -----Original Message-----
> From: Tom Arnold [mailto:thomas_l_arnold at yahoo.com]
> Sent: Friday, March 29, 2002 8:59 AM
> To: R
> Subject: [R] Newbie struggling with "factors"
>
>
> I am processing some survey results, and my data are
> being read in as "factors". I don't know how to
> process these things in any way.
>
> To start with, several of the survey questions are
> multi-choice check boxes on the original (web-based)
> survey, as in "check all that apply".
>
> These are encoded as numbers. For example, if the
> survey has a question:
> Which operating systems have you used? (Check all that
> apply)
> [ ]Windows
> [ ]Macintosh
> [ ]Unix
>
> ...then the data exported for three different
> responses might look like
> ;1;
> ;1,3;
> ;1,2,3;
>
> ...where ";" is the field delimiter.
> I use read.table to get the data in. I read all the
> survey data into a table "n" and the field above is
> called "OSUSE". When I query R about the field, it
> tells me it is class "factor"
>
> > class(n$OSUSE)
> [1] "factor"
> > mode(n$OSUSE)
> [1] "numeric"
>
> I'd like to be able to do some simple things like:
> what is the most common item checked (1, 2, or 3?)
> What is the average number of boxes checked?
>
> But I can't find any way to manipulate this "factor"
> field. What's the secret?
>
> Thanks.
>
> =====
> Tom Arnold
> Summit Media Partners
> Visit our web site at http://www.summitmediapartners.com
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
> -.-.-.-.-.-.-.-.-
> r-help mailing list -- Read
> http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To:
> r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
> _._._._._._._._._
>