[R] Still can't find missing data - How do I get NA in xtabs with factors?

Farley, Robert FarleyR at metro.net
Fri May 29 20:14:10 CEST 2009


Let's see if I understand this.  Do I iterate through
    x <- factor(x, levels=c(levels(x), NA), exclude=NULL)
for each of the few hundred variables (x) in my data frame?


I tried to do this all at once and failed:
> ToyData
    Data1 Data2  Data3 Weight
101   Sam   Red Banana    1.1
102   Sam Green Banana    2.1
103   Sam  Blue Orange    2.1
104  Fred   Red Orange    2.1
105  Fred Green  Guava    2.1
106  Fred  Blue  Guava    2.1
107  <NA>   Red   Pear   50.1
108  <NA> Green   Pear   50.1
109  <NA>  Blue   <NA> 1000.2
> ToyData <- factor(ToyData, levels(c(levels(ToyData), NA), exclude=NULL, na.action=na.pass))
Error in levels(c(levels(ToyData), NA), exclude = NULL, na.action = na.pass) :
  unused argument(s) (exclude = NULL, na.action = function (object, ...)
> ToyData <- factor(ToyData, levels(c(levels(ToyData), NA)))
> ToyData
 Data1  Data2  Data3 Weight
  <NA>   <NA>   <NA>   <NA>
Levels:
>
But it didn't work.  Don't I need to do this separately for each variable?
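
One way this could be done for every factor column at once, as a sketch only. Assumptions: the helper name add_na_level is made up for illustration, the toy data is retyped by hand with the whole-number weights from the earlier message (which total 1111), and the behaviour is that of a recent R release, not necessarily the version used in this thread. The idea is to apply the factor() idiom column by column rather than to the data frame itself.

```r
# Toy data retyped from the thread (whole-number weights, total 1111).
ToyData <- data.frame(
  Data1  = factor(c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA)),
  Data2  = factor(rep(c("Red", "Green", "Blue"), times = 3)),
  Data3  = factor(c("Banana", "Banana", "Orange", "Orange", "Guava",
                    "Guava", "Pear", "Pear", NA)),
  Weight = c(1, 2, 2, 2, 2, 2, 50, 50, 1000),
  row.names = 101:109
)

# Hypothetical helper: append NA to the levels of a single factor.
add_na_level <- function(x) {
  factor(x, levels = c(levels(x), NA), exclude = NULL)
}

# Apply it to the factor columns only; Weight stays numeric.
is_fac <- vapply(ToyData, is.factor, logical(1))
ToyData[is_fac] <- lapply(ToyData[is_fac], add_na_level)

# The NA rows are now an ordinary level in the cross-tabulation.
tab <- xtabs(Weight ~ Data1 + Data3, data = ToyData)
print(tab)
print(sum(tab))
```

Data2 has no missing values here, so it simply gains an empty NA level. Base R also provides addNA(), which appears to wrap the same idiom and could replace the hypothetical helper.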



Is there a way to get read.spss to insert "NA" levels for each variable when I create the data frame?  Is this because SPSS (and Stata) allow "NA" as an "undeclared level" and R does not?


Will this be a problem with read.dta as well?




Robert Farley
Metro
www.Metro.net


-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Thursday, May 28, 2009 20:39
To: Farley, Robert
Subject: RE: [R] Still can't find missing data

In R, factors don't save space over character vectors - only
one copy of any given string is kept in memory in either case.
Factors do let you order the levels in the way you want and
that is often important in presentations.

You can add NA to the list of levels of a factor by doing
    x <- factor(x, levels=c(levels(x), NA), exclude=NULL)
where 'x' represents each factor in your dataset.  After
doing that, is.na(x) will be all FALSE, which you may not
want in other situations.
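
A small illustration of that trade-off, as a sketch only. The vectors x and x_na are made up for the example, and going through as.character() is just one possible way to find the missing entries again.

```r
# A factor with one missing value; NA is not a level yet.
x <- factor(c("Sam", "Fred", NA))
print(is.na(x))

# Add NA as an explicit level.
x_na <- factor(x, levels = c(levels(x), NA), exclude = NULL)
print(levels(x_na))

# The missing entry now has an ordinary integer code, so is.na() no
# longer flags it.
print(is.na(x_na))

# The missing entries can still be located through the labels.
print(is.na(as.character(x_na)))
```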

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
> Sent: Thursday, May 28, 2009 5:27 PM
> To: R-help
> Subject: Re: [R] Still can't find missing data
>
> That seems to work for the toy data.  How do I implement this
> change with my real data, which are read from very large
> Stata and SPSS files and keep the factor definitions?  Won't
> I be losing information (and creating a larger dataset) by
> not using the factor levels?
>
>
> How do I recover the factor values?  I read my datafile
> (read.spss using   use.value.labels = FALSE,) and got this:
>
>               connector
> Mode_orig_only            1            9
>           1       17.814338     0.000000
>           3       49.128982     0.000000
>           4      525.978899     0.000000
>           5      913.295370     0.000000
>           6      114.302764     0.000000
>           7      298.151438     0.000000
>           8       93.088049     0.000000
>           9      233.794168     0.000000
>           10      20.764539     0.000000
>           11     424.120506     0.000000
>           12       8.054528     0.000000
>           13       6.010790     0.000000
>           14    1832.748525     0.000000
>           15   10191.284139     0.000000
>           16    2099.771923     0.000000
>           17    1630.148576     0.000000
>           <NA>     0.000000  9491.013249
>
> which does have the "NA" row, but not the factor labels.  If
> I read the file with use.value.labels=TRUE I can see what I'm
> summarizing, but not the NAs.  Can't I have both?
>
> The top summary will also omit all 0 value factors (of
> course) in the variable summarized.
>
>
> The same summary using factors:
>                                                                 connector
> Mode_orig_only                                                  OD Passenger   Connector
>   Walked/Biked                                                     17.814338    0.000000
>   I flew in from another a place/connected                          0.000000    0.000000
>   Amtrak                                                           49.128982    0.000000
>   Bus - Chartered bus or van                                      525.978899    0.000000
>   Bus - Hotel Courtesy van                                        913.295370    0.000000
>   Bus - MTA (Metro) or other public transit bus                   114.302764    0.000000
>   Bus - Scheduled airport bus or van (e.g. Airport bus or Disn    298.151438    0.000000
>   Bus - Union Station Flyaway                                      93.088049    0.000000
>   Bus - Van Nuys Flyaway                                          233.794168    0.000000
>   Green line/light rail                                            20.764539    0.000000
>   Limousine/town car                                              424.120506    0.000000
>   Metrolink                                                         8.054528    0.000000
>   Motorcycle                                                        6.010790    0.000000
>   On-call shuttle/van (e.g. Super Shuttle, Prime Time)           1832.748525    0.000000
>   Car/truck/van - Private                                       10191.284139    0.000000
>   Car/truck/van - Rental                                         2099.771923    0.000000
>   Taxi                                                           1630.148576    0.000000
>   ..Refused                                                         0.000000    0.000000
>
>
>
>
>
>
>
> Robert Farley
> Metro
> www.Metro.net
>
>
> -----Original Message-----
> From: William Dunlap [mailto:wdunlap at tibco.com]
> Sent: Thursday, May 28, 2009 16:26
> To: Farley, Robert
> Subject: RE: [R] Still can't find missing data
>
> Try reading it in with read.table's argument stringsAsFactors=FALSE.
>
> I think the underlying problem is that exclude= is used only if
> the classifying variables are not already factors.  I haven't studied
> the help file well enough to see if that is what it is documented
> to do, but it seems misleading.
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
> > Sent: Thursday, May 28, 2009 4:10 PM
> > To: R-help
> > Subject: Re: [R] Still can't find missing data
> >
> > In this toy data, each of the tables should sum to 1111
> > None of the tables shows NA columns or rows.
> >
> >
> > > ################################
> > > ToyData <- read.table("C:/Data/R/Toy.csv", header=TRUE,
> > sep=",", na.strings="NA", dec=".", row.names="ID_Num")
> > > ToyData
> >     Data1 Data2  Data3 Weight
> > 101   Sam   Red Banana      1
> > 102   Sam Green Banana      2
> > 103   Sam  Blue Orange      2
> > 104  Fred   Red Orange      2
> > 105  Fred Green  Guava      2
> > 106  Fred  Blue  Guava      2
> > 107  <NA>   Red   Pear     50
> > 108  <NA> Green   Pear     50
> > 109  <NA>  Blue   <NA>   1000
> > > xtabs(Weight ~  Data1 + Data2, exclude=NULL,
> > na.action=na.pass, ToyData)
> >       Data2
> > Data1  Blue Green Red
> >   Fred    2     2   2
> >   Sam     2     2   1
> > > xtabs(Weight ~  Data1 + Data2, exclude=NULL,
> > na.action=na.pass,drop.unused.levels = FALSE, ToyData)
> >       Data2
> > Data1  Blue Green Red
> >   Fred    2     2   2
> >   Sam     2     2   1
> > > xtabs(Weight ~  Data1 + Data3, exclude=NULL,
> > na.action=na.pass,drop.unused.levels = FALSE, ToyData)
> >       Data3
> > Data1  Banana Guava Orange Pear
> >   Fred      0     4      2    0
> >   Sam       3     0      2    0
> > >
> >
> >
> >
> >
> >
> > Robert Farley
> > Metro
> > www.Metro.net
> >
> >
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of Dieter Menne
> > Sent: Thursday, May 28, 2009 05:46
> > To: r-help at r-project.org
> > Subject: Re: [R] Still can't find missing data
> >
> >
> >
> >
> > Farley, Robert wrote:
> > >
> > > I can't get the syntax that will allow me to show NA values
> > (rows) in the
> > > xtabs.
> > >
> > > lengthy non-reproducible example removed
> > >
> >
> > If you want a reproducible answer, prepare a reproducible
> > result. And check
> > that the
> > syntax is
> >
> > na.action=na.pass
> >
> > Dieter
> >
> >
> >
> >
> > --
> > View this message in context:
> > http://www.nabble.com/Still-can%27t-find-missing-data-tp23730627p23761006.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.



