[R] Deleting rows and columns containing NA's and "" only

Mon Feb 13 18:21:04 CET 2012

On Mon, Feb 13, 2012 at 07:48:11AM -0800, syrvn wrote:
> Hello,
> 
> I use read.xls from the gdata package to read in xlsx files. Sometimes these
> data.frames contain NA columns
> and rows only. I know how to get rid of those ones but here is the R output
> of a test data set read in with read.xls
> 
> > t1
>      A          B         X         D               X.1         X.2
> 1 test      1         NA                        NA    
> 2 <NA>   asd    NA      asdasd    NA    
> 3                          NA      asdasd    NA    
> 4                          NA                        NA         NA
> 
> t1[1,2], t1[4,5] and t1[4,6] are NA in text form in the excel sheet. I don't
> understand why in the first column it is <NA> while in the last two is not.
> I basically want to get rid of column 5 and 6 and row 4 as they do not
> contain any relevant information. If i do a is.na.data.frame(t1):
> 
> > is.na.data.frame(t1)
>          A     B    X     D  X.1   X.2
> [1,] FALSE FALSE TRUE FALSE TRUE FALSE
> [2,]  TRUE FALSE TRUE FALSE TRUE FALSE
> [3,] FALSE FALSE TRUE FALSE TRUE FALSE
> [4,] FALSE FALSE TRUE FALSE TRUE FALSE
> 
> does not give me the result I hoped to get.
> 
> It seems that <NA> and NA are treated as NA but in t1[4,6] it is treated as
> FALSE because if I do
> 
> > as.character(t1[4,6])
> [1] "NA "

Hi.

I do not know how the "NA " (with a trailing space) appeared. However,
such values can be changed to a real NA as follows.

  # some data frame: column b is character, column c is a factor
  df <- data.frame(a = c(NA, 2L, 3L, 4L),
                   b = c("a", NA, "c", "NA "),
                   c = factor(c("e", "f", NA, "h"),
                              levels = c("e", "f", "g", "h")),
                   stringsAsFactors = FALSE)
  df

     a    b    c
  1 NA    a    e
  2  2 <NA>    f
  3  3    c <NA>
  4  4  NA     h

  df[4, 2] # this is not NA, but "NA "

  [1] "NA "

  # replace all "NA " by NA in column 2; which() drops the NA that the
  # comparison yields for entries that are already NA

  df[which(df[,2] == "NA "), 2] <- NA
  df

     a    b    c
  1 NA    a    e
  2  2 <NA>    f
  3  3    c <NA>
  4  4 <NA>    h
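
The same idea can be extended to all columns and combined with
dropping the rows and columns that are then entirely NA, which is
what the original question asks for. The following is only a sketch
under some assumptions: drop_empty() is a hypothetical helper name,
the data frame t1 is rebuilt by hand and only roughly follows the
printout in the question, and trimws() requires R >= 3.2.0. Note that
factor columns are converted to character along the way.

```r
# Hypothetical helper, not part of gdata or base R: treat "" and textual
# "NA" (with optional surrounding whitespace) as missing, then drop the
# rows and columns that contain nothing but NA.
drop_empty <- function(d) {
  for (j in seq_along(d)) {
    x <- d[[j]]
    if (is.factor(x) || is.character(x)) {
      x <- trimws(as.character(x))           # "NA " becomes "NA"
      x[!is.na(x) & (x == "" | x == "NA")] <- NA
      d[[j]] <- x                            # factor columns become character
    }
  }
  keep_rows <- rowSums(!is.na(d)) > 0        # rows with at least one value
  keep_cols <- colSums(!is.na(d)) > 0        # columns with at least one value
  d[keep_rows, keep_cols, drop = FALSE]
}

# Assumed test data, loosely modelled on t1 from the question
t1 <- data.frame(
  A   = c("test", NA, "", ""),
  B   = c("NA", "asd", "", ""),
  X   = c(NA, NA, NA, NA),
  D   = c("", "asdasd", "asdasd", ""),
  X.1 = c(NA, NA, NA, NA),
  X.2 = c("", "", "", "NA "),
  stringsAsFactors = FALSE
)

res <- drop_empty(t1)
res
```

With this test data, row 4 and the columns X, X.1 and X.2 are removed,
since they contain only NA, "" or "NA " entries.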

Hope this helps.

Petr Savicky.