[R] Deleting rows and columns containing NA's and "" only
Petr Savicky
savicky at cs.cas.cz
Mon Feb 13 18:21:04 CET 2012
On Mon, Feb 13, 2012 at 07:48:11AM -0800, syrvn wrote:
> Hello,
>
> I use read.xls from the gdata package to read in xlsx files. Sometimes these
> data.frames contain NA columns
> and rows only. I know how to get rid of those ones but here is the R output
> of a test data set read in with read.xls
>
> > t1
> A B X D X.1 X.2
> 1 test 1 NA NA
> 2 <NA> asd NA asdasd NA
> 3 NA asdasd NA
> 4 NA NA NA
>
> t1[1,2], t1[4,5] and t1[4,6] are NA in text form in the excel sheet. I don't
> understand why in the first column it is <NA> while in the last two is not.
> I basically want to get rid of column 5 and 6 and row 4 as they do not
> contain any relevant information. If I do is.na.data.frame(t1):
>
> > is.na.data.frame(t1)
> A B X D X.1 X.2
> [1,] FALSE FALSE TRUE FALSE TRUE FALSE
> [2,] TRUE FALSE TRUE FALSE TRUE FALSE
> [3,] FALSE FALSE TRUE FALSE TRUE FALSE
> [4,] FALSE FALSE TRUE FALSE TRUE FALSE
>
> does not give me the result I hoped to get.
>
> It seems that <NA> and NA are treated as NA but in t1[4,6] it is treated as
> FALSE because if I do
>
> > as.character(t1[4,6])
> [1] "NA "
Hi.
I do not know how "NA " appeared; however, it is possible
to change such values to real NA as follows.
# some data frame
df <- structure(list(a = c(NA, 2L, 3L, 4L), b = c("a", NA, "c", "NA "),
c = structure(c(1L, 2L, NA, 4L), .Label = c("e", "f", "g", "h"),
class = "factor")), .Names = c("a", "b", "c"), row.names = c(NA, -4L),
class = "data.frame")
df
   a    b    c
1 NA    a    e
2  2 <NA>    f
3  3    c <NA>
4  4  NA     h
df[4, 2] # this is not NA, but "NA "
[1] "NA "
# replace all "NA " by NA in column 2
df[which(df[,2] == "NA "), 2] <- NA
df
   a    b    c
1 NA    a    e
2  2 <NA>    f
3  3    c <NA>
4  4 <NA>    h
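The replacement above handles one column and one exact string. A possible
generalization to the original goal (dropping rows and columns that hold
only NA, "" or "NA " text) could look like the sketch below. It is only an
illustration under several assumptions: the data frame t1 is made up and
merely resembles the one in the question, clean_column is a hypothetical
helper name, every column is converted to character, and trimws() needs
R 3.2.0 or later.

```r
# Hypothetical data, loosely modeled on the t1 shown in the question
t1 <- data.frame(
  A   = c("test", NA, "", ""),
  B   = c("NA ", "asd", "", ""),
  X   = c(NA, NA, NA, NA),
  D   = c("", "asdasd", "asdasd", ""),
  X.1 = c(NA, NA, NA, "NA "),
  X.2 = c("", "", "", "NA "),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim surrounding whitespace, then turn the
# strings "" and "NA" into real missing values.
# Note: the column is returned as character, whatever its original type.
clean_column <- function(x) {
  x <- trimws(as.character(x))
  x[!is.na(x) & (x == "" | x == "NA")] <- NA
  x
}

# Apply the helper to every column while keeping the data.frame structure
cleaned <- t1
cleaned[] <- lapply(t1, clean_column)

# Keep only rows and columns with at least one non-missing value
keep_rows <- rowSums(!is.na(cleaned)) > 0
keep_cols <- colSums(!is.na(cleaned)) > 0
result <- cleaned[keep_rows, keep_cols, drop = FALSE]
result
```

With this made-up input, row 4 and the columns X, X.1 and X.2 are dropped,
since they contain nothing but missing values after cleaning. If numeric
columns must keep their type, the helper would have to be applied to
character and factor columns only.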
Hope this helps.
Petr Savicky.