[R] Deleting rows and columns containing NA's and "" only
syrvn
mentor_ at gmx.net
Mon Feb 13 16:48:11 CET 2012
Hello,
I use read.xls from the gdata package to read in xlsx files. Sometimes the
resulting data.frames contain rows and columns that consist only of NAs.
I know how to get rid of those, but here is the R output of a test data set
read in with read.xls:
> t1
     A   B  X      D X.1 X.2
1 test   1 NA         NA
2 <NA> asd NA asdasd  NA
3          NA asdasd  NA
4          NA         NA NA
t1[2,1], t1[4,5] and t1[4,6] are NA in text form in the Excel sheet. I don't
understand why it shows up as <NA> in the first column but not in the last two.
I basically want to get rid of columns 5 and 6 and row 4, as they do not
contain any relevant information. If I call is.na.data.frame(t1):
> is.na.data.frame(t1)
         A     B    X     D  X.1   X.2
[1,] FALSE FALSE TRUE FALSE TRUE FALSE
[2,]  TRUE FALSE TRUE FALSE TRUE FALSE
[3,] FALSE FALSE TRUE FALSE TRUE FALSE
[4,] FALSE FALSE TRUE FALSE TRUE FALSE
the output is not what I hoped for.
It seems that <NA> and NA are both treated as NA, but is.na returns FALSE for
t1[4,6], because if I do
> as.character(t1[4,6])
[1] "NA "
one can see that there is a trailing whitespace after NA, which is definitely
not in the Excel sheet.
I do not know how to deal with that...
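
A minimal sketch of one way this could be approached, assuming the stray values are character strings (or factor levels) such as "NA " and "": trim the whitespace, turn "NA" and "" into real NA values, and then drop the rows and columns that are entirely NA. The data frame below is a hypothetical reconstruction of t1 from the printed output, so the actual column types returned by read.xls may differ, and clean_na is an illustrative helper name, not a gdata function.

```r
# Hypothetical reconstruction of t1 from the printed output above;
# the real column types produced by read.xls may differ.
t1 <- data.frame(
  A   = c("test", NA, "", ""),
  B   = c("1", "asd", "", ""),
  X   = c(NA, NA, NA, NA),
  D   = c("", "asdasd", "asdasd", ""),
  X.1 = c(NA, NA, NA, NA),
  X.2 = c("", "", "", "NA "),
  stringsAsFactors = FALSE
)

# clean_na is a hypothetical helper (not part of gdata).
# It normalises text columns and then drops all-NA rows and columns.
clean_na <- function(df, na_strings = c("NA", "")) {
  df[] <- lapply(df, function(col) {
    # Factors are converted so that their labels can be trimmed.
    if (is.factor(col)) col <- as.character(col)
    if (is.character(col)) {
      # Strip leading and trailing whitespace, e.g. "NA " becomes "NA".
      col <- gsub("^\\s+|\\s+$", "", col)
      # Treat the listed strings as missing values.
      col[col %in% na_strings] <- NA
    }
    col
  })
  keep_rows <- rowSums(!is.na(df)) > 0  # rows with at least one value
  keep_cols <- colSums(!is.na(df)) > 0  # columns with at least one value
  df[keep_rows, keep_cols, drop = FALSE]
}

t1_clean <- clean_na(t1)
```

With this reconstructed t1, rows 1 to 3 and columns A, B and D remain. Column X is dropped as well, since it is entirely NA.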
Cheers