[R] efficient way to make NAs of empty cells in a factor (or character)
Henrik Parn
henrik.parn at bio.ntnu.no
Thu Aug 3 15:46:32 CEST 2006
Dear all,
I have some CSV files (exported from Excel files) that contain empty
cells. My example file has four variables of different classes, each
with some empty cells in the original CSV file:
> test <- read.csv2("test.csv", dec=".")
> test
  id id2  x   y
1  a      1  NA
2  b   e NA 2.2
3      f  3 3.3
4  c   g  4 4.4
> class(test$id)
[1] "factor"
> class(test$id2)
[1] "factor"
> class(test$x)
[1] "integer"
> class(test$y)
[1] "numeric"
The help page for read.csv2 says: 'Blank fields are also considered to
be missing values in logical, integer, numeric and complex fields.'
Thus, empty cells in a factor (or, I assume, a character) are not
treated as missing values but as a level of their own:
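Since blank fields in factor or character columns only become NA when "" is one of the na.strings, one option is to list it when reading the file, so the empty cells are converted at import time. This is a minimal sketch. The inline CSV text is a hypothetical stand-in for test.csv. stringsAsFactors = TRUE is set explicitly because R 4.0.0 and later read strings as character by default.

```r
# Hypothetical stand-in for test.csv: ';'-separated, '.' as decimal mark,
# with empty cells in every column.
csv_text <- "id;id2;x;y
a;;1;
b;e;;2.2
;f;3;3.3
c;g;4;4.4"

# Treat both "" and "NA" as missing in all columns, factors included.
test <- read.csv2(text = csv_text, dec = ".",
                  na.strings = c("", "NA"), stringsAsFactors = TRUE)

is.na(test$id)     # FALSE FALSE TRUE FALSE
levels(test$id)    # "a" "b" "c"
```

With a real file, the same arguments go alongside the file name, e.g. read.csv2("test.csv", dec = ".", na.strings = c("", "NA")).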
> is.na(test$id)
[1] FALSE FALSE FALSE FALSE
> levels(test$id)
[1] "" "a" "b" "c"
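Because the empty string is just one of the factor's levels, another option is to set that level to NA. levels<- drops a level that is assigned NA and turns the corresponding values into NA. This is a sketch on a made-up factor f, not on the original data.

```r
# Made-up factor with an empty-string level, like the one read.csv2 produces.
f <- factor(c("a", "b", "", "c"))
levels(f)            # ""  "a" "b" "c"

# Assigning NA to the "" level removes that level; its values become NA.
levels(f)[levels(f) == ""] <- NA
levels(f)            # "a" "b" "c"
is.na(f)             # FALSE FALSE TRUE FALSE
```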
When I work with my real (larger) dataset I would like to use functions
like 'is.na' and '!is.na' on factors. Is there an R alternative to
Excel's 'search (for empty cells) and replace (with NA)'?
I have tried a modification of Uwe Ligges' suggestion on missing
values, posted on 2 Aug:
> is.na(test[test==""]) <- TRUE
...but it did not work on the data set:
Error in "[<-.data.frame"(`*tmp*`, test == "", value = c(NA, NA, NA, NA :
rhs is the wrong length for indexing by a logical matrix
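A likely cause is that test == "" returns NA, not FALSE, for the cells of x and y that are already NA. Extracting test[test == ""] then yields an element for each NA position as well as for each TRUE one, so the value handed back has more elements than there are cells to replace. The four NAs in the error message fit this explanation. The sketch below turns those NAs into FALSE before indexing, using a small made-up data frame df that mimics test.

```r
# Made-up data frame mimicking 'test': factors containing "" plus numeric NAs.
df <- data.frame(id  = factor(c("a", "b", "", "c")),
                 id2 = factor(c("", "e", "f", "g")),
                 x   = c(1L, NA, 3L, 4L),
                 y   = c(NA, 2.2, 3.3, 4.4))

# Comparing with "" gives NA where a cell is already NA;
# set those to FALSE so the logical matrix only marks real blanks.
blank <- as.matrix(df == "")
blank[is.na(blank)] <- FALSE

df[blank] <- NA        # a single NA is recycled over all TRUE cells
df <- droplevels(df)   # drop the now-unused "" factor levels

is.na(df$id)           # FALSE FALSE TRUE FALSE
levels(df$id2)         # "e" "f" "g"
```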
However, it worked fine when applied to a single vector:
> is.na(test$id[test$id==""]) <- TRUE
> test$id
[1] a b <NA> c
Levels: a b c
> is.na(test$id)
[1] FALSE FALSE TRUE FALSE
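The single-vector idiom can be applied to every factor (and character) column in one pass with lapply(). This is a minimal sketch. The helper name blank_to_na and the example data frame are hypothetical, and non-text columns are returned unchanged.

```r
# Hypothetical helper: turn "" into NA in factor or character vectors,
# leaving other column types untouched.
blank_to_na <- function(col) {
  if (is.factor(col)) {
    levels(col)[levels(col) == ""] <- NA   # drops the "" level
  } else if (is.character(col)) {
    col[col == ""] <- NA
  }
  col
}

df <- data.frame(id  = factor(c("a", "b", "", "c")),
                 id2 = c("", "e", "f", "g"),      # character column
                 x   = c(1L, NA, 3L, 4L),
                 stringsAsFactors = FALSE)

# 'df[] <-' keeps the data.frame structure while replacing every column.
df[] <- lapply(df, blank_to_na)
sapply(df, function(col) sum(is.na(col)))   # one NA in each column
```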
Is there a more efficient way to set the empty cells in all my factors
to NA in R, or should I just do it in advance in Excel with 'search and
replace'?
Thanks in advance!
--
************************
Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway
+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)