[R] efficient way to make NAs of empty cells in a factor (or character)

Thu Aug 3 15:46:32 CEST 2006

Dear all,

I have some CSV files (exported from Excel) containing empty cells. 
My example file has four variables of different classes, each with 
some empty cells in the original CSV file:

 > test <- read.csv2("test.csv", dec=".")

 > test
  id id2  x   y
1  a      1  NA
2  b   e NA 2.2
3      f  3 3.3
4  c   g  4 4.4

 > class(test$id)
[1] "factor"
 > class(test$id2)
[1] "factor"
 > class(test$x)
[1] "integer"
 > class(test$y)
[1] "numeric"

The help page of read.csv2 says: 'Blank fields are also considered to 
be missing values in logical, integer, numeric and complex fields.' 
Thus, empty cells in a factor (or, I assume, a character) column are 
not treated as missing values but form a level of their own:

 > is.na(test$id)
[1] FALSE FALSE FALSE FALSE
 > levels(test$id)
[1] ""  "a" "b" "c"
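Since "" shows up as an ordinary level, a single factor can perhaps 
also be fixed by renaming that level to NA: levels<- drops NA levels 
and turns the affected values into NA. A minimal sketch, where id is 
a stand-in for test$id:

```r
# Stand-in for test$id as returned by read.csv2: "" is one of the levels
id <- factor(c("a", "b", "", "c"))
levels(id)

# Renaming the "" level to NA removes it from the levels and makes
# the corresponding values missing
levels(id)[levels(id) == ""] <- NA

levels(id)
is.na(id)
```

Unlike replacing the values directly, this also gets rid of the empty 
level in the same step.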

When I work with my real (larger) dataset I would like to use functions 
like 'is.na' and '!is.na' on factors. Is there an R alternative to 
Excel's 'search (for empty cells) and replace (with NA)'?

I have tried a modification of Uwe Ligges' suggestion on missing values, 
posted 2 Aug:
 > is.na(test[test==""]) <- TRUE

...but it did not work on the whole data frame:

Error in "[<-.data.frame"(`*tmp*`, test == "", value = c(NA, NA, NA, NA :
        rhs is the wrong length for indexing by a logical matrix
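My guess at the cause: test == "" is a logical matrix that is NA 
wherever a numeric cell is already NA, so test[test == ""] extracts 
four values (two TRUE cells plus two NA cells) while only two cells 
are TRUE, hence the length mismatch. Assigning a single NA directly, 
without is.na<-, seems to avoid this, because a length-one value is 
recycled over the TRUE cells only. A sketch on a rebuilt copy of the 
example data:

```r
# Rebuild the example data frame with factor columns, as read.csv2 gave them
test <- data.frame(id  = c("a", "b", "", "c"),
                   id2 = c("", "e", "f", "g"),
                   x   = c(1L, NA, 3L, 4L),
                   y   = c(NA, 2.2, 3.3, 4.4),
                   stringsAsFactors = TRUE)

# Assign one NA through the logical matrix; cells where the matrix is
# NA (the existing NAs in x and y) are left alone
test[test == ""] <- NA

# The now-unused "" level is still attached to the factors; drop it
test <- droplevels(test)

str(test)
```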

However it worked fine when applied to a single vector:

 > is.na(test$id[test$id==""]) <- TRUE
 > test$id
[1] a    b    <NA> c  
Levels:  a b c

 > is.na(test$id)
[1] FALSE FALSE  TRUE FALSE
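The single-vector version above could be applied to every factor or 
character column with lapply(). blank_to_na() below is a hypothetical 
helper written for this sketch, not an existing R function; it leaves 
numeric columns untouched and drops the leftover "" level.

```r
# Hypothetical helper: blank cells -> NA in all factor/character columns
blank_to_na <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.factor(col) || is.character(col)) {
      is.na(col) <- col == ""           # same idiom as used on test$id
      if (is.factor(col)) col <- droplevels(col)  # remove the "" level
    }
    col                                  # other columns returned as-is
  })
  df
}

test <- data.frame(id  = c("a", "b", "", "c"),
                   id2 = c("", "e", "f", "g"),
                   x   = c(1L, NA, 3L, 4L),
                   y   = c(NA, 2.2, 3.3, 4.4),
                   stringsAsFactors = TRUE)
test <- blank_to_na(test)
str(test)
```

Using df[] <- lapply(...) rather than df <- lapply(...) keeps the 
result a data frame.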

Is there a more efficient way to replace the empty cells with NA in all 
my factors in R, or should I just do it in advance in Excel with 
'search and replace'?

Thanks in advance!

-- 
************************
Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)