[R] Efficiency question: replacing all NAs with a zero
Dimitri Liakhovitski
ld7631 at gmail.com
Tue Mar 30 02:21:50 CEST 2010
Dear R'ers,
I have a very large data frame (over 4,000 rows and 2,500 columns). My
task is very simple: I have to replace all NAs with a zero. My code
works fine on smaller data frames, but I have to deal with a huge one,
and there are many NAs in each column.
R runs out of memory on me ("Reached total allocation of 1535Mb: see
help(memory.size)"). Is there any other, more efficient way of doing
it?
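A quick size estimate helps frame the question. The sketch below assumes every column is numeric (double, 8 bytes per value) and ignores R's small per-object overhead. It shows that the data itself is only a fraction of the 1535 Mb limit. One plausible reading, not confirmed in the post, is that the failure comes from temporary copies made during replacement, such as a whole-frame `is.na()` mask or repeated duplication of the frame.

```r
# Rough memory estimate for a 4,000 x 2,500 all-numeric data frame.
# Assumption: every column is double (8 bytes per value); per-object
# overhead is ignored because it is small compared with the data.
n_rows <- 4000
n_cols <- 2500
n_cells <- n_rows * n_cols          # 1e7 values in total

bytes_data <- n_cells * 8           # storage for the doubles
bytes_na_mask <- n_cells * 4        # a full logical mask, e.g. is.na(frame)

mb <- function(bytes) bytes / 1024^2
cat(sprintf("data:    %.1f Mb\n", mb(bytes_data)))
cat(sprintf("NA mask: %.1f Mb\n", mb(bytes_na_mask)))

# Cross-check against what R reports for a single column of doubles.
print(object.size(numeric(n_rows)), units = "Kb")
```

With these assumptions the data alone is roughly 76 Mb, and a full NA mask adds roughly 38 Mb. Each additional full copy of the frame costs about the same amount again.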
Thanks a lot for any hints!
Dimitri
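# Building an example frame (100 rows, 7 numeric columns, 60 NAs per column).
# set.seed() comes first so that both rnorm() and sample() are reproducible.
set.seed(1234)
frame <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100),
                    d = rnorm(100), e = rnorm(100), f = rnorm(100),
                    g = rnorm(100))
for (i in names(frame)) {
  i.for.NA <- sample(1:100, 60)
  frame[[i]][i.for.NA] <- NA
}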
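# Replacing all NAs in "frame" with zeros. This is of course fast here,
# because this data frame is very small.
# Assigning into frame[] keeps the result a data.frame; a plain
# "frame <- lapply(...)" would silently turn it into a list.
system.time({
  frame[] <- lapply(frame, function(x) {
    x[is.na(x)] <- 0
    x
  })
})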
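A column-by-column loop is one alternative worth trying on the large frame. The helper name `replace_na_with_zero` is hypothetical. The sketch assumes that only numeric columns should receive 0 and leaves factors and character columns untouched. At each step the temporary `is.na()` mask is only one column long, and the result stays a data.frame. How much copying `df[[j]] <- x` triggers depends on the R version, because older versions may duplicate the whole frame on each assignment. Measure it on real data with `system.time()` or `gc()` before relying on it.

```r
# Hypothetical helper: replace NA with 0 in every numeric column of a
# data frame, one column at a time. Non-numeric columns are left as-is,
# and the return value is still a data.frame.
replace_na_with_zero <- function(df) {
  for (j in seq_along(df)) {
    x <- df[[j]]
    if (is.numeric(x)) {
      x[is.na(x)] <- 0   # the NA mask here is only one column long
      df[[j]] <- x
    }
  }
  df
}

# Small example: two numeric columns with 60 NAs each, one character column.
set.seed(1234)
frame <- data.frame(a = rnorm(100), b = rnorm(100),
                    c = rep(c("x", "y"), 50), stringsAsFactors = FALSE)
for (i in c("a", "b")) frame[[i]][sample(100, 60)] <- NA
frame$c[1] <- NA

frame_filled <- replace_na_with_zero(frame)
sapply(frame_filled, function(col) sum(is.na(col)))
# expected: a = 0, b = 0, c = 1 (the character column is not touched)
```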
--
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com