[R] slow with indexing with is.na
arun
smartpink111 at yahoo.com
Thu May 15 12:58:56 CEST 2014
Hi,
If you can convert the data.frame to a matrix, there should be some improvement.
For example:
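# Index the data.frame directly (the original approach)
fun1 <- function(data){
  data[is.na(data)] <- 0
  data
}
# Convert to a matrix first, then index with the logical matrix
fun2 <- function(data){
  mat <- as.matrix(data)
  mat[is.na(mat)] <- 0
  mat
}
# Convert to a matrix, then index with the row/column positions of the NAs
fun3 <- function(data){
  mat <- as.matrix(data)
  indx <- which(is.na(mat), arr.ind=TRUE)
  mat[indx] <- 0
  mat
}
# Same as fun2, but with the logical index stored first
fun4 <- function(data){
  mat <- as.matrix(data)
  indx <- is.na(mat)
  mat[indx] <- 0
  mat
}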
set.seed(4853)
dat1 <- as.data.frame(matrix(sample(c(NA,1:20),3e3*3e3,replace=TRUE),ncol=3e3))
system.time(res1 <- fun1(dat1))
# user system elapsed
# 1.224 0.040 1.267
system.time(res2 <- fun2(dat1))
# user system elapsed
# 0.368 0.052 0.420
system.time(res3 <- fun3(dat1))
# user system elapsed
# 0.170 0.052 0.223
system.time(res4 <- fun4(dat1))
# user system elapsed
# 0.277 0.075 0.354
identical(res1,as.data.frame(res2))
#[1] TRUE
identical(res1,as.data.frame(res3))
#[1] TRUE
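If the result has to stay a data.frame, or the columns are not all of the same type (as.matrix() coerces mixed-type columns to a common type, usually character), a column-wise replacement may be worth trying instead. The following is only a minimal sketch and was not part of the timings above: fun5 and the small data frame are hypothetical names introduced for illustration, and how it compares in speed on the 3000 x 3000 data is untested here.

```r
# Hypothetical helper (not from the timings above): replace NA one column
# at a time, so the result stays a data.frame.
fun5 <- function(data){
  # data[] <- ... keeps the data.frame structure while replacing every column
  data[] <- lapply(data, function(col) replace(col, is.na(col), 0))
  data
}

# Small illustrative data frame (assumed example data)
small <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))
res5 <- fun5(small)

# Reference result using the direct data.frame indexing from fun1
ref <- small
ref[is.na(ref)] <- 0

# Compare the two results
all.equal(res5, ref)
```

The same call on the large data would be system.time(res5 <- fun5(dat1)).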
A.K.
Hi,
I am new to R (with experience in Matlab). I'm still exploring the syntax and learning to think in an R way.
I have some data (3000 x 3000) in a data.frame, and the following code seems to run very slowly.
data[is.na(data)] = 0
It would be good to get some comments on this from experienced users. Thanks.