[R] Data Extraction

Thu Nov 22 17:03:10 CET 2012

On 22-11-2012, at 16:50, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hi Berend,
> 
> You have compared all 3 ways.  ... very nicely evaluated. 
> 

Bert's solution is indeed nice and simple. But Petr's solution (complete.cases, f3 below) is still the quickest:

> N <- 100000
> set.seed(13)
> df <- data.frame(matrix(sample(c(1:10,NA),N,replace=TRUE),ncol=50))
> library(rbenchmark)
>
> f1 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),]}
> f2 <- function(df) {df[!is.na(rowSums(df)),]}
> f3 <- function(df) {df[complete.cases(df),]}
> f4 <- function(df) {data.frame(na.omit(df))}
> benchmark(d1 <- f1(df), d2 <- f2(df), d3 <- f3(df), d4 <- f4(df), columns=c("test","elapsed", "relative", "replications"))
          test elapsed relative replications
1 d1 <- f1(df)   3.588   14.888          100
2 d2 <- f2(df)   0.403    1.672          100
3 d3 <- f3(df)   0.241    1.000          100
4 d4 <- f4(df)   0.557    2.311          100
>
> identical(d1,d2)
[1] TRUE
> identical(d1,d3)
[1] TRUE
> identical(d1,d4)
[1] TRUE
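
For anyone wondering why f2 and f3 select the same rows here, the following is a minimal sketch, assuming an all-numeric data frame like the one in the benchmark. The objects `small` and `mixed` are made up for illustration and are not part of the timing above. rowSums() propagates NA, so a row sum is NA exactly when the row has a missing value; complete.cases() tests for missing values directly and so should also cope with non-numeric columns, where rowSums() stops with an error.

```r
# Hypothetical toy data, not the benchmark data frame.
small <- data.frame(a = c(1, NA, 3), b = c(4, 5, NA), c = c(7, 8, 9))

# rowSums() yields NA for any row containing an NA,
# so !is.na(rowSums(.)) flags the complete rows (the idea behind f2).
keep_rowsums <- unname(!is.na(rowSums(small)))

# complete.cases() checks for missing values directly (the idea behind f3).
keep_complete <- complete.cases(small)

# Both approaches pick the same rows on numeric data.
stopifnot(identical(keep_rowsums, keep_complete))
small[keep_complete, ]

# With a character column added, rowSums() fails ("'x' must be numeric"),
# while complete.cases() still works.
mixed <- cbind(small, label = c("x", NA, "z"), stringsAsFactors = FALSE)
rowsums_ok <- !inherits(try(rowSums(mixed), silent = TRUE), "try-error")
keep_mixed <- complete.cases(mixed)
```

So the rowSums() trick is fine for purely numeric data such as the benchmark above, but complete.cases() is the safer choice when the column types are mixed.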

Berend