[R] Data Extraction
Berend Hasselman
bhh at xs4all.nl
Thu Nov 22 17:03:10 CET 2012
On 22-11-2012, at 16:50, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> Hi Berend,
>
> You have compared all 3 ways. ... very nicely evaluated.
>
Bert's solution is indeed nice and simple. But Petr's solution is still the quickest:
> N <- 100000
> set.seed(13)
> df <- data.frame(matrix(sample(c(1:10,NA),N,replace=TRUE),ncol=50))
> library(rbenchmark)
>
> f1 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),]}
> f2 <- function(df) {df[!is.na(rowSums(df)),]}
> f3 <- function(df) {df[complete.cases(df),]}
> f4 <- function(df) {data.frame(na.omit(df))}
> benchmark(d1 <- f1(df), d2 <- f2(df), d3 <- f3(df), d4 <- f4(df), columns=c("test","elapsed", "relative", "replications"))
          test elapsed relative replications
1 d1 <- f1(df)   3.588   14.888          100
2 d2 <- f2(df)   0.403    1.672          100
3 d3 <- f3(df)   0.241    1.000          100
4 d4 <- f4(df)   0.557    2.311          100
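
For anyone without rbenchmark installed, the comparison can be roughly reproduced with base R's system.time(). This is only a sketch: time_it is a hypothetical helper, the repetition count of 20 is an arbitrary choice, and the timings will differ by machine and R version.

```r
N <- 100000
set.seed(13)
df <- data.frame(matrix(sample(c(1:10, NA), N, replace = TRUE), ncol = 50))

# The four candidates from the post, collected in a named list.
funs <- list(
  f1 = function(df) df[apply(df, 1, function(x) all(!is.na(x))), ],
  f2 = function(df) df[!is.na(rowSums(df)), ],
  f3 = function(df) df[complete.cases(df), ],
  f4 = function(df) data.frame(na.omit(df))
)

# time_it (hypothetical helper): elapsed seconds for n repeated calls of f on df.
time_it <- function(f, n = 20) {
  unname(system.time(for (i in seq_len(n)) f(df))["elapsed"])
}

# Named vector of elapsed times, one entry per candidate.
timings <- sapply(funs, time_it)
print(timings)
```

Dividing timings by the smallest entry gives figures comparable to the "relative" column.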
>
> identical(d1,d2)
[1] TRUE
> identical(d1,d3)
[1] TRUE
> identical(d1,d4)
[1] TRUE
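
To see why f2 and f3 pick the same rows on data like this, a tiny example may help. rowSums() returns NA for any row that contains an NA, so is.na() on the row sums flags the incomplete rows, which is what complete.cases() reports directly. The data frame toy is made up for illustration, and the rowSums() route assumes every column is numeric.

```r
# Toy data (an assumption for illustration): rows 2 and 4 each contain an NA.
toy <- data.frame(a = c(1, NA, 3, 4), b = c(5, 6, 7, NA))

# rowSums() propagates NA, so an NA sum marks an incomplete row.
keep_sum <- !is.na(rowSums(toy))

# complete.cases() gives the same TRUE/FALSE pattern without summing.
keep_cc <- complete.cases(toy)

# keep_sum carries row names and keep_cc does not, so compare values only.
stopifnot(all(keep_sum == keep_cc))

# Keep only the complete rows.
print(toy[keep_cc, ])
```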
Berend