[R] Data Extraction
Berend Hasselman
bhh at xs4all.nl
Thu Nov 22 15:49:56 CET 2012
On 22-11-2012, at 15:11, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> Hello,
>
> I would appreciate it if someone could help me resolve the following:
>
> 1. df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work
>
> 2. Is this message harmful? The following object(s) are masked from 'df1 (position 3)':
> X1, X2, X3, X4, X5
>
> Thanks,
>
> Pradip Muhuri
>
>
> #Reproducible Example
> set.seed(5)
> df1<-data.frame(matrix(sample(c(1:10,NA),100,replace=TRUE),ncol=5))
> attach (df1)
> #delete rows if any of them NA for X1
> df1[!is.na( X1),][,1:5] # This works
>
> #delete rows if any of them NA for X1, X2, X3, X4 or X5
> df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work
Yet another way of doing this is
df1[!is.na(rowSums(df1)),][1:5]
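
A minimal sketch of why the rowSums() form catches every NA while the `|` form from the question does not. The data frame df_small and the names na_via_or and na_via_sum are made up for illustration and are not part of the original example.

```r
# Hypothetical four-row data frame, made up for illustration
df_small <- data.frame(X1 = c(1, NA, 3, 0),
                       X2 = c(NA, 2, 3, 0),
                       X3 = c(1, 2, 3, NA))

# `|` coerces non-zero numbers to TRUE, and TRUE | NA is TRUE,
# so an NA is lost whenever another column in that row is non-zero.
na_via_or <- with(df_small, is.na(X1 | X2 | X3))

# Addition propagates NA: a row sum is NA as soon as any entry is NA.
na_via_sum <- is.na(rowSums(df_small))

print(na_via_or)
print(unname(na_via_sum))

# Keep only the rows without any NA
df_small[!na_via_sum, ]
```

Here the `|` version flags only the last row (where the other entries are zero), whereas the row sum flags every row that contains an NA, so only the third row is kept.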
But Petr's solution, complete.cases() (f3 below), appears to be the quickest.
See this:
> N <- 100000
> set.seed(13)
> df <- data.frame(matrix(sample(c(1:10,NA),N,replace=TRUE),ncol=50))
> library(rbenchmark)
>
> f1 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),][,1:ncol(df)]}
> f2 <- function(df) {df[!is.na(rowSums(df)),][1:ncol(df)]}
> f3 <- function(df) {df[complete.cases(df),][1:ncol(df)]}
>
> benchmark(d1 <- f1(df), d2 <- f2(df), d3 <- f3(df), columns=c("test","elapsed", "relative", "replications"))
          test elapsed relative replications
1 d1 <- f1(df)   3.675   13.172          100
2 d2 <- f2(df)   0.401    1.437          100
3 d3 <- f3(df)   0.279    1.000          100
> identical(d1,d2)
[1] TRUE
> identical(d1,d3)
[1] TRUE
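
One caveat when choosing between these, sketched under the assumption that the data frame may hold non-numeric columns (the benchmark above uses numeric data only): rowSums() requires numeric input, while complete.cases() also accepts mixed column types. The names df_mixed, rs and kept are hypothetical.

```r
# Hypothetical mixed-type data frame, for illustration only
df_mixed <- data.frame(id = c("a", "b", "c"),
                       x  = c(1, NA, 3),
                       stringsAsFactors = FALSE)

# rowSums() stops with an error because of the character column;
# tryCatch() turns that error into a marker string for this demo.
rs <- tryCatch(rowSums(df_mixed), error = function(e) "error")
print(rs)

# complete.cases() handles the mixed column types directly
kept <- df_mixed[complete.cases(df_mixed), ]
print(kept)
```

So the rowSums() variant is only a drop-in replacement when all columns are numeric.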
Berend