[R] unexpected behaviour with ddply and colwise

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Apr 8 00:13:08 CEST 2010


Howdy,

I'm no plyr master, but here's my 2 cents ...

On Wed, Apr 7, 2010 at 5:15 PM, Stuart Andrews <stu.andrews at gmail.com> wrote:
> Hi,
>
> I am confused by results from:
>
>> ddply(aa, names(aa), colwise(sum))
>
> I thought ddply was just calling colwise(sum)() with each column. However
> ddply() returns a 13 x 5 result !!
>
> The general result I expected is similar to that of apply(), or using
> colwise(sum)() alone. Shouldn't ddply() produce the same?

Not sure what exactly is happening, but I don't think I'd expect ddply
to produce the same as the example you gave, since the second arg to
ddply determines how the aa data.frame should be split (row-wise)
before the colwise(...) do-hicky is called.
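
To make the row-wise split concrete, here's a rough base-R sketch of what I believe is going on. It uses a tiny made-up data frame `df` (not your `aa`), the names `pieces` and `per_group` are just mine, and it only imitates the split/apply/combine steps rather than calling plyr itself:

```r
# Tiny hypothetical data frame: three rows, only two of them distinct.
df <- data.frame(a = c(TRUE, TRUE, FALSE),
                 b = c(FALSE, FALSE, TRUE))

# Splitting on *every* column gives one piece per distinct row,
# which is what passing names(df) as the grouping variables amounts to.
pieces <- split(df, interaction(df, drop = TRUE))
length(pieces)   # same as nrow(unique(df))

# Apply: column sums within each piece. Combine: stack the results.
per_group <- do.call(rbind, lapply(pieces, function(p) {
  as.data.frame(lapply(p, sum))
}))
per_group
```

If that's right, each row of your 13 x 5 result is the column sums for one group of identical rows (e.g. "0 0 0 4 0" would be four rows where only `d` is TRUE), and adding up the rows of that result gives back the 5 3 4 8 9 totals.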

I'm not sure, but what are you trying to get at by row-wise splitting
`aa` by c('a', 'b', 'c', 'd', 'e') [i.e. names(aa)]?

>
> Thanks in advance for your help,
> - Stuart Andrews
>
>
>> set.seed(1234)
>> aa = as.data.frame(matrix(rnorm(100)>0.3,nrow=20))
>> names(aa) = c('a','b','c','d','e')
>> head(aa)
>       a     b     c     d     e
> 1 FALSE FALSE FALSE  TRUE  TRUE
> 2  TRUE  TRUE FALSE  TRUE FALSE
> 3  TRUE  TRUE FALSE  TRUE  TRUE
> 4  TRUE FALSE FALSE  TRUE FALSE
> 5  TRUE FALSE FALSE  TRUE FALSE
> 6 FALSE FALSE FALSE FALSE  TRUE
>
>> ddply(aa, names(aa), colwise(sum))
>    a b c d e
> 1  0 0 0 0 0
> 2  0 0 0 0 2
> 3  0 0 0 4 0
> 4  0 0 0 1 1
> 5  0 0 1 0 0
> 6  0 0 2 0 2
> 7  0 0 1 1 0
> 8  0 2 0 0 0
> 9  0 1 0 0 1
> 10 1 0 0 0 0
> 11 2 0 0 0 2
> 12 1 0 0 1 0
> 13 1 0 0 1 1
>
>> apply(as.matrix(aa),2,sum)
> a b c d e
> 5 3 4 8 9
>
>> colwise(sum)(aa)
>  a b c d e
> 1 5 3 4 8 9
>
>
> ... Isn't ddply() just doing something like this for each column??
>
>> colwise(sum)(aa[,1,drop=FALSE])
>  a
> 1 5

That's what colwise does for each column of the data.frame it's
handed ... ddply does the split-by-row/apply/merge magic on the
data frame first, so colwise only gets smaller chunks of `aa` to
work on at a time...

So, to summarize, I think you just need to figure out the correct 2nd
arg to ddply for your specific problem.
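
For instance, if what you want is column sums within some grouping of the rows, the grouping has to come from a variable that isn't one of the columns being summed. Here's a sketch in base R (to keep it independent of plyr), where `grp` is a purely hypothetical grouping I made up for illustration:

```r
# Recreate the data from the original post.
set.seed(1234)
aa <- as.data.frame(matrix(rnorm(100) > 0.3, nrow = 20))
names(aa) <- c("a", "b", "c", "d", "e")

# No grouping at all: one set of column totals, like colwise(sum)(aa).
totals <- sapply(aa, sum)

# Hypothetical grouping (not in the original post): first ten rows vs. last ten.
grp <- rep(c("first", "second"), each = 10)

# One row of column sums per group.
by_grp <- aggregate(aa, by = list(grp = grp), FUN = sum)
by_grp
```

With plyr, the analogous move would be to keep such a variable as a column of the data frame and pass its name as the second argument to ddply instead of names(aa).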

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


