[R] Using plyr::dply more (memory) efficiently?
Matthew Dowle
mdowle at mdowle.plus.com
Thu Apr 29 17:46:17 CEST 2010
"Steve Lianoglou" <mailinglist.honeypot at gmail.com> wrote in message
news:t2ybbdc7ed01004290812n433515b5vb15b49c170f5a353 at mail.gmail.com...
> Thanks for directing me to the data.table package. I read through some
> of the vignettes, and it looks quite nice.
>
> While your sample code would provide an answer if I wanted to just
> compute some summary statistic/function of groups of my data.frame
> (using `by=symbol`), what's the best way to produce several pieces of
> info per subset?
>
> For instance, I see that I can do something like this:
>
> summaries[, list(counts=sum(counts), width=sum(exon.width)), by=symbol]
Yes, that's it.
> But what if I need to do some more complex processing within the
> subsets defined in `by=symbol` -- like several lines of programming
> logic for 1 result, say.
>
> I guess I can open a new block that just returns a data.table? Like:
>
> summaries[, {
>   cnts <- sum(counts)
>   ew <- sum(exon.width)
>   # ... some complex things
>   complex <- # .. result of complex things
>   data.table(counts=cnts, width=ew, cplx=complex)
> }, by=symbol]
>
> Is that right? (I mean, it looks like it's working, but maybe there's
> a more idiomatic way(?))
Yes, you got it. Rather than a data.table at the end though, just return a
list; it's faster.
Shorter vectors will still be recycled to match any longer ones.
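For illustration, here is a minimal sketch of the braced form ending in a list. The toy data and the ratio used for cplx are made up; they only stand in for your real table and the "complex things". Only length-1 results are returned per group here, so it doesn't lean on any particular recycling behaviour.

```r
library(data.table)

# Hypothetical toy data; column names follow the thread.
summaries <- data.table(
  symbol     = c("A", "A", "B", "B", "B"),
  counts     = c(10L, 20L, 5L, 5L, 10L),
  exon.width = c(100L, 200L, 50L, 50L, 100L)
)

res <- summaries[, {
  cnts <- sum(counts)
  ew   <- sum(exon.width)
  # Placeholder for the multi-line logic: counts per unit of exon width.
  cplx <- cnts / ew
  # Last expression is a plain list, not a data.table.
  list(counts = cnts, width = ew, cplx = cplx)
}, by = symbol]

print(res)
```

Each named element of the list becomes a column, one row per value of symbol.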
Or just this:
summaries[, list(
    counts = sum(counts),
    width = sum(exon.width),
    cplx = # .. result of complex things
  ), by=symbol]
Sounds like it's working, but could you give us an idea whether it is quick
and memory efficient?
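One rough way to check, as a sketch only: time the grouped call with system.time() and look at the "max used" column from gc() afterwards. The simulated table below (sizes, gene names, distributions) is entirely made up; substitute your real summaries.

```r
library(data.table)

set.seed(1)
n <- 1e5  # assumed size, for illustration only

# Simulated stand-in for the real table.
summaries <- data.table(
  symbol     = sample(sprintf("gene%04d", 1:2000), n, replace = TRUE),
  counts     = rpois(n, 10),
  exon.width = sample(50:500, n, replace = TRUE)
)

invisible(gc(reset = TRUE))  # reset the "max used" statistics

timing <- system.time(
  res <- summaries[, list(counts = sum(counts),
                          width  = sum(exon.width)), by = symbol]
)

print(timing)                        # elapsed time for the grouped call
print(gc())                          # "max used" shows peak memory since reset
print(object.size(res), units = "Kb")  # size of the result itself
```

The numbers will depend on the machine and on the number of groups, so compare them against the same measurement for the plyr version on the same data.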