[R] Using plyr::dply more (memory) efficiently?
Steve Lianoglou
mailinglist.honeypot at gmail.com
Thu Apr 29 17:12:22 CEST 2010
Hi Matthew,
On Thu, Apr 29, 2010 at 9:52 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> I don't know about that, but try this :
>
> install.packages("data.table", repos="http://R-Forge.R-project.org")
> require(data.table)
> summaries = data.table(summaries)
> summaries[,sum(counts),by=symbol]
>
> Please let us know if that returns the correct result, and if its
> memory/speed is ok ?
Thanks for directing me to the data.table package. I read through some
of the vignettes, and it looks quite nice.
While your sample code would provide the answer if I just wanted to
compute a single summary statistic/function over groups of my data.frame
(using `by=symbol`), what's the best way to produce several pieces of
info per subset?
For instance, I see that I can do something like this:
summaries[, list(counts=sum(counts), width=sum(exon.width)), by=symbol]
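As a minimal, self-contained sketch of that form: the toy table below is a made-up stand-in for `summaries` (only the column names come from the code above; the values are invented for illustration). Each named element of the list should become one column of the result, with one row per `symbol`.

```r
library(data.table)

# Hypothetical stand-in for `summaries`; the values are invented.
summaries <- data.table(
  symbol     = c("A", "A", "B", "B", "B"),
  counts     = c(1, 2, 3, 4, 5),
  exon.width = c(10, 20, 30, 40, 50)
)

# One output row per symbol, one column per named list element.
res <- summaries[, list(counts = sum(counts), width = sum(exon.width)),
                 by = symbol]
print(res)
```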
But what if I need to do some more complex processing within the
subsets defined by `by=symbol` -- say, several lines of programming
logic for one result?
I guess I can open a new block that just returns a data.table? Like:
summaries[, {
    cnts <- sum(counts)
    ew <- sum(exon.width)
    # ... some complex things
    complex <- NA  # placeholder: result of the complex things
    data.table(counts=cnts, width=ew, cplx=complex)
}, by=symbol]
Is that right? (I mean, it looks like it's working, but maybe there's
a more idiomatic way(?))
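For comparison, here is a sketch of one variant that is commonly used with data.table: the braced block ends in a plain `list(...)` rather than a `data.table(...)` call, which should give the same columns while avoiding building a data.table for every group. The table `toy` and the ratio stored in `cplx` are assumptions made up for this illustration, not part of the original data or logic.

```r
library(data.table)

# Hypothetical toy data; only the column names mirror the code above.
toy <- data.table(
  symbol     = c("A", "A", "B"),
  counts     = c(2, 4, 6),
  exon.width = c(100, 300, 200)
)

res2 <- toy[, {
  cnts <- sum(counts)
  ew   <- sum(exon.width)
  # Stand-in for the "complex things": counts per unit of exon width.
  cplx <- cnts / ew
  # The value of the last expression in the block is what gets returned
  # for the group.
  list(counts = cnts, width = ew, cplx = cplx)
}, by = symbol]
print(res2)
```

Intermediate variables such as `cnts` and `ew` are only used while the block is evaluated for each group; only the elements of the final list end up as columns.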
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact