[R] Collapsing data frame; aggregate() or better function?
Tobin, Jared
TobinJR at DFO-MPO.GC.CA
Fri Sep 14 18:34:19 CEST 2007
Thanks for the quick reply Jim.
I haven't had any success when I whittle down 'by' list even further
though. I believe I'm using the right command, but now it's just a
matter of clear memory issues.
> test <- aggregate(lf1.turbot[,17:217], list(lf1.turbot$vessel,
lf1.turbot$trip, lf1.turbot$set), sum)
Error: cannot allocate vector of size 237.4 Mb In addition: Warning
messages:
1: Reached total allocation of 734Mb: see help(memory.size)
2: Reached total allocation of 734Mb: see help(memory.size)
3: Reached total allocation of 734Mb: see help(memory.size)
4: Reached total allocation of 734Mb: see help(memory.size)
A fellow kindly emailed me directly and suggested trying Wickham's
'reshape' package, but again when using the melt() function in that
package I run into memory problems. A colleague suggested I 'create
factors using as.factor() and feed this directly into the appropriate
apply function', but I've had no success with this when using tapply().
Any suggestions as to a less memory-intensive procedure would be greatly
appreciated.
Thanks,
--
jared tobin, student research assistant
fisheries and oceans canada
tobinjr at dfo-mpo.gc.ca
-----Original Message-----
From: jim holtman [mailto:jholtman at gmail.com]
Sent: Thursday, September 13, 2007 6:49 PM
To: Tobin, Jared
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Collapsing data frame; aggregate() or better function?
The second argument for aggregate is supposed to be a list, so try
(notice the missing comma before "1:8"):
test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[1:8],sum)
On 9/13/07, Tobin, Jared <TobinJR at dfo-mpo.gc.ca> wrote:
> Hello r-help,
>
> I am trying to collapse or aggregate 'some' of a data frame. A very
> simplified version of my data frame looks like:
>
> > tester
> trip set num sex lfs1 lfs2
> 1 313 15 5 M 2 3
> 2 313 15 3 F 1 2
> 3 313 17 1 M 0 1
> 4 313 17 2 F 1 1
> 5 313 17 1 U 1 0
>
> And I want to omit sex from the picture and just get an addition of
> num, lfs1, and lfs2 for each unique trip/set combination. Using
> aggregate() works fine here,
>
> > test <- aggregate(tester[,c(3,5:6)], tester[,1:2], sum) test
> trip set num lfs1 lfs2
> 1 313 15 8 3 5
> 2 313 17 4 2 2
>
> But I'm having trouble getting the same function to work on my actual
> data frame which is considerably larger.
>
> > dim(lf1.turbot)
> [1] 16468 217
> > test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[,1:8],
> sum)
> Error in vector("list", prod(extent)) : vector size specified is too
> large In addition: Warning messages:
> 1: NAs produced by integer overflow in: ngroup * (as.integer(index) -
> one)
> 2: NAs produced by integer overflow in: group + ngroup *
> (as.integer(index) - one)
> 3: NAs produced by integer overflow in: ngroup * nlevels(index)
>
> I'm guessing that either aggregate() can't handle a data frame of this
> size OR that there is an issue with 'omitting' more than one variable
> (in the same way I've omitted sex in the above example). Can anyone
> clarify and/or recommend any relatively simple alternative procedure
> to accomplish this?
>
> I plan on trying variants of by() and tapply() tomorrow morning, but
> I'm about to head home for the day.
>
> Thanks,
>
> --
>
> jared tobin, student research assistant fisheries and oceans canada
> tobinjr at dfo-mpo.gc.ca
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
More information about the R-help
mailing list