[R] Collapsing data frame; aggregate() or better function?

Tobin, Jared TobinJR at DFO-MPO.GC.CA
Fri Sep 14 18:34:19 CEST 2007


Thanks for the quick reply, Jim.

I haven't had any success when I whittle down the 'by' list even
further, though.  I believe I'm using the right command; the problem now
appears to be purely one of memory.

> test <- aggregate(lf1.turbot[,17:217], list(lf1.turbot$vessel,
lf1.turbot$trip, lf1.turbot$set), sum)
Error: cannot allocate vector of size 237.4 Mb
In addition: Warning messages:
1: Reached total allocation of 734Mb: see help(memory.size)
2: Reached total allocation of 734Mb: see help(memory.size)
3: Reached total allocation of 734Mb: see help(memory.size)
4: Reached total allocation of 734Mb: see help(memory.size) 

A fellow kindly emailed me directly and suggested trying Wickham's
'reshape' package, but again when using the melt() function in that
package I run into memory problems.  A colleague suggested I 'create
factors using as.factor() and feed this directly into the appropriate
apply function', but I've had no success with this when using tapply().

Any suggestions as to a less memory-intensive procedure would be greatly
appreciated.
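
One possibly lighter route, sketched here on the small 'tester' frame from
the quoted message rather than on the real data, is to collapse the grouping
columns into a single key and pass it to base R's rowsum().  The paste()-based
key and the "_" separator are assumptions made for illustration, and the
sketch has not been run against lf1.turbot.

```r
# Small example frame copied from the quoted message below.
tester <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "U"),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)

# One key per trip/set combination that actually occurs in the data.
# (The "_" separator is an arbitrary choice for this sketch.)
key <- paste(tester$trip, tester$set, sep = "_")

# rowsum() sums each numeric column within each key; the result has
# one row per distinct key, with the key as the row name.
sums <- rowsum(tester[, c("num", "lfs1", "lfs2")], group = key)
print(sums)
```

On the real data the same pattern would presumably be a key built from
lf1.turbot$vessel, lf1.turbot$trip and lf1.turbot$set, followed by
rowsum(lf1.turbot[, 17:217], key), assuming columns 17:217 are all numeric.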

Thanks,

--

jared tobin, student research assistant
fisheries and oceans canada
tobinjr at dfo-mpo.gc.ca

-----Original Message-----
From: jim holtman [mailto:jholtman at gmail.com] 
Sent: Thursday, September 13, 2007 6:49 PM
To: Tobin, Jared
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Collapsing data frame; aggregate() or better function?

The second argument for aggregate is supposed to be a list, so try
(notice the missing comma before "1:8"):

test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[1:8],sum)
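
As a side note on the integer-overflow warnings in the quoted message: they
appear to come from tapply(), which builds an array with one cell for every
combination of factor levels, whether or not that combination occurs in the
data.  The sketch below illustrates the gap on the small 'tester' frame; the
choice of trip, set and sex as grouping columns is only for illustration.

```r
# Small example frame copied from the quoted message below.
tester <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "U"),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)

# Grouping columns chosen for illustration only.
by_cols <- c("trip", "set", "sex")
by_list <- lapply(tester[by_cols], factor)

# Cells in the full cross-product of levels (what tapply() allocates).
cells_full <- prod(sapply(by_list, nlevels))

# Combinations that actually occur in the data.
cells_seen <- nrow(unique(tester[by_cols]))

# tapply() returns an array with one cell per level combination,
# filling the unobserved combinations with NA.
full <- tapply(tester$num, by_list, sum)

cat("cells allocated:", cells_full, " combinations present:", cells_seen, "\n")
```

With eight grouping columns on 16468 rows, the product of the level counts can
easily exceed the largest representable integer, even though the number of
combinations present can never exceed the number of rows.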


On 9/13/07, Tobin, Jared <TobinJR at dfo-mpo.gc.ca> wrote:
> Hello r-help,
>
> I am trying to collapse or aggregate 'some' of a data frame.  A very 
> simplified version of my data frame looks like:
>
> > tester
>  trip set num sex lfs1 lfs2
> 1  313  15   5   M    2    3
> 2  313  15   3   F    1    2
> 3  313  17   1   M    0    1
> 4  313  17   2   F    1    1
> 5  313  17   1   U    1    0
>
> And I want to omit sex from the picture and just get an addition of 
> num, lfs1, and lfs2 for each unique trip/set combination.  Using 
> aggregate() works fine here,
>
> > test <- aggregate(tester[,c(3,5:6)], tester[,1:2], sum)
> > test
>  trip set num lfs1 lfs2
> 1  313  15   8    3    5
> 2  313  17   4    2    2
>
> But I'm having trouble getting the same function to work on my actual 
> data frame which is considerably larger.
>
> > dim(lf1.turbot)
> [1] 16468   217
> > test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[,1:8],
> sum)
> Error in vector("list", prod(extent)) : vector size specified is too large
> In addition: Warning messages:
> 1: NAs produced by integer overflow in: ngroup * (as.integer(index) -
> one)
> 2: NAs produced by integer overflow in: group + ngroup *
> (as.integer(index) - one)
> 3: NAs produced by integer overflow in: ngroup * nlevels(index)
>
> I'm guessing that either aggregate() can't handle a data frame of this
> size OR that there is an issue with 'omitting' more than one variable 
> (in the same way I've omitted sex in the above example).  Can anyone 
> clarify and/or recommend any relatively simple alternative procedure 
> to accomplish this?
>
> I plan on trying variants of by() and tapply() tomorrow morning, but 
> I'm about to head home for the day.
>
> Thanks,
>
> --
>
> jared tobin, student research assistant
> fisheries and oceans canada
> tobinjr at dfo-mpo.gc.ca
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


