[R] Collapsing data frame; aggregate() or better function?

Tobin, Jared TobinJR at DFO-MPO.GC.CA
Thu Sep 13 22:20:07 CEST 2007


Hello r-help,

I am trying to collapse or aggregate 'some' of a data frame.  A very
simplified version of my data frame looks like:

> tester
  trip set num sex lfs1 lfs2
1  313  15   5   M    2    3
2  313  15   3   F    1    2
3  313  17   1   M    0    1
4  313  17   2   F    1    1
5  313  17   1   U    1    0

I want to drop sex and just get the sums of num, lfs1, and lfs2 for
each unique trip/set combination.  Using aggregate() works fine here:

> test <- aggregate(tester[,c(3,5:6)], tester[,1:2], sum)
> test
  trip set num lfs1 lfs2
1  313  15   8    3    5
2  313  17   4    2    2 
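
For anyone who wants to reproduce the toy case, here is a minimal sketch that rebuilds the data frame printed above and runs the same aggregation. It selects columns by name instead of by position, which is only a readability choice and an assumption about how one might prefer to write it; the column types are guessed from the printout.

```r
# Rebuild the toy data frame shown above (types guessed from the printout)
tester <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "U"),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)

# Same aggregation as above, with columns picked by name.
# 'by' must be a list; a data frame of the grouping columns qualifies.
test <- aggregate(tester[, c("num", "lfs1", "lfs2")],
                  by  = tester[, c("trip", "set")],
                  FUN = sum)
print(test)
```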

But I'm having trouble getting the same function to work on my actual
data frame which is considerably larger.

> dim(lf1.turbot)
[1] 16468   217
> test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[,1:8], sum)
Error in vector("list", prod(extent)) : vector size specified is too large
In addition: Warning messages:
1: NAs produced by integer overflow in: ngroup * (as.integer(index) - one)
2: NAs produced by integer overflow in: group + ngroup * (as.integer(index) - one)
3: NAs produced by integer overflow in: ngroup * nlevels(index)

I'm guessing that either aggregate() can't handle a data frame of this
size OR that there is an issue with 'omitting' more than one variable
(in the same way I've omitted sex in the above example).  Can anyone
clarify and/or recommend any relatively simple alternative procedure to
accomplish this?
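
The warnings mention ngroup * nlevels(index) and the error mentions prod(extent), which suggests (this is a guess from the messages, not something confirmed) that the grouping is laid out over every possible combination of the eight grouping columns' levels, not just the combinations that occur. One possible workaround is sketched below on the toy data only; it has not been run on lf1.turbot. It pastes the grouping columns into a single key and sums within each key with rowsum() from base R. The names key_cols, sum_cols, key, and result are made up for illustration.

```r
# Toy data from earlier in this message (types guessed from the printout)
tester <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "U"),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)

key_cols <- c("trip", "set")          # would stand in for columns 1:8
sum_cols <- c("num", "lfs1", "lfs2")  # would stand in for c(11, 12, 17:217)

# One key string per row, so only combinations that occur form a group.
# The separator is assumed not to appear in any of the grouping values.
key <- do.call(paste, c(tester[key_cols], sep = "|"))

# rowsum() adds up the rows of a numeric matrix within each group;
# its row names are the group keys.
sums <- rowsum(as.matrix(tester[sum_cols]), group = key, reorder = FALSE)

# Take the grouping values from the first row of each group and
# line them up with the sums by key.
first  <- !duplicated(key)
result <- cbind(tester[first, key_cols], sums[key[first], , drop = FALSE])
rownames(result) <- NULL
print(result)
```

On the toy data this should give the same two rows as the aggregate() call above. The columns being summed all need to be numeric for the as.matrix() step to make sense.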

I plan on trying variants of by() and tapply() tomorrow morning, but I'm
about to head home for the day.

Thanks,

--

jared tobin, student research assistant
fisheries and oceans canada
tobinjr at dfo-mpo.gc.ca


