[R] Collapsing data frame; aggregate() or better function?
Tobin, Jared
TobinJR at DFO-MPO.GC.CA
Thu Sep 13 22:20:07 CEST 2007
Hello r-help,
I am trying to collapse or aggregate 'some' of a data frame. A very
simplified version of my data frame looks like:
> tester
  trip set num sex lfs1 lfs2
1  313  15   5   M    2    3
2  313  15   3   F    1    2
3  313  17   1   M    0    1
4  313  17   2   F    1    1
5  313  17   1   U    1    0
I want to drop sex and just sum num, lfs1, and lfs2 for each unique
trip/set combination. Using aggregate() works fine here:
> test <- aggregate(tester[,c(3,5:6)], tester[,1:2], sum)
> test
  trip set num lfs1 lfs2
1  313  15   8    3    5
2  313  17   4    2    2
But I'm having trouble getting the same function to work on my actual
data frame which is considerably larger.
> dim(lf1.turbot)
[1] 16468 217
> test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[,1:8], sum)
Error in vector("list", prod(extent)) : vector size specified is too large
In addition: Warning messages:
1: NAs produced by integer overflow in: ngroup * (as.integer(index) - one)
2: NAs produced by integer overflow in: group + ngroup * (as.integer(index) - one)
3: NAs produced by integer overflow in: ngroup * nlevels(index)
I'm guessing that either aggregate() can't handle a data frame of this
size OR that there is an issue with 'omitting' more than one variable
(in the same way I've omitted sex in the above example). Can anyone
clarify and/or recommend any relatively simple alternative procedure to
accomplish this?
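One way to check the first guess: the messages mention prod(extent) and ngroup * nlevels(index), which suggests the grouping step is sizing a grid with one cell for every combination of grouping levels, whether or not that combination occurs in the data. The sketch below shows how quickly such a product can pass the integer limit with eight grouping columns. The helper grid_cells() and the level counts in n_levels are made up for illustration and are not taken from lf1.turbot.

```r
# Hypothetical helper: size of the full grid of grouping levels.
# prod() returns a double here, so this product itself cannot overflow.
grid_cells <- function(df, cols) {
  prod(sapply(df[cols], function(x) length(unique(x))))
}

# Small check: one trip level times two set levels.
toy <- data.frame(trip = c(313, 313, 313), set = c(15, 15, 17))
toy_cells <- grid_cells(toy, c("trip", "set"))

# Made-up level counts for eight grouping columns (NOT from lf1.turbot).
n_levels <- c(40, 60, 12, 31, 25, 10, 8, 5)
cells <- prod(n_levels)

# TRUE means the grid is too large to index with R integers.
too_big <- cells > .Machine$integer.max
```

On the real data, something like grid_cells(lf1.turbot, 1:8) would give the corresponding figure. A data frame with 16468 rows can hold at most 16468 distinct combinations, so a very large result would point at the grid size rather than the number of rows.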
I plan on trying variants of by() and tapply() tomorrow morning, but I'm
about to head home for the day.
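A note on those two: tapply() over several factors also returns an array with one cell per combination of levels, and by() on a data frame goes through tapply(), so they may run into the same limit. A minimal sketch of one alternative, assuming the summed columns are all numeric, is to paste the grouping columns into a single key and use rowsum() from base R, so that only combinations that actually occur are ever created. The "|" separator and the names group_cols, value_cols, and result are my own choices for illustration.

```r
# Toy data copied from the example in this post.
tester <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "U"),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)

group_cols <- c("trip", "set")          # columns that define a group
value_cols <- c("num", "lfs1", "lfs2")  # columns to sum (sex is left out)

# One key per row. Assumes "|" never appears in the grouping values.
key <- do.call(paste, c(tester[group_cols], sep = "|"))

# Sum the value columns within each key, in order of first appearance.
sums <- rowsum(as.matrix(tester[value_cols]), key, reorder = FALSE)

# Take the grouping values from the first row of each key and attach the sums.
first <- !duplicated(key)
result <- cbind(tester[first, group_cols], sums[key[first], , drop = FALSE])
rownames(result) <- NULL
```

For the large data frame the same steps would use names(lf1.turbot)[1:8] as group_cols and the names of columns 11, 12, and 17:217 as value_cols, provided those columns are numeric.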
Thanks,
--
jared tobin, student research assistant
fisheries and oceans canada
tobinjr at dfo-mpo.gc.ca