[R] aov error with large data set
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Jul 16 20:27:26 CEST 2008
Mike Lawrence wrote:
> I'm looking to analyze a large data set: a within-Ss 2*2*1500 design
> with 20 Ss. However, aov() gives me an error, reproducible as follows:
>
> id = factor(1:20)
> a = factor(1:2)
> b = factor(1:2)
> d = factor(1:1500)
> temp = expand.grid(id=id, a=a, b=b, d=d)
> temp$y = rnorm(nrow(temp)) # generate some random DV data
> this_aov = aov(y ~ a*b*d + Error(id/(a*b*d)), data = temp)
>
> This yields the following error:
>
> Error in model.matrix.default(mt, mf, contrasts) :
>   allocMatrix: too many elements specified
>
> Any suggestions?
>
This is an inherent weakness of aov(), or at least the current
implementation thereof. You end up fitting a set of linear models with a
huge number of parameters in order to get the separation into strata.
The column dimensions of the design matrices are the number of random
effects, and if you have 60000 of those, you run out of storage. (As
written, you even have 120000 = 20*2*2*1500 for the id*a*b*d term, but
removing it isn't really going to help.)
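The sizes involved can be checked with a little arithmetic. The sketch below counts one indicator column per level combination of each term implied by Error(id/(a*b*d)); that counting rule is an assumption (the matrices aov() actually builds may drop or alias some columns), so treat the numbers as orders of magnitude. R versions of this era capped a single vector or matrix at 2^31 - 1 elements, which is the limit allocMatrix() is complaining about.

```r
# Rough sizes of the error-stratum design matrices for Error(id/(a*b*d)),
# assuming one indicator column per level combination of each error term.
nlev  <- c(id = 20, a = 2, b = 2, d = 1500)
n_obs <- prod(nlev)                      # 120000 observations (rows)

error_terms <- list(c("id"), c("id", "a"), c("id", "b"), c("id", "d"),
                    c("id", "a", "b"), c("id", "a", "d"), c("id", "b", "d"),
                    c("id", "a", "b", "d"))
n_cols <- sapply(error_terms, function(t) prod(nlev[t]))
names(n_cols) <- sapply(error_terms, paste, collapse = ":")
n_cols   # id:d 30000, id:a:d and id:b:d 60000 each, id:a:b:d 120000

# A single 60000-column block over all observations:
elements <- n_obs * 60000                # 7.2e9 elements
elements > .Machine$integer.max          # TRUE: beyond 2^31 - 1
elements * 8 / 1e9                       # about 57.6 GB of doubles
```

So even a single one of the larger terms is far beyond what model.matrix() can allocate, which is why dropping the id*a*b*d term alone does not help.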
(30 years ago, a much more efficient algorithm was implemented in
Genstat, but we seem to be short of volunteers to reimplement it...)
Ideas? Here are three:
lme4 should be able to handle such designs. It won't give you the df for
the F tests, but you could work them out by hand.
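For a balanced, fully within-subjects layout like this one, working out the df by hand is mechanical: each within effect is tested against its interaction with subjects, as in the aov() strata. A sketch under that assumption (complete data, one observation per cell); within_df is a hypothetical helper name, and the lme4 model itself, for instance a random part along the lines of (1 | id) + (1 | id:a) + (1 | id:b) + ..., is only one possible translation of Error(id/(a*b*d)) and is not shown here.

```r
# Hypothetical helper: numerator and denominator df for a within-subjects
# effect in a balanced design, where the effect is tested against its
# interaction with subjects.
within_df <- function(nlev, effect, n_subj) {
  num <- prod(nlev[effect] - 1)          # product of (levels - 1)
  c(num = num, den = (n_subj - 1) * num) # (subjects - 1) * numerator df
}

nlev <- c(a = 2, b = 2, d = 1500)
within_df(nlev, "a", 20)                 # num 1,    den 19
within_df(nlev, c("a", "d"), 20)         # num 1499, den 28481
within_df(nlev, c("a", "b", "d"), 20)    # num 1499, den 28481
```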
Or, you could try recasting it as a multivariate lm problem (see my
recent R News paper). This is still pretty huge, but this time the
limiting quantity is the 6000*6000 empirical covariance matrix (one row
and column per within-subject cell, 6000 = 2*2*1500), which could be
manageable.
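A minimal sketch of the multivariate route, run at a reduced and purely hypothetical size (3 levels of d instead of 1500) so that it finishes quickly. It uses the M, X, idata and test = "Spherical" arguments of base R's anova() method for "mlm" fits; the choice of M = ~a, X = ~1 to isolate the main effect of a is an illustration, and other effects need other M/X pairs.

```r
# Multivariate-lm recasting at a reduced size (K = 3 levels of d).
set.seed(1)
K <- 3
temp <- expand.grid(id = factor(1:20), a = factor(1:2),
                    b = factor(1:2), d = factor(1:K))
temp$y <- rnorm(nrow(temp))

# expand.grid varies id fastest, so filling a matrix column-wise gives one
# row per subject and one column per (a, b, d) cell: 20 x (2*2*K).
wide  <- matrix(temp$y, nrow = 20)
idata <- expand.grid(a = factor(1:2), b = factor(1:2), d = factor(1:K))

mlmfit <- lm(wide ~ 1)
# Main effect of a: within-subject contrasts in the span of ~a that are
# orthogonal to the overall mean (~1), tested assuming sphericity.
res_a <- anova(mlmfit, M = ~a, X = ~1, idata = idata, test = "Spherical")
res_a

# At full size the response has 2*2*1500 = 6000 columns, so the empirical
# covariance matrix alone holds 6000^2 doubles:
6000^2 * 8 / 1e6                         # 288 (MB)
```

For a 1-df effect like a, the spherical F should coincide with the squared paired t statistic on each subject's a1 - a2 difference, which is a cheap sanity check.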
Or, the most efficient way, but much more work for you: generate the
relevant tables of means and residuals, e.g. by placing your data in a
20*2*2*1500 table and using the relevant combinations of apply() and
sweep(). These can then be used to compute the relevant sums of squares.
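A base-R sketch of the tables-of-means route, assuming the balanced 20 x 2 x 2 x 1500 layout from the original post with one observation per cell. The helper names (centre_along, effect_table, effect_ss, within_F) are hypothetical, and the double-centring of marginal mean tables with apply() and sweep() is one way to do it, not necessarily the one Peter had in mind.

```r
# Centre array M along dimension k: subtract the mean over k.
centre_along <- function(M, k) {
  margins <- setdiff(seq_along(dim(M)), k)
  if (length(margins) == 0) return(M - mean(M))
  sweep(M, margins, apply(M, margins, mean))
}

# Effect table for the factors in S (dimension indices of Y): marginal
# means over S, centred along every factor in S so that all lower-order
# effects are removed.
effect_table <- function(Y, S) {
  M <- array(apply(Y, S, mean), dim = dim(Y)[S])
  for (k in seq_along(S)) M <- centre_along(M, k)
  M
}

# Sum of squares: each cell of the effect table stands for
# prod(dim(Y)[-S]) observations.
effect_ss <- function(Y, S) prod(dim(Y)[-S]) * sum(effect_table(Y, S)^2)

# F test for a within-subject effect E (dimension indices, excluding the
# subject dimension 1), against its interaction with subjects.
within_F <- function(Y, E) {
  df1  <- prod(dim(Y)[E] - 1)
  df2  <- (dim(Y)[1] - 1) * df1
  Fval <- (effect_ss(Y, E) / df1) / (effect_ss(Y, c(1, E)) / df2)
  c(F = Fval, df1 = df1, df2 = df2,
    p = pf(Fval, df1, df2, lower.tail = FALSE))
}

set.seed(1)
temp <- expand.grid(id = factor(1:20), a = factor(1:2),
                    b = factor(1:2), d = factor(1:1500))
temp$y <- rnorm(nrow(temp))
# expand.grid varies id fastest, so the data fill a 20 x 2 x 2 x 1500
# array in the right order.
Y <- array(temp$y, dim = c(20, 2, 2, 1500))
within_F(Y, c(2, 4))                     # the a:d interaction
```

Nothing larger than the data array itself is ever allocated, which is what makes this route cheap compared with the model-matrix approach.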
> Mike
>
> --
> Mike Lawrence
> Graduate Student, Department of Psychology, Dalhousie University
>
> www.memetic.ca
>
> "The road to wisdom? Well, it's plain and simple to express:
> Err and err and err again, but less and less and less."
> - Piet Hein
>
"Problems worthy of attack, prove their worth by hitting back" - Piet Hein
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907