[R] for help about logistic regression model
Douglas Bates
bates at stat.wisc.edu
Tue Nov 21 21:45:40 CET 2006
On 11/21/06, Aimin Yan <aiminy at iastate.edu> wrote:
> thanks for your reply, it is very helpful.
> I have one more question.
> Now I am trying to fit a full model using 13 predictors, but I get this error
> message. Does this problem come from too many predictors or too large a data set?
> thanks,
>
> Aimin Yan
>
>
> > p5.lgm.9 <- lmer(Y ~p*aa*index*x*y*z*sdx*sdy*sdz*delta*as*ms*cur+(1|p/aa),data=p5,family=binomial,control=list(usePQL=FALSE,msV=1))
> Error: cannot allocate vector of size 1565600 Kb
> In addition: Warning messages:
> 1: Reached total allocation of 494Mb: see help(memory.size)
> 2: Reached total allocation of 494Mb: see help(memory.size)
Well, considering that the model you specified would have a 13-way
interaction, 13 12-way interactions, 78 11-way interactions, and so
on, I think your problem is that you are trying to estimate far too
many fixed-effects parameters. There would be a total of 2^13 = 8192
terms in the model (counting the intercept). I didn't bother to
calculate the total number of coefficients because 2^13 is already
greater than the number of observations (1030).
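The term counts quoted above are binomial coefficients, so they can be checked in a couple of lines. This is only a sketch of the arithmetic; the variable names (n_vars, terms_by_order, total_terms) are made up for illustration.

```r
# Number of k-way terms among 13 fully crossed variables,
# for k = 0 (the intercept) up to k = 13 (the single 13-way interaction)
n_vars <- 13
terms_by_order <- choose(n_vars, 0:n_vars)
names(terms_by_order) <- 0:n_vars
terms_by_order

# Total number of terms, which equals 2^13
total_terms <- sum(terms_by_order)
total_terms

# Compare with the 1030 observations reported by str(p5)
total_terms > 1030
```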
[Can anyone provide code to calculate the total number of
fixed-effects coefficients? The structure of the data is
> str(p5)
'data.frame': 1030 obs. of 15 variables:
$ p : Factor w/ 5 levels "821p","8ABP",..: 1 1 1 1 1 1 1 1 1 1 ...
$ aa : Factor w/ 19 levels "ALA","ARG","ASN",..: 12 16 7 18 11 10 19 19 19 1 ...
$ index: int 1 2 3 4 5 6 7 8 9 11 ...
$ x : num -5.10 -4.07 -5.87 -1.35 -4.27 ...
$ y : num 32.9 28.7 30.5 26.9 27.8 ...
$ z : num -5.858 -4.838 -0.687 -0.492 6.273 ...
$ sdx : num 1.478 0.598 1.313 1.038 1.206 ...
$ sdy : num 1.74 1.38 2.00 1.37 1.20 ...
$ sdz : num 0.826 1.166 0.896 2.285 1.634 ...
$ delta: num 13.8 13.7 22.8 44.7 53.3 ...
$ as : num 126.9 64.1 82.7 7.6 42.0 ...
$ ms : num 92.4 50.7 75.3 17.2 57.7 ...
$ cur : num -0.1320 -0.0977 -0.0182 0.2368 0.1306 ...
$ sc : num 111.1 98.5 65.1 75.4 91.1 ...
$ Y : logi TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE ...
]
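One possible way to get that count without building the model matrix is sketched below. It assumes the default treatment contrasts (a factor with k levels contributes k - 1 columns, a numeric covariate contributes 1) and counts columns before any are dropped for rank deficiency. The level counts are copied by hand from the str(p5) output, and the names df_per_var, n_coef and n_obs are only illustrative.

```r
# Columns contributed by each main effect under treatment contrasts:
# p has 5 levels, aa has 19 levels, the other 11 predictors are numeric.
df_per_var <- c(p = 5 - 1, aa = 19 - 1,
                index = 1, x = 1, y = 1, z = 1,
                sdx = 1, sdy = 1, sdz = 1,
                delta = 1, as = 1, ms = 1, cur = 1)

# In a full crossing every variable is either absent from a term
# (factor of 1) or present (factor of its df), so the column count
# of the fixed-effects model matrix is the product of (1 + df).
n_coef <- prod(1 + df_per_var)
n_coef

# Memory for a dense double-precision model matrix with 1030 rows, in Kb
n_obs <- 1030
n_obs * n_coef * 8 / 1024
```

Under these assumptions the count is 5 * 19 * 2^11 = 194560 coefficients, and a 1030 by 194560 matrix of doubles needs 1565600 Kb, which is the size of the vector in the error message. With the data at hand, the same df vector could presumably be derived from p5 itself with nlevels() on the factor columns instead of being typed in.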
You may want to start with an additive model instead of a model with
all possible interactions. Even better would be to plot the data in
various ways to try to see which of these covariates seems to have a
substantial effect on the probability of p5$Y being TRUE or FALSE.
Remember that when you are working with a binary response you get
exactly 1 bit of information from each observation of the response.
Because that isn't a whole lot of information per observation you need
to have a large number of observations relative to the number of
coefficients that you hope to estimate.
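As a rough illustration of the additive starting point, here is a sketch on simulated data. Everything in it is an assumption made for the example: the data frame sim and its values are invented, only three of the covariate names are reused, and it uses plain glm() without the (1|p/aa) random effects so that it runs with base R alone. It is not a fit to p5.

```r
set.seed(1)

# Simulated stand-in for the real data: 1030 rows, three numeric covariates
n <- 1030
sim <- data.frame(x = rnorm(n), delta = rnorm(n), cur = rnorm(n))

# Simulated binary response in which only x has a real effect
# (an assumption of this sketch, not a claim about p5)
sim$Y <- rbinom(n, size = 1, prob = plogis(-0.5 + sim$x)) == 1

# Additive logistic regression: intercept plus one coefficient per covariate
fit_add <- glm(Y ~ x + delta + cur, data = sim, family = binomial)

# Number of coefficients and observations available per coefficient
n_coef_add <- length(coef(fit_add))
n_coef_add
nrow(sim) / n_coef_add

# Estimates and standard errors
summary(fit_add)$coefficients
```

An additive model over all 13 predictors would have 1 + 4 + 18 + 11 = 34 fixed-effects coefficients under treatment contrasts, roughly 30 observations per coefficient, compared with far fewer than one per coefficient for the full crossing.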