[R] absurd computation times of lme

Christof Meigen christof at nicht-ich.de
Tue Oct 15 16:16:45 CEST 2002


Thanks a lot for all your hints. Alas, some problems remain.

Douglas Bates <bates at stat.wisc.edu> writes:
> But you are also implicitly estimating the random effects for each
> child.  These are sometimes regarded as 'nuisance' parameters but they
> still need to be estimated, at least implicitly.  In this case there
> would be about 6000 of them (1000 children by 6 random effects per
> child).

I'm aware of that, and would not dare to estimate these parameters
independently per child, since I would be overfitting the data.

But I thought one could use lme to constrain this flexibility by
using information derived from the rest of the population. If this
only meant subtracting the mean curve, I wouldn't need lme, would I?

> There is a big difference when fitting random effects between adding
> parameters in the fixed effects, which are estimated from all the
> data, and adding parameters in the random effects, which are estimated
> from the data for one subject.

Does this really mean that the estimates for the random effects are
totally independent of the rest of the data? So, if my random
effect is flexible enough to model more or less any curve,
this "any" curve will be fitted to the data no matter how unlikely
it is (looking at the rest of the population) and how little
data is available on this subject?

The point is that the inclusion criterion for the children is that
they have _at least_ one measurement in each quarter, but some have
measurements every month or so. I thought lme would be a good
way to deal with this difference in the amount of information available.
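
For reference, the standard textbook result for the simplest case
bears on this question. It covers a random intercept only, not the
spline random effects discussed in this thread. The symbols are
assumptions introduced here: sigma_b^2 is the between-child variance,
sigma^2 the residual variance, n_i the number of measurements on
child i, ybar_i that child's mean, and mu-hat the estimated
population mean. The predicted random effect is then

```latex
\hat{b}_i \;=\; \frac{n_i\,\sigma_b^2}{n_i\,\sigma_b^2 + \sigma^2}\,
                \bigl(\bar{y}_i - \hat{\mu}\bigr)
```

The factor lies between 0 and 1 and grows with n_i, so under this
simplified model a child with few measurements is pulled more
strongly toward the population mean than a child measured monthly.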

> I would recommend that you start with a spline model for the fixed
> effects but use either a simple additive shift for the random effects
> (random = ~1|Subject) or an additive shift and a shift in the time
> trend (random = ~ age | Subject).  You simply don't have enough data
> to estimate 6 parameters from the data for each child.
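
A minimal sketch of the two suggested models, for concreteness. It
uses the Oxboys data shipped with nlme as a stand-in (not the data
discussed in this thread), and the natural spline with df = 3 is an
arbitrary choice made only for illustration.

```r
library(nlme)     # lme() and the Oxboys example data
library(splines)  # ns() natural spline basis

data(Oxboys)      # stand-in data: height of 26 boys at 9 occasions

# Spline in the fixed effects, additive shift per subject
fit_shift <- lme(height ~ ns(age, df = 3),
                 random = ~ 1 | Subject, data = Oxboys)

# Same fixed effects, plus a per-subject shift in the time trend
fit_slope <- lme(height ~ ns(age, df = 3),
                 random = ~ age | Subject, data = Oxboys)

# Both fits share the fixed effects, so the REML fits can be compared
cmp <- anova(fit_shift, fit_slope)
print(cmp)
```

Here each subject contributes at most two random effects, rather
than six spline coefficients.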

To make matters worse, this is for a PhD (luckily not mine) about
growth velocity. The medical Prof sees no problem, saying: when you
have two measurements, you have a growth velocity for the timepoint
right between these measurements.

I think this is a bad approach and suggested smoothing the curves
first. Using (random = ~ age | Subject) or, as seen from looking
at the logs, better (random = ~ I(age^0.15) | Subject) works as
expected, but gives fits that are sometimes as far as 3 cm from the
actual measurements (while the measurement error is assumed to be
about 0.5 cm). These unusual decelerations are exactly what the
wannabe-PhD is interested in.
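
As a rough illustration of why a velocity taken from two raw
measurements is noisy: if the two measurement errors are independent
with SD sigma (an assumption not stated above), the difference
quotient has SD sqrt(2) * sigma / dt. The helper name velocity_sd is
hypothetical, and the time step is in years.

```r
# SD of (h2 - h1) / dt when both heights carry independent
# measurement error with standard deviation sigma (in cm)
velocity_sd <- function(sigma, dt) {
  sqrt(2) * sigma / dt
}

sd_monthly   <- velocity_sd(0.5, 1 / 12)  # about 8.49 cm/year
sd_quarterly <- velocity_sd(0.5, 1 / 4)   # about 2.83 cm/year
print(c(monthly = sd_monthly, quarterly = sd_quarterly))
```

Under that assumption, closer measurements give a noisier raw
velocity, not a more precise one.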

Finally, the "too little data" argument does not apply to the
set-up with the Berkeley Boys, each with 31 measurements, where a
7-parameter spline basis random effect wouldn't converge within
several hours.
