[R] another optimization question

Mon Nov 26 00:20:21 CET 2001

Dear Brian

At 08:26 AM 11/25/2001 +0000, Prof Brian D Ripley wrote:
>On Sat, 24 Nov 2001, John Fox wrote:
>
>. . .
>
> > So, my question is, is it possible in principle for an optimization to fail
> > using a correct analytic gradient but to converge with a numerical
> > gradient? If this is possible, is it a common occurrence?
>
>It's possible but rare.  You don't have a `correct analytic gradient', but
>a numerical computation of it.  Inaccurately computed gradients are a
>common cause of convergence problems. You may need to adjust the
>tolerances.
>
>It's also possible in principle that the optimizer takes a completely
>different path from the starting point due to small differences in
>calculated derivatives.  It's worth trying starting near the expected
>answer to rule this out.

I didn't describe it in my original post, but I had messed around a fair 
bit with the problem before posting my question. (I say "messed around" 
because I'm far from expert in numerical optimization.) Since reading your 
response, I've checked over some of what I did, to make sure that I 
remember it correctly. Without describing everything in tedious detail, 
here are some of my results:

First, if I start the optimization right at the solution, I get the 
solution back as a result. I took this as evidence that my calculation of 
the gradient is probably ok. (And, as I said, I get the correct solution to 
other problems.)
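
A more direct check of the gradient code is to compare it with a finite-difference approximation at a point away from the solution, where the gradient is not close to zero. The following is only a minimal sketch: fn and gr are a hypothetical toy objective (the Rosenbrock function) and its hand-coded gradient, standing in for the real fitting function, and num_grad is an illustrative helper, not part of R.

```r
# Hypothetical toy objective (Rosenbrock), standing in for the real fitting function.
fn <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2

# Its analytic gradient, coded by hand.
gr <- function(p) {
  c(-400 * p[1] * (p[2] - p[1]^2) - 2 * (1 - p[1]),
    200 * (p[2] - p[1]^2))
}

# Illustrative helper: central-difference approximation to the gradient of f at p.
# The step is scaled to the magnitude of each parameter.
num_grad <- function(f, p, eps = 1e-6) {
  vapply(seq_along(p), function(i) {
    h <- eps * max(abs(p[i]), 1)
    up <- p; up[i] <- p[i] + h
    dn <- p; dn[i] <- p[i] - h
    (f(up) - f(dn)) / (2 * h)
  }, numeric(1))
}

p0 <- c(-1.2, 1)   # a point away from the minimum

# Largest discrepancy between the two gradients, relative to the analytic one.
max_rel_err <- max(abs(gr(p0) - num_grad(fn, p0)) / pmax(abs(gr(p0)), 1))
print(max_rel_err)
```

A discrepancy many orders of magnitude above rounding error would point to a coding error in the gradient; a small one suggests the trouble lies elsewhere, for example in scaling.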

If I start reasonably near the solution, optim (which I use first) reports 
convergence, but doesn't quite reach the solution; nlm (which starts with 
the parameter values produced by optim) reports a return code of 3, which 
corresponds to "last global step failed to locate a point lower than 
estimate. Either estimate is an approximate local minimum of the function 
or steptol is too small." Changing the steptol (and other arguments to nlm) 
doesn't seem to help, however. (I do have a question about the fscale and 
typsize arguments, which default respectively to 1 and a vector of 1's: Why 
are these available independent of the start values, from which they can be 
inferred?)
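
For reference, the two-stage procedure described above (optim first, then nlm started from the optim estimates) might be sketched as below. The objective is a hypothetical, deliberately badly scaled quadratic, not the real model. The nlm documentation describes typsize as an estimate of the size of each parameter at the minimum and fscale as an estimate of the size of the function at the minimum, so the values used here are assumptions chosen to match the toy problem.

```r
# Hypothetical badly scaled objective: the two parameters differ in size by a factor of about 60.
fn <- function(p) (p[1] - 5)^2 + ((p[2] - 300) / 300)^2
gr <- function(p) c(2 * (p[1] - 5), 2 * (p[2] - 300) / 300^2)

# nlm() takes an analytic gradient as a "gradient" attribute on the function value.
fg <- function(p) {
  val <- fn(p)
  attr(val, "gradient") <- gr(p)
  val
}

start <- c(4, 250)

# Stage 1: optim() with the analytic gradient.
fit1 <- optim(start, fn, gr, method = "BFGS")

# Stage 2: nlm() started from the optim() estimates, with typsize set to the
# rough magnitudes of the parameters (an assumption for this toy problem).
fit2 <- nlm(fg, fit1$par, typsize = c(5, 300), fscale = 1,
            steptol = 1e-8, gradtol = 1e-8)

print(fit1$convergence)   # 0 means optim() reports convergence
print(fit2$code)          # 1 and 2 are normal terminations; 3 is the code described above
print(fit2$estimate)
```

The parscale and fnscale entries of optim's control list play a similar scaling role there.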

So that you can get a more concrete sense of what's going on, here's a 
table of the different solutions (with the rows corresponding to parameters):

        optim        nlm(1)      nlm(2)    start(1)  start(2)
lamb   4.9017942   4.9250519   5.3688830   5.0    18.6204419
gam1  -0.5382103  -0.5912515  -0.6299493  -1.0    -0.4252919
beta   0.6141337   0.6046611   0.5931075   1.0     0.5207707
gam2  -0.1992669  -0.2189711  -0.2408609  -0.2    -0.1805793
the1   3.8618249   3.5585071   3.6077990   4.0     2.0121677
the2   4.3781542   3.6819272   3.5949141   4.0     1.5921868
the3   1.6595465   2.4249510   2.9937057   3.0     2.2103193
the4 299.7290498 299.6466428 259.5756196 300.0   103.5671443
the5   1.2506907   0.8819633   0.9057823   1.0     0.5174292
psi1   5.9471507   5.8307768   5.6705004  61.0     0.8191268
psi2   4.6063684   4.5328785   4.5149762   5.0     0.6161997
phi    9.3785360   7.1702049   6.6162702   7.0     1.0000000
-------------------------------------------------------------
obj fn 55.74172    18.41530    13.48505

Here, the solutions labelled optim and nlm(1) use the supplied expression 
for the gradient (your point that this too is a numerical approximation 
seems obvious once stated, but I didn't consider it previously), while the 
solution labelled nlm(2) uses the default numerical derivatives; the 
start(1) column gives the start values that I specified "near" the 
nlm(2) solution; the start(2) column gives the start values that the 
program calculates itself if start values are not supplied; and the last 
row gives the values of the objective function for each solution, scaled as 
a chi-square statistic with 9 df. (When the start values in start(2) are 
used, the solutions produced by optim and nlm(1) are different from those 
given above, but the symptoms are the same -- e.g., optim reports 
convergence, nlm returns a code of 3.)

I suspect that the problem is ill-conditioned in some way, but I haven't 
been able to figure out how. I guess that I should investigate further. I 
could supply other potentially relevant information, such as the hessian at 
the solution, but I'm reluctant to impose further on your time, or that of 
other list members.
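
One way to look for ill-conditioning is to examine the eigenvalues of the hessian at the reported solution. This is a minimal sketch on a hypothetical objective with one nearly flat direction, not the real model; the ratio of the largest to the smallest eigenvalue is used as a rough condition number.

```r
# Hypothetical objective with one nearly flat direction, standing in for the real one.
fn <- function(p) (p[1] - 1)^2 + 1e-6 * (p[2] - 2)^2
gr <- function(p) c(2 * (p[1] - 1), 2e-6 * (p[2] - 2))

# Ask optim() to return a numerically differentiated hessian at the solution.
fit <- optim(c(0, 0), fn, gr, method = "BFGS", hessian = TRUE)

# Eigenvalues of the (symmetric) hessian.
ev <- eigen(fit$hessian, symmetric = TRUE, only.values = TRUE)$values

# Ratio of largest to smallest eigenvalue: a very large value suggests that
# some direction in the parameter space is poorly determined.
cond <- max(ev) / min(ev)
print(ev)
print(cond)
```

Eigenvalues that are negative or close to zero would suggest that the reported point is not a well-defined minimum.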

Thanks for your help,
  John
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
