[R] Coefficients of Logistic Regression from bootstrap - how to get them?

Frank E Harrell Jr f.harrell at vanderbilt.edu
Tue Jul 22 23:32:56 CEST 2008

Michal Figurski wrote:
> Dear Marc and all,
> Thank you for all the due respect.
> I tried to explain as explicitly as I could what I am trying to do 
> in my first email. I did not invent this procedure, it was already 
> published in the paper:
> T. Pawinski, M. Hale, M. Korecka, W.E. Fitzsimmons, L.M. Shaw. Limited 
> Sampling Strategy for the Estimation of Mycophenolic Acid Area under the 
> Curve in Adult Renal Transplant Patients Treated with Concomitant 
> Tacrolimus. Clinical Chemistry 2002(48:9), 1497-1504

If you send me a pdf of this paper I will be glad to take a look.

Rather than an ad hoc bootstrap procedure you might look at the 
resistant/robust fit literature and use an objective function that 
spells out what is being optimized.

There probably are cases where taking the median of a set of bootstrap 
regression coefficient estimates works well in a certain sense, but I 
would put my money on penalized maximum likelihood estimation.
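
To make the comparison concrete, here is a minimal base-R sketch of the
three estimators under discussion: the ordinary maximum likelihood fit,
the median of bootstrap coefficients (rows resampled with replacement),
and a penalized maximum likelihood fit with a quadratic penalty on the
slopes. Everything specific in it is an assumption made for illustration:
the data are simulated to mimic a 109-row data set with three numeric
predictors, the names x1, x2, x3, y are hypothetical, and the penalty
value of 5 is arbitrary rather than tuned. In practice the penalty
argument of lrm (with pentrace to choose the amount of penalization)
would be the more natural route; the hand-written objective below is
only meant to spell out what is being optimized.

```r
# Simulated stand-in for the real data (all names and values hypothetical)
set.seed(1)
n <- 109
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + d$x1 - 0.5 * d$x2 + 0.25 * d$x3))

# (1) Ordinary maximum likelihood: coefficients from the original fit
mle <- coef(glm(y ~ x1 + x2 + x3, family = binomial, data = d))

# (2) Median of bootstrap coefficients; each replicate resamples ROWS
B <- 500
boot_coefs <- replicate(B, {
  i <- sample(n, replace = TRUE)
  coef(glm(y ~ x1 + x2 + x3, family = binomial, data = d[i, ]))
})                                   # 4 x B matrix, one column per replicate
boot_median <- apply(boot_coefs, 1, median)

# (3) Penalized maximum likelihood: negative log-likelihood plus a
#     quadratic penalty on the slopes (the intercept is not penalized)
X <- model.matrix(~ x1 + x2 + x3, data = d)   # first column is the intercept
pen_nll <- function(beta, lambda) {
  eta <- drop(X %*% beta)
  -sum(d$y * eta - log1p(exp(eta))) + 0.5 * lambda * sum(beta[-1]^2)
}
fit <- optim(rep(0, ncol(X)), pen_nll, lambda = 5, method = "BFGS")
pmle <- setNames(fit$par, colnames(X))

# Side-by-side comparison of the three sets of coefficients
round(rbind(mle, boot_median, pmle), 3)
```

The penalized fit has an explicit objective, so one can say exactly what
it optimizes and choose lambda by a stated criterion; the bootstrap
median has no such objective. Note also that the quadratic penalty
assumes the predictors are on comparable scales, which holds here only
because the simulated predictors are all standard normal.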

As Marc said, your attitude towards free advice is puzzling.


> I only adapted this methodology to work under SAS and now I try to do it 
> under R, because I like R. I need practical advice because I have a 
> practical problem, and I do not understand much of the theoretical 
> discussion on what bootstrap is suitable for or not. Apparently I am 
> trying to use it for something other than what the experts are used to...
> Honestly, I did not learn anything from this discussion so far, I am 
> just disappointed.
> Though, since the discussion has already started, I'd welcome your 
> criticism on this procedure - I just ask that you express it in human 
> language.
> -- 
> Michal J. Figurski
> Marc Schwartz wrote:
>> Michal,
>> With all due respect, you have openly acknowledged that you don't know 
>> enough about the subject at hand.
>> If that is the case, on what basis are you in a position to challenge 
>> the collective wisdom of those professionals who have voluntarily 
>> offered *expert* level statistical advice to you?
>> You have erected a wall around your thinking.
>> You may choose to use R or any other software application to 
>> "Git-R-Done". But that does not make it correct.
>> There are other methods to consider that could be used during the 
>> model building process itself, rather than on a post-hoc basis and I 
>> would specifically refer you to Frank's book, Regression Modeling 
>> Strategies:
>>   http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS
>> Marc Schwartz
>> on 07/22/2008 09:43 AM Michal Figurski wrote:
>>> Hmm...
>>> It sounds like ideology to me. I was asking for technical help. I 
>>> know what I want to do, just don't know how to do it in R. I'll go 
>>> back to SAS then. Thank you.
>>> -- 
>>> Michal J. Figurski
>>> Doran, Harold wrote:
>>>> I think the answer has been given to you. If you want to continue to
>>>> ignore that advice and use bootstrap for point estimates rather than
>>>> the properties of those estimates (which is what bootstrap is for)
>>>> then you are on your own.
>>>>> -----Original Message-----
>>>>> From: r-help-bounces at r-project.org 
>>>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Michal Figurski
>>>>> Sent: Tuesday, July 22, 2008 9:52 AM
>>>>> To: r-help at r-project.org
>>>>> Subject: Re: [R] Coefficients of Logistic Regression from bootstrap 
>>>>> - how to get them?
>>>>> Dear all,
>>>>> I don't want to argue with anybody about words or about what 
>>>>> bootstrap is suitable for - I know too little for that.
>>>>> All I need is help to get the *equation coefficients* optimized by 
>>>>> bootstrap - either by one of the functions or by simple median.
>>>>> Please help,
>>>>> -- 
>>>>> Michal J. Figurski
>>>>> HUP, Pathology & Laboratory Medicine
>>>>> Xenobiotics Toxicokinetics Research Laboratory 3400 Spruce St. 7 
>>>>> Maloney Philadelphia, PA 19104 tel. (215) 662-3413
>>>>> Frank E Harrell Jr wrote:
>>>>>> Michal Figurski wrote:
>>>>>>> Frank,
>>>>>>> "How does bootstrap improve on that?"
>>>>>>> I don't know, but I have an idea. Since the data in my set are
>>>>>>> just a small sample of a big population, then if I use my whole
>>>>>>> dataset to obtain max likelihood estimates, these estimates may
>>>>>>> be best for this dataset, but far from ideal for the whole
>>>>>>> population.
>>>>>> The bootstrap, being a resampling procedure from your sample, has
>>>>>> the same issues about the population as MLEs.
>>>>>>> I used bootstrap to virtually increase the size of my dataset, it 
>>>>>>> should result in estimates more close to that from the
>>>>>>> population - isn't it the purpose of bootstrap?
>>>>>> No
>>>>>>> When I use such median coefficients on another dataset (another 
>>>>>>> sample from population), the predictions are better, than using
>>>>>>> max likelihood estimates. I have already tested that and it
>>>>>>> worked!
>>>>>> Then your testing procedure is probably not valid.
>>>>>>> I am not a statistician and I don't feel what "overfitting" is,
>>>>>>> but it may be just another word for the same idea.
>>>>>>> Nevertheless, I would still like to know how can I get the 
>>>>>>> coefficients for the model that gives the "nearly unbiased 
>>>>>>> estimates".
>>>>>>> I greatly appreciate your help.
>>>>>> More info in my book Regression Modeling Strategies.
>>>>>> Frank
>>>>>>> -- 
>>>>>>> Michal J. Figurski
>>>>>>> HUP, Pathology & Laboratory Medicine
>>>>>>> Xenobiotics Toxicokinetics Research Laboratory 3400 Spruce St. 7 
>>>>>>> Maloney Philadelphia, PA 19104 tel. (215) 662-3413
>>>>>>> Frank E Harrell Jr wrote:
>>>>>>>> Michal Figurski wrote:
>>>>>>>>> Hello all,
>>>>>>>>> I am trying to optimize my logistic regression model by using 
>>>>>>>>> bootstrap. I was previously using SAS for this kind of tasks,
>>>>>>>>> but I am now switching to R.
>>>>>>>>> My data frame consists of 5 columns and has 109 rows. Each row
>>>>>>>>> is a single record composed of the following values:
>>>>>>>>> Subject_name, numeric1, numeric2, numeric3 and outcome (yes or
>>>>>>>>> no). All three numerics are used to predict outcome using LR.
>>>>>>>>> In SAS I have written a macro, that was splitting the dataset, 
>>>>>>>>> running LR on one half of data and making predictions on second 
>>>>>>>>> half. Then it was collecting the equation coefficients from 
>>>>>>>>> each iteration of bootstrap. Later I was just taking medians of 
>>>>>>>>> these coefficients from all iterations, and used them as an 
>>>>>>>>> optimal model - it really worked well!
>>>>>>>> Why not use maximum likelihood estimation, i.e., the 
>>>>>>>> coefficients from the original fit?  How does the bootstrap 
>>>>>>>> improve on that?
>>>>>>>>> Now I want to do the same in R. I tried to use the 'validate' 
>>>>>>>>> or 'calibrate' functions from package "Design", and I also 
>>>>>>>>> experimented with function 'sm.binomial.bootstrap' from package 
>>>>>>>>> "sm". I tried also the function 'boot' from package "boot",
>>>>>>>>> though without success - in my case it randomly selected
>>>>>>>>> _columns_ from my data frame, while I wanted it to select
>>>>>>>>> _rows_.
>>>>>>>> validate and calibrate in Design do resampling on the rows.
>>>>>>>> Resampling is mainly used to get a nearly unbiased estimate of
>>>>>>>> the model performance, i.e., to correct for overfitting.
>>>>>>>> Frank Harrell
>>>>>>>>> Though the main point here is the optimized LR equation. I 
>>>>>>>>> would appreciate any help on how to extract the LR equation 
>>>>>>>>> coefficients from any of these bootstrap functions, in the same
>>>>>>>>> form as given by 'glm' or 'lrm'.
>>>>>>>>> Many thanks in advance!

Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University