[R] Coefficients of Logistic Regression from bootstrap - how to get them?
Doran, Harold
HDoran at air.org
Tue Jul 22 16:46:02 CEST 2008
Probably a good idea for you. The R help list is useful for both
programming AND statistical advice for those who want it.
> -----Original Message-----
> From: Michal Figurski [mailto:figurski at mail.med.upenn.edu]
> Sent: Tuesday, July 22, 2008 10:44 AM
> To: Doran, Harold; r-help at r-project.org
> Subject: Re: [R] Coefficients of Logistic Regression from
> bootstrap - how to get them?
>
> Hmm...
>
> It sounds like ideology to me. I was asking for technical
> help. I know what I want to do, just don't know how to do it
> in R. I'll go back to SAS then. Thank you.
>
> --
> Michal J. Figurski
>
> Doran, Harold wrote:
> > I think the answer has been given to you. If you want to continue to
> > ignore that advice and use bootstrap for point estimates rather than
> > the properties of those estimates (which is what bootstrap is for)
> > then you are on your own.
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org
> >> [mailto:r-help-bounces at r-project.org] On Behalf Of Michal Figurski
> >> Sent: Tuesday, July 22, 2008 9:52 AM
> >> To: r-help at r-project.org
> >> Subject: Re: [R] Coefficients of Logistic Regression from bootstrap -
> >> how to get them?
> >>
> >> Dear all,
> >>
> >> I don't want to argue with anybody about words or about what
> >> bootstrap is suitable for - I know too little for that.
> >>
> >> All I need is help to get the *equation coefficients* optimized by
> >> bootstrap - either by one of the functions or by simple median.
> >>
> >> Please help,
> >>
> >> --
> >> Michal J. Figurski
> >> HUP, Pathology & Laboratory Medicine
> >> Xenobiotics Toxicokinetics Research Laboratory 3400 Spruce St. 7
> >> Maloney Philadelphia, PA 19104 tel. (215) 662-3413
> >>
> >> Frank E Harrell Jr wrote:
> >>> Michal Figurski wrote:
> >>>> Frank,
> >>>>
> >>>> "How does bootstrap improve on that?"
> >>>>
> >>>> I don't know, but I have an idea. Since the data in my set are just a
> >>>> small sample of a big population, then if I use my whole dataset to
> >>>> obtain max likelihood estimates, these estimates may be best for this
> >>>> dataset, but far from ideal for the whole population.
> >>> The bootstrap, being a resampling procedure from your sample, has the
> >>> same issues about the population as MLEs.
> >>>
> >>>> I used bootstrap to virtually increase the size of my dataset; it
> >>>> should result in estimates closer to those from the population -
> >>>> isn't that the purpose of bootstrap?
> >>> No
> >>>
> >>>> When I use such median coefficients on another dataset (another
> >>>> sample from the population), the predictions are better than using max
> >>>> likelihood estimates. I have already tested that and it worked!
> >>> Then your testing procedure is probably not valid.
> >>>
> >>>> I am not a statistician and I don't have a feel for what "overfitting"
> >>>> is, but it may be just another word for the same idea.
> >>>>
> >>>> Nevertheless, I would still like to know how I can get the
> >>>> coefficients for the model that gives the "nearly unbiased estimates".
> >>>> I greatly appreciate your help.
> >>> More info in my book Regression Modeling Strategies.
> >>>
> >>> Frank
> >>>
> >>>> --
> >>>> Michal J. Figurski
> >>>> HUP, Pathology & Laboratory Medicine Xenobiotics Toxicokinetics
> >>>> Research Laboratory 3400 Spruce St. 7 Maloney Philadelphia, PA
> >>>> 19104 tel. (215) 662-3413
> >>>>
> >>>> Frank E Harrell Jr wrote:
> >>>>> Michal Figurski wrote:
> >>>>>> Hello all,
> >>>>>>
> >>>>>> I am trying to optimize my logistic regression model by using
> >>>>>> bootstrap. I was previously using SAS for this kind of task, but I
> >>>>>> am now switching to R.
> >>>>>>
> >>>>>> My data frame consists of 5 columns and has 109 rows. Each row is a
> >>>>>> single record composed of the following values: Subject_name,
> >>>>>> numeric1, numeric2, numeric3 and outcome (yes or no). All three
> >>>>>> numerics are used to predict outcome using LR.
> >>>>>>
> >>>>>> In SAS I have written a macro that split the dataset, ran LR on
> >>>>>> one half of the data and made predictions on the second half. Then
> >>>>>> it collected the equation coefficients from each iteration of
> >>>>>> bootstrap. Later I just took the medians of these coefficients
> >>>>>> from all iterations, and used them as an optimal model - it really
> >>>>>> worked well!
> >>>>> Why not use maximum likelihood estimation, i.e., the coefficients
> >>>>> from the original fit? How does the bootstrap improve on that?
> >>>>>
> >>>>>> Now I want to do the same in R. I tried to use the 'validate' or
> >>>>>> 'calibrate' functions from package "Design", and I also
> >>>>>> experimented with function 'sm.binomial.bootstrap' from package
> >>>>>> "sm". I tried also the function 'boot' from package "boot", though
> >>>>>> without success - in my case it randomly selected _columns_ from my
> >>>>>> data frame, while I wanted it to select _rows_.
> >>>>> validate and calibrate in Design do resampling on the rows.
> >>>>>
> >>>>> Resampling is mainly used to get a nearly unbiased estimate of the
> >>>>> model performance, i.e., to correct for overfitting.
> >>>>>
> >>>>> Frank Harrell
> >>>>>
> >>>>>> Though the main point here is the optimized LR equation. I would
> >>>>>> appreciate any help on how to extract the LR equation coefficients
> >>>>>> from any of these bootstrap functions, in the same form as given by
> >>>>>> 'glm' or 'lrm'.
> >>>>>>
> >>>>>> Many thanks in advance!
> >>>>>>
> >>>>>
> >>>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
>
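For the mechanics asked about in the thread (resampling rows and collecting coefficients in the same form as 'glm' gives them), a minimal base-R sketch might look like the following. The data are simulated stand-ins for the 109-row data frame described above; the data-generating model, the number of replicates B, and the object names (d, fit, boot_coefs, boot_se, boot_ci) are all assumptions made for illustration. In line with the advice quoted in the thread, the point estimates come from the maximum likelihood fit on the full sample, and the bootstrap replicates are used only to describe their variability.

```r
# Simulated stand-in for the data described in the thread
# (the coefficients used to generate 'outcome' are assumptions).
set.seed(1)
n <- 109
d <- data.frame(
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
d$outcome <- rbinom(n, 1, plogis(-0.3 + 0.8 * d$numeric1 - 0.5 * d$numeric2))

# Maximum likelihood fit on the full sample: these are the point estimates.
fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
           data = d, family = binomial)

# Refit the same model on B bootstrap samples of ROWS, drawn with replacement.
B <- 200
boot_coefs <- t(replicate(B, {
  idx <- sample(nrow(d), replace = TRUE)  # row indices, not columns
  coef(glm(outcome ~ numeric1 + numeric2 + numeric3,
           data = d[idx, ], family = binomial))
}))

# Describe the sampling variability of each coefficient.
boot_se <- apply(boot_coefs, 2, sd)
boot_ci <- apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975))

print(cbind(estimate = coef(fit), boot_se = boot_se))
print(boot_ci)
```

Each row of boot_coefs holds one replicate's coefficients, with the same names as coef(fit). If the 'boot' function from package "boot" is used instead, its statistic function receives the data and a vector of indices, so the rows are selected inside that function with data[indices, ].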