[R] BMA, logistic regression, odds ratio, model reduction etc

Thu Apr 21 18:46:12 CEST 2011

Thank you for your comment.
I forgot to mention that varclus and pvclust showed similar results for 
my data.
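Below is a rough base-R sketch of the kind of outcome-blind variable clustering discussed in this thread. It runs on simulated data: the names x1 to x4 and stenosis are borrowed from the thread, but the values are invented. It is not pvclust, which adds bootstrap p-values, and it is not Hmisc's varclus. Two choices are assumptions made for illustration: the distance is 1 minus the squared Spearman correlation, and the linkage is average. The outcome is never used, and that is what keeps this kind of reduction blinded to it.

```r
# Simulated (hypothetical) data: x1, x2 and x4 share a common component z,
# while x3 and stenosis are unrelated to it. No outcome is involved.
set.seed(123)
n <- 104
z <- rnorm(n)
dat <- data.frame(x1 = z + rnorm(n, sd = 0.3),
                  x2 = z + rnorm(n, sd = 0.3),
                  x3 = rnorm(n),
                  x4 = z + rnorm(n, sd = 0.3),
                  stenosis = rnorm(n))

# Similarity = squared Spearman correlation; distance = 1 - similarity
rho2 <- cor(dat, method = "spearman")^2
hc <- hclust(as.dist(1 - rho2), method = "average")

# Cut the tree into 3 groups; x1, x2 and x4 should land in the same group
groups <- cutree(hc, k = 3)
groups
# plot(hc) would draw the dendrogram
```

Each cluster can then contribute one representative (x2 in the post) or a summary score.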

BTW, I did not realize rms is a replacement for the Design package.
I appreciate your suggestion.
--
KH

(11/04/21 8:00), Frank Harrell wrote:
> I think it's OK.  You can also use the Hmisc package's varclus function.
> Frank
>
>
> 細田弘吉 wrote:
>>
>> Dear Prof. Harrell,
>>
>> Thank you very much for your quick advice.
>> I will try the rms package.
>>
>> Regarding model reduction, is the method I used for model 2 (clustering
>> and recoding, both blinded to the outcome) permissible?
>>
>> Sincerely,
>>
>> --
>> KH
>>
>> (11/04/20 22:01), Frank Harrell wrote:
>>> Deleting variables is a bad idea unless you make that a formal part of
>>> the BMA so that the attempt to delete variables is penalized for.  Instead
>>> of BMA I recommend simple penalized maximum likelihood estimation (see the
>>> lrm function in the rms package) or pre-modeling data reduction that is
>>> blinded to the outcome variable.
>>> Frank
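Penalized maximum likelihood can be illustrated without any extra package. The sketch below swaps in a base-R ridge (L2) penalty fitted with optim(). It is not the rms code. In rms itself, the corresponding tools are lrm()'s penalty argument and pentrace() for choosing the penalty. The function name penalized_logit, the simulated data and the penalty value 5 are all hypothetical. The predictors are standardized first because this sketch penalizes the raw coefficients.

```r
# Ridge (L2) penalized maximum likelihood for logistic regression, written
# directly with optim(). Base-R illustration only; NOT the rms::lrm code.
penalized_logit <- function(X, y, lambda) {
  X1 <- cbind(Intercept = 1, X)              # design matrix with intercept
  # Penalized negative log-likelihood; the intercept is not penalized
  negloglik <- function(beta) {
    eta <- drop(X1 %*% beta)
    log1pexp <- pmax(eta, 0) + log1p(exp(-abs(eta)))  # stable log(1 + e^eta)
    -sum(y * eta - log1pexp) + 0.5 * lambda * sum(beta[-1]^2)
  }
  # Analytic gradient of negloglik
  gradient <- function(beta) {
    p <- plogis(drop(X1 %*% beta))
    -drop(crossprod(X1, y - p)) + lambda * c(0, beta[-1])
  }
  fit <- optim(rep(0, ncol(X1)), negloglik, gradient, method = "BFGS",
               control = list(reltol = 1e-12, maxit = 1000))
  setNames(fit$par, colnames(X1))
}

# Usage on simulated (hypothetical) data: 104 patients, 5 standardized predictors
set.seed(42)
n <- 104
X <- scale(matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("v", 1:5))))
y <- rbinom(n, 1, plogis(-1.2 + 0.8 * X[, 1]))
round(rbind(mle   = penalized_logit(X, y, 0),
            ridge = penalized_logit(X, y, 5)), 3)
```

With lambda = 0 the fit should reproduce glm(). With lambda > 0 the slopes are pulled toward zero, which is how penalization limits overfitting when EPV is low. In practice the penalty would be chosen from the data (for example by an information criterion or cross-validation) rather than fixed at 5.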
>>>
>>>
>>> 細田弘吉 wrote:
>>>>
>>>> Hi everybody,
>>>> I apologize in advance for the long email.
>>>>
>>>> I have data on 104 patients, consisting of 15 explanatory variables
>>>> and one binary outcome (poor/good). The outcome consists of 25 poor
>>>> results and 79 good results. I tried to analyze the data with logistic
>>>> regression. However, with 15 variables and only 25 events, the events
>>>> per variable (EPV) is 25/15, about 1.7, far below the rule-of-thumb
>>>> minimum of 10. Therefore, I used the R package BMA to perform logistic
>>>> regression with Bayesian model averaging (BMA) to avoid this problem.
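For reference, here is the EPV arithmetic from the counts in the post. The outcome vector is rebuilt from the stated counts (25 poor, 79 good), and the helper name epv is hypothetical. Events are taken as the size of the rarer outcome category, and the denominator is the number of candidate predictors.

```r
# Outcome rebuilt from the counts given in the post (hypothetical vector)
outcome <- factor(c(rep("poor", 25), rep("good", 79)))
n_events <- min(table(outcome))      # events = size of the rarer category

# Events per variable: events divided by number of candidate predictors
epv <- function(n_events, n_predictors) n_events / n_predictors

epv(n_events, 15)   # model 1 (15 variables): 1.666667
epv(n_events, 5)    # model 2 (5 variables): 5
```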
>>>>
>>>> model 1 (full model):
>>>> x1, x2, x3 and x4 are continuous variables; the others are binary.
>>>>
>>>>> x16.bic.glm <- bic.glm(outcome ~ ., data = x16.df,
>>>> +                        glm.family = "binomial", OR = 20, strict = FALSE)
>>>>> summary(x16.bic.glm)
>>>> (The output below has been cut off at the right edge to save space)
>>>>
>>>>     62  models were selected
>>>>    Best  5  models (cumulative posterior probability =  0.3606 ):
>>>>
>>>>                            p!=0    EV         SD        model 1    model 2
>>>> Intercept                100    -5.1348545  1.652424    -4.4688  -5.15  -5.1536
>>>> age                        3.3   0.0001634  0.007258      .
>>>> sex                        4.0
>>>>      .M                           -0.0243145  0.220314      .
>>>> side                      10.8
>>>>       .R                           0.0811227  0.301233      .
>>>> procedure                 46.9  -0.5356894  0.685148      .      -1.163
>>>> symptom                    3.8  -0.0099438  0.129690      .          .
>>>> stenosis                   3.4  -0.0003343  0.005254      .
>>>> x1                        3.7  -0.0061451  0.144084      .
>>>> x2                       100.0   3.1707661  0.892034     3.2221     3.11
>>>> x3                        51.3  -0.4577885  0.551466    -0.9154     .
>>>> HT                         4.6
>>>>     .positive                      0.0199299  0.161769      .          .
>>>> DM                         3.3
>>>>     .positive                     -0.0019986  0.105910      .          .
>>>> IHD                        3.5
>>>>      .positive                     0.0077626  0.122593      .          .
>>>> smoking                    9.1
>>>>          .positive                 0.0611779  0.258402      .          .
>>>> hyperlipidemia            16.0
>>>>                 .positive          0.1784293  0.512058      .          .
>>>> x4                         8.2   0.0607398  0.267501      .          .
>>>>
>>>>
>>>> nVar                                                          2          2          1          3          3
>>>> BIC                                                   -376.9082  -376.5588  -376.3094  -375.8468  -374.5582
>>>> post prob                                                 0.104      0.087      0.077      0.061      0.032
>>>>
>>>> [Question 1]
>>>> Is it OK to calculate the odds ratio and its 95% confidence interval
>>>> from "EV" (posterior distribution mean) and "SD" (posterior
>>>> distribution standard deviation)?
>>>> For example, the odds ratio for x2 and its 95% CI can be calculated as:
>>>>> exp(3.1707661)
>>>> [1] 23.82573     ----->   odds ratio
>>>>> exp(3.1707661+1.96*0.892034)
>>>> [1] 136.8866
>>>>> exp(3.1707661-1.96*0.892034)
>>>> [1] 4.146976
>>>> ------------------>   95%CI (4.1 to 136.9)
>>>> Is this OK?
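The hand calculation in Question 1 can be wrapped in a small helper. The name or_ci is hypothetical. It reproduces the numbers above but uses qnorm(0.975) in place of 1.96. It treats the coefficient's posterior as roughly normal. Under BMA the posterior is a mixture over models, with a point mass at zero of probability 1 - p!=0, so the interval is only a reasonable approximation when p!=0 is near 100%, as it is for x2. As far as I know, a fitted bic.glm object stores these values in its postmean and postsd components, so they need not be retyped from the printout.

```r
# Hypothetical helper: odds ratio and normal-approximation interval computed
# from a posterior mean (EV) and posterior SD on the log-odds scale.
or_ci <- function(ev, sd, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  c(OR = exp(ev), lower = exp(ev - z * sd), upper = exp(ev + z * sd))
}

# EV and SD for x2 as printed in the model 1 summary
or_ci(3.1707661, 0.892034)
```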
>>>>
>>>> [Question 2]
>>>> Is it permissible to delete variables with small values of "p!=0" and
>>>> "EV", such as age (3.3% and 0.0001634), to reduce the number of
>>>> explanatory variables, and then build a new model without those
>>>> variables for a new BMA session?
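The following toy calculation shows what "p!=0", "EV" and "SD" summarize, which bears on Questions 1 and 2. It follows one variable across three models, and all numbers are invented. It uses the usual BMA mixture formulas:

- p!=0 is the sum of the posterior probabilities of the models that contain the variable.
- EV averages the within-model estimates, with the coefficient counted as 0 in models that exclude it.
- SD combines the within-model variance with the spread between models.

I believe bic.glm's summary columns are computed along these lines, but this is a sketch rather than the package's code.

```r
# Hypothetical example: one variable across three models (values invented)
post_prob <- c(0.5, 0.3, 0.2)      # posterior model probabilities, sum to 1
in_model  <- c(TRUE, FALSE, TRUE)  # does each model contain the variable?
beta_hat  <- c(-0.9, 0, -0.6)      # within-model estimates (0 when excluded)
se_hat    <- c(0.40, 0, 0.45)      # within-model standard errors

p_ne0   <- 100 * sum(post_prob[in_model])   # "p!=0", in percent
ev      <- sum(post_prob * beta_hat)        # "EV", posterior mean
# "SD": mixture variance = E[within-model var + mean^2] - (overall mean)^2
sd_post <- sqrt(sum(post_prob * (se_hat^2 + beta_hat^2)) - ev^2)
c(p_ne0 = p_ne0, EV = ev, SD = sd_post)
```

EV equals p!=0 (as a fraction) times the average estimate among the models that include the variable. A small p!=0 therefore forces a small EV, so the two columns are not independent pieces of evidence.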
>>>>
>>>> model 2 (reduced model):
>>>> I used the R package pvclust to reduce the model. The result suggested
>>>> that x1, x2 and x4 belonged to the same cluster, so I kept only x2.
>>>> Based on subject knowledge, I made a simple unweighted sum by counting
>>>> the number of clinical features present. For 9 features (sex, side,
>>>> HT2, hyperlipidemia, DM, IHD, smoking, symptom, age), the sum ranges
>>>> from 0 to 9; I defined this score as ClinicalScore. I then created a
>>>> new data set (x6.df), which consists of 5 variables (stenosis, x2, x3,
>>>> procedure, and ClinicalScore) and one binary outcome (poor/good).
>>>> Then I ran a second BMA session:
>>>>
>>>>> BMAx6.glm <- bic.glm(postopDWI_HI ~ ., data = x6.df,
>>>> +                      glm.family = "binomial", OR = 20, strict = FALSE)
>>>>> summary(BMAx6.glm)
>>>> (The output below has been cut off at the right edge to save space)
>>>> Call:
>>>> bic.glm.formula(f = postopDWI_HI ~ ., data = x6.df, glm.family =
>>>> "binomial",     strict = FALSE, OR = 20)
>>>>
>>>>
>>>>     13  models were selected
>>>>    Best  5  models (cumulative posterior probability =  0.7626 ):
>>>>
>>>>                   p!=0    EV         SD       model 1    model 2
>>>> Intercept       100    -5.6918362  1.81220    -4.4688    -6.3166
>>>> stenosis          8.1  -0.0008417  0.00815      .          .
>>>> x2              100.0   3.0606165  0.87765     3.2221     3.1154
>>>> x3               46.5  -0.3998864  0.52688    -0.9154      .
>>>> procedure       49.3   0.5747013  0.70164      .         1.1631
>>>> ClinicalScore   27.1   0.0966633  0.19645      .          .
>>>>
>>>>
>>>> nVar                                                2          2          1          3          3
>>>> BIC                                         -376.9082  -376.5588  -376.3094  -375.8468  -375.5025
>>>> post prob                                       0.208      0.175      0.154      0.122      0.103
>>>>
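The ClinicalScore used for model 2 is an unweighted count of the clinical features present. Below is a minimal sketch for three invented patients. It assumes each of the nine features is already coded 0/1; the post does not say how each one was coded. Only the predictors enter the score, never the outcome, so this recoding stays blinded to the outcome.

```r
# The nine features listed in the post, assumed already coded 0/1
features <- c("sex", "side", "HT2", "hyperlipidemia", "DM", "IHD",
              "smoking", "symptom", "age")

# Three hypothetical patients
toy <- data.frame(
  sex = c(1, 0, 1), side = c(0, 0, 1), HT2 = c(1, 0, 1),
  hyperlipidemia = c(0, 0, 1), DM = c(0, 0, 1), IHD = c(1, 0, 1),
  smoking = c(1, 0, 1), symptom = c(0, 0, 1), age = c(1, 0, 1)
)

# Unweighted sum: one point per feature present (range 0 to 9)
toy$ClinicalScore <- rowSums(toy[, features])
toy$ClinicalScore   # scores 5, 0 and 9 for the three toy patients
```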
>>>> [Question 3]
>>>> Am I doing this correctly? In other words, is this kind of model
>>>> reduction permissible with BMA?
>>>>
>>>> [Question 4]
>>>> I still have 5 variables, which violates the rule of thumb "EPV > 10"
>>>> (25 events / 5 variables = 5). Is it permissible to delete the
>>>> "stenosis" variable because its "EV" is small? Or is this acceptable
>>>> because I am using BMA?
>>>>
>>>> Sorry for the long post.
>>>>
>>>> Thank you very much in advance for your help.
>>>>
>>>> --
>>>> KH
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>> -----
>>> Frank Harrell
>>> Department of Biostatistics, Vanderbilt University
>>> --
>>> View this message in context:
>>> http://r.789695.n4.nabble.com/BMA-logistic-regression-odds-ratio-model-reduction-etc-tp3462416p3462919.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>
>>
>
>
> -----
> Frank Harrell
> Department of Biostatistics, Vanderbilt University
> --
> View this message in context: http://r.789695.n4.nabble.com/BMA-logistic-regression-odds-ratio-model-reduction-etc-tp3462416p3464392.html
> Sent from the R help mailing list archive at Nabble.com.
>