[R] Collinearity? Cannot get logisticRidge{ridge} to work

David Winsemius dwinsemius at comcast.net
Wed May 27 23:57:13 CEST 2015


On May 27, 2015, at 2:49 PM, Kengo Inagaki wrote:

> Thank you very much for your rapid response. I sincerely appreciate your input.
> I am sorry for sending the previous email in HTML format.
> 
> with(a,  table(Sex, Therapy1) )   shows the following.
>          Therapy1
> Sex      no yes
>  female  6   7
>  male    7   5
> 
> and with(a,  table(Sex, Outcome) ) and with(a,  table(Therapy1, Outcome) )
> give the following
> 
>        Outcome
> Sex      Alive Death
>  female     4     9
>  male       9     3
> 
>        Outcome
> Therapy1 Alive Death
>     no      4     9
>     yes     9     3

Then what about:

with(a,  table(Sex, Therapy1,  Outcome) )

-- 
David
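
A note on the three-way table suggested above: the three two-way tables posted in this thread appear to pin down every cell of it. If d is the number of deaths among females with Therapy1 = no, the margins force 9 - d deaths in both female/yes and male/no, and d - 6 deaths in male/yes, so d has to be 6. The sketch below rebuilds a data frame with those counts and refits the poster's glm. It assumes the first Outcome table is Sex by Outcome, as its row label says; the names cells, tab3, fit and se are illustrative and do not come from the thread.

```r
# Cell counts implied by the posted two-way tables (an assumption, see above).
cells <- expand.grid(
  Sex      = c("female", "male"),
  Therapy1 = c("no", "yes"),
  Outcome  = c("Alive", "Death"),
  stringsAsFactors = TRUE
)
# expand.grid varies Sex fastest, then Therapy1, then Outcome.
cells$n <- c(0, 4, 4, 5,   # Alive: female/no, male/no, female/yes, male/yes
             6, 3, 3, 0)   # Death: same order

# One row per case, 25 cases in total.
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]
rownames(a) <- NULL

# The three-way table; ftable() prints it flat.
tab3 <- with(a, table(Sex, Therapy1, Outcome))
print(ftable(tab3))

# The poster's model. glm() warns about fitted probabilities of 0 or 1,
# so the warning is suppressed here to keep the output short.
fit <- suppressWarnings(
  glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)
)
se <- coef(summary(fit))[, "Std. Error"]
print(se)
```

Under these counts two cells are empty: no survivors among female/no and no deaths among male/yes. An additive Sex + Therapy1 model then has no finite maximum-likelihood estimate, which is consistent with the inflated standard errors reported in the original post.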


> 
> As there are no zero cells, it does not seem to be complete separation.
> I really appreciate comments.
> 
> Kengo Inagaki
> Memphis, TN
> 
> 
> 2015-05-27 13:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
>> 
>> On May 27, 2015, at 10:10 AM, Kengo Inagaki wrote:
>> 
>>> I am currently working on a health care related project using R. I am
>>> learning R while working on data analysis.
>>> 
>>> Below is the part of the data in which I am encountering a problem.
>>> 
>>> 
>>> Case#    Sex         Therapy1             Therapy2             Outcome
>>> 
>>> 1              male      no
>>> no                           Alive
>>> 
>> 
>> snipped mangled data sent in HTML
>> 
>>> 
>>> 
>>> "Outcome" is the response variable and "Sex", "Therapy1", "Therapy2" are
>>> predictor variables.
>>> 
>>> All of the predictors are significantly associated with the outcome by
>>> univariate analysis.
>>> 
>>> Logistic regression runs fine with most of the predictors when "Sex" and
>>> "Therapy1" are not included at the same time (this is part of a table that
>>> I cut out from a larger table for ease of presentation, and there are more
>>> predictors that I tested).
>> 
>> Please examine the data before reaching for ridge regression:
>> 
>> What does this show: ...
>> 
>>    with(a,  table(Sex, Therapy1) )
>> 
>> I predict you will see a zero cell entry. Then read about "complete separation" and the so-called "Hauck-Donner effect".
>> 
>> --
>> David.
>>> 
>>> However, when "Sex" and "Therapy1" are included in the logistic regression
>>> model at the same time, the standard errors inflate and the p-values get close to 1.
>>> 
>>> The formula used is,
>>> 
>>> 
>>> 
>>>> Model <- glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)
>>>> # "a" is the object I assigned to hold the above table.
>>> 
>>> 
>>> 
>>> After doing some reading, I suspect this might be collinearity, as vif
>>> values (using "vif()" function in car package) were sky high (8,875,841 for
>>> both "Sex" and "Therapy1").
>>> 
>>> Learning that ridge regression may be a solution, I attempted using
>>> logisticRidge {ridge} with the following call, but I get the
>>> accompanying error message.
>>> 
>>> 
>>> 
>>>> logisticRidge(a$Outcome~a$Sex+a$Therapy1)
>>> 
>>> 
>>> 
>>> Error in ifelse(y, log(p), log(1 - p)) :
>>>   invalid to change the storage mode of a factor
>>> 
>>> 
>>> 
>>> At this point I do not have an idea how to solve this and would like to
>>> seek help.
>>> 
>>> I really really appreciate your input!!!
>>> 
>>>      [[alternative HTML version deleted]]
>>> 
>> 
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 

David Winsemius
Alameda, CA, USA
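
On the logisticRidge error quoted above: the message comes from ifelse() being handed a factor as its first argument, which suggests the response reached the fitting code as a factor rather than as 0/1. A minimal sketch of the mechanism and of one possible recoding follows. The vector outcome is a toy stand-in for a$Outcome, and the names msg, death01 and Death01 are illustrative. Whether logisticRidge then runs cleanly on the poster's data is not verified here, and recoding does nothing about the separation issue raised in this thread.

```r
# Toy factor response standing in for a$Outcome (values are illustrative only).
outcome <- factor(c("Alive", "Death", "Death", "Alive"))

# ifelse() tries to coerce its first argument to logical; for a factor that
# coercion is refused, which produces the error quoted above.
msg <- tryCatch(
  ifelse(outcome, 1, 0),
  error = function(e) conditionMessage(e)
)
print(msg)

# Recode the response as numeric 0/1 (1 = Death) before modelling.
death01 <- as.numeric(outcome == "Death")
print(death01)

# Not run here: a call along these lines, assuming the ridge package is
# installed and `a` has a 0/1 column named Death01 (a hypothetical name).
# ridge::logisticRidge(Death01 ~ Sex + Therapy1, data = a)
```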


