[R] Error when running Conditional Logit Model
David Winsemius
dwinsemius at comcast.net
Sat Dec 19 05:02:54 CET 2009

On Dec 18, 2009, at 7:39 PM, Hien Nguyen wrote:
> Thanks a lot for answering my questions.
>
> I have tried running clogit on only 64 observations and 4
> independent variables, and the results come back instantly. However,
> when I run the same command (with only the 4 independent variables) on
> the full data, it has been running for 50 minutes now. :(
>
> Thomas, what do you mean by "maximizing the unconditional likelihood
> is fine when the stratum sizes are large"? What I put in
> "strata(__)" is actually the possible choices (1-64). Each choice is
> recorded more than 4000 times (which means I have more than 4000
> values of 1, more than 4000 values of 2, and so on).
> Does that sound right?
I'm pretty sure he means glm(formula, family = "binomial", ...),
skipping the strata specification.
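A minimal sketch of the glm() route, using the infert data that Chuck and David work with further down the thread rather than Hien's data. Assumptions: the object names are hypothetical, and reading Thomas's "unconditional likelihood" as a logistic regression with one intercept per stratum (factor(stratum)) is this sketch's interpretation, not something stated in the thread; dropping the strata entirely, as suggested above, gives the pooled fit. The infert strata hold only about three subjects each, so the unconditional estimates are expected to differ from the conditional ones here; Thomas's remark concerns large strata.

```r
library(survival)  # clogit(); infert ships with base R's datasets package

# Conditional likelihood: per-stratum intercepts are conditioned away.
fit_cond <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)

# Unconditional likelihood: one intercept per stratum, estimated as a
# fixed effect by ordinary logistic regression.
fit_uncond <- glm(case ~ spontaneous + induced + factor(stratum),
                  family = binomial, data = infert)

# Pooled fit: the strata are dropped entirely.
fit_pooled <- glm(case ~ spontaneous + induced,
                  family = binomial, data = infert)

# Side-by-side comparison of the two exposure coefficients.
vars <- c("spontaneous", "induced")
rbind(conditional   = coef(fit_cond)[vars],
      unconditional = coef(fit_uncond)[vars],
      pooled        = coef(fit_pooled)[vars])
```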
-- 
David.
>
> Thanks a lot
>
> Hien
>
> tlumley at u.washington.edu wrote:
>> On Fri, 18 Dec 2009, Hien Nguyen wrote:
>>
>>> Dear Drs Winsemius and Berry,
>>>
>>> Thanks a lot for your comments and suggestions on running my model.
>>> I am new not just to R but to CLM as well. :( Thanks to your
>>> suggestions, I figured out that I had badly misunderstood both the
>>> model and the data arrangement.
>>>
>>> After my finals, I reread the related material on CLM and
>>> rearranged the data appropriately before running the model in R.
>>> This time, I have a dataset of more than 250,000 observations
>>> (created from more than 4000 responses) and a model with 15 predictors.
>>>
>>> My question is how long the clogit command should take to run,
>>> because it has been running for more than 10 hours on a quad-core
>>> computer and still shows no sign of being done or almost done. Is
>>> that OK, or does my command just not work?
>>
>> If you have a lot of records with case=1 in a stratum, conditional  
>> logistic regression will be extremely slow.   And unnecessary:  
>> maximizing the unconditional likelihood is fine when the stratum  
>> sizes are large.
>>
>> Note that a quad-core computer won't help. Only one core will be  
>> used in the computations.
>>
>>     -thomas
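Thomas's point about many cases per stratum can be tried on a toy problem. This sketch regroups infert into four artificial strata (the names big, bigstrat, t_exact and t_approx are hypothetical, and the regrouping is not a valid analysis of those data). It also uses an alternative not raised in the thread: the method argument of survival's clogit(), whose default "exact" computes the exact conditional likelihood and whose "approximate" setting substitutes an approximation to it. On data this small both fits should finish quickly; the difference is expected to matter as the number of cases per stratum grows.

```r
library(survival)

# Artificial regrouping: infert's matched sets are pooled into 4 large
# strata, each with roughly 60 subjects and 20 cases.
big <- transform(infert, bigstrat = stratum %% 4)

# Default method = "exact": exact conditional likelihood, whose cost grows
# with the number of cases in each stratum.
t_exact <- system.time(
  fit_exact <- clogit(case ~ spontaneous + induced + strata(bigstrat), data = big)
)

# method = "approximate": an approximation to the same likelihood.
t_approx <- system.time(
  fit_approx <- clogit(case ~ spontaneous + induced + strata(bigstrat),
                       data = big, method = "approximate")
)

rbind(exact = coef(fit_exact), approximate = coef(fit_approx))
c(exact = t_exact[["elapsed"]], approximate = t_approx[["elapsed"]])
```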
>>
>>
>>
>>
>>> Thanks a lot for your response
>>>
>>> Hien
>>>
>>>
>>> Charles C. Berry wrote:
>>>> On Fri, 4 Dec 2009, David Winsemius wrote:
>>>>
>>>>>
>>>>> On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote:
>>>>>
>>>>>> Dear Dr. Winsemius,
>>>>>>
>>>>>> Thank you very much for your reply.
>>>>>>
>>>>>> I have tried many possible combinations (even a model with
>>>>>> only 2 predictors), but it produces the same message. With more
>>>>>> than 4000 observations, I don't think 14 predictors should be too
>>>>>> many.
>>>>>
>>>>> It is what happens in the factor combinations that concerns me. I
>>>>> am guessing that some of those predictors are factors. You
>>>>> really should not ask r-help questions without providing better
>>>>> descriptions of both the outcomes and the predictor variables.
>>>>>
>>>>>>
>>>>>> Although my dependent variable (Pin) is not discrete (it
>>>>>> ranges from 0 to 1), I do not think it will create problems for
>>>>>> the estimation, but I'm not sure.
>>>>>
>>>>> I would think it _would_ cause problems. As I understand it,  
>>>>> conditional methods create contingency tables. Why are you using  
>>>>> an outcome type that is not consistent with the fundamental  
>>>>> regression assumptions of the clogit function?
>>>>>
>>>>> I do not get that particular error when I munge the infert  
>>>>> dataset to have case be a random uniform value, but I do get an  
>>>>> error.
>>>>>> infert$case <- runif(nrow(infert))
>>>>>> clogit(case~spontaneous+induced+strata(stratum),data=infert)
>>>>> Error in Surv(rep(1, 248L), case) : Invalid status value
>>>>>
>>>>
>>>> David, I think you were on the right track. I get this:
>>>>
>>>> -----------
>>>>> clogit(I(case*runif(length(case)))~spontaneous+induced+strata(ifelse(stratum>40,NA,stratum)),data=infert)
>>>>
>>>> Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :
>>>>   NA/NaN/Inf in foreign function call (arg 6)
>>>> In addition: Warning messages:
>>>> 1: In Surv(rep(1, 248L), I(case * runif(length(case)))) :
>>>>   Invalid status value, converted to NA
>>>> 2: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
>>>>   Ran out of iterations and did not converge
>>>>>
>>>> ------------
>>>>
>>>> which looks pretty much the same as Hien's error msg
>>>>
>>>> So Hien needs to create a logical status value.
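A minimal sketch of that check, run before calling clogit(). is_binary_status() is a hypothetical helper (not part of survival), and infert stands in for Hien's clmdata. The thread does not say how a proportion such as Pin should become a 0/1 choice indicator; that depends on the study design, so no conversion rule is shown.

```r
library(survival)

# Hypothetical helper: TRUE only when every value is a valid event status
# (0/1 or FALSE/TRUE); NA and proportions strictly between 0 and 1 fail.
is_binary_status <- function(y) all(y %in% c(0, 1))

is_binary_status(infert$case)          # the original 0/1 case indicator
is_binary_status(runif(nrow(infert)))  # continuous values, as in David's test

# A logical response is accepted as the status that clogit() passes to Surv().
fit <- clogit(case == 1 ~ spontaneous + induced + strata(stratum), data = infert)
coef(fit)
```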
>>>>
>>>> Chuck
>>>>
>>>> p.s.
>>>>
>>>>> sessionInfo()
>>>> R version 2.10.0 (2009-10-26)
>>>> i386-pc-mingw32
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252
>>>> [2] LC_CTYPE=English_United States.1252
>>>> [3] LC_MONETARY=English_United States.1252
>>>> [4] LC_NUMERIC=C
>>>> [5] LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] splines   stats     graphics  grDevices utils     datasets  methods
>>>> [8] base
>>>>
>>>> other attached packages:
>>>> [1] survival_2.35-7
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] tools_2.10.0
>>>>>
>>>>
>>>>
>>>>> So I certainly would not have proceeded to submit a full  
>>>>> analysis to clogit if I could not get a test case to run under  
>>>>> the situation you propose.
>>>>>
>>>>> -- 
>>>>> David
>>>>>
>>>>>>
>>>>>> I have checked the collinearity among the predictors, and the
>>>>>> correlations are all < 0.5 (which I think is OK). Do you know what
>>>>>> else could cause these errors?
>>>>>>
>>>>>> Thanks a lot
>>>>>>
>>>>>> Hien Nguyen
>>>>>>
>>>>>> David Winsemius wrote:
>>>>>> > On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote:
>>>>>> >
>>>>>> > > Dear R-helpers,
>>>>>> > >
>>>>>> > > I am very new to R and am trying to run the conditional logit
>>>>>> > > model using the "clogit" command.
>>>>>> > > I have more than 4000 observations in my dataset and am trying to
>>>>>> > > predict the dependent variable from 14 independent variables. My
>>>>>> > > command is as follows:
>>>>>> > >
>>>>>> > > clmtest1 <- clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+
>>>>>> > >                    NCC+SCC+CH+SE+MRD+strata(IDD), data=clmdata)
>>>>>> > >
>>>>>> > > However, it produces the following errors:
>>>>>> > >
>>>>>> > > Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :
>>>>>> > >   NA/NaN/Inf in foreign function call (arg 6)
>>>>>> > > In addition: Warning messages:
>>>>>> > > 1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
>>>>>> > > 2: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
>>>>>> > >   Ran out of iterations and did not converge
>>>>>> > >
>>>>>> > > I searched for the error message on R forums, but nothing there
>>>>>> > > mentions the conditional logit model.
>>>>>> >
>>>>>> > With that many predictors in a small dataset, you may have created
>>>>>> > matrix singularities. Perhaps you created a stratum where all of the
>>>>>> > subjects experience the event and others where none did so. The
>>>>>> > coefficients might be driven to infinities. Try simplifying the model.
>>>>>> >
>>>>>> > > Please check for me what it says and what I should do to solve it.
>>>>>> >
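The two failure modes David lists can be checked directly before fitting. A sketch under stated assumptions: infert again stands in for Hien's clmdata, and events, sizes, degenerate and X are hypothetical names. Strata in which every subject, or no subject, is a case contribute nothing to the conditional likelihood; the rank check flags exact linear dependence among the model columns, which pairwise correlations below 0.5 do not rule out.

```r
# Cases and subjects per stratum; infert stands in for Hien's clmdata.
events <- tapply(infert$case, infert$stratum, sum)
sizes  <- tapply(infert$case, infert$stratum, length)

# Strata where every subject, or no subject, is a case
degenerate <- names(events)[events == 0 | events == sizes]
degenerate

# Rank check on the covariate columns. The intercept column is kept so that
# covariates summing to a constant (e.g. a complete set of dummies) are
# flagged too, even though clogit() itself fits no intercept.
X <- model.matrix(~ spontaneous + induced, data = infert)
qr(X)$rank < ncol(X)   # TRUE would signal exact linear dependence
```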
>>>>>
>>>>> David Winsemius, MD
>>>>> Heritage Laboratories
>>>>> West Hartford, CT
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>> Charles C. Berry                            (858) 534-2098
>>>>                                             Dept of Family/Preventive Medicine
>>>> E mailto:cberry at tajo.ucsd.edu              UC San Diego
>>>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>>>
>>>>
>>>
>>
>> Thomas Lumley            Assoc. Professor, Biostatistics
>> tlumley at u.washington.edu    University of Washington, Seattle
>>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
    
    