[R] Clogit or LRM?

Noah Silverman noah at smartmediacorp.com
Wed Aug 26 01:25:29 CEST 2009


I believe that I'm getting very close with my modeling application.

I've come across a challenge that I am unable to solve and would really 
appreciate the group's opinion.

I've been using the val.prob function from the Design package (thanks, 
Frank!!) to both evaluate and visualize my model.

From the scores and graph, it appears that my model predicts 
probabilities very accurately. Please see attachment "graph1.pdf".

Since I'm scoring horse races, I assume that I need to "normalize" the 
predicted probabilities within each race (as described in Benter).
I am calculating the conditional logit manually because I ran into a bug 
in the clogit function of the survival package.

Applying val.prob to my conditional logit probabilities shows an 
interesting result: the calibration line is almost perfectly parallel to 
the "ideal" line on the graph, but offset from it by a significant 
amount. My first thought is that this indicates an error somewhere in my 
calculation. Please see attachment "graph2.pdf".
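One way to put a number on a shift like this is to regress the observed outcome on the log-odds of the predictions, once with the slope fixed at 1 (calibration-in-the-large) and once with the slope free. An intercept well away from 0 together with a slope near 1 is consistent with a roughly parallel offset. This is a minimal sketch, not val.prob's own computation: it uses base R glm(), so its numbers are not guaranteed to match val.prob's printed statistics, and the names calibration_check, p (predicted probabilities) and y (0/1 win indicator) are hypothetical. The simulated data only illustrate a known shift.

```r
# Hypothetical helper: quantify a constant offset in calibration.
# p: predicted probabilities in (0, 1); y: observed 0/1 outcomes.
calibration_check <- function(p, y) {
  lp <- qlogis(p)  # log-odds of the predictions
  # Calibration-in-the-large: intercept only, slope fixed at 1 via offset()
  citl <- coef(glm(y ~ offset(lp), family = binomial))[1]
  # Calibration intercept and slope from a free logistic fit on the log-odds
  fit <- coef(glm(y ~ lp, family = binomial))
  c(citl = unname(citl), intercept = unname(fit[1]), slope = unname(fit[2]))
}

# Simulated illustration: predictions shifted down by 0.7 on the log-odds scale
set.seed(1)
p_true <- runif(2000, 0.05, 0.5)
y <- rbinom(2000, 1, p_true)
p_shifted <- plogis(qlogis(p_true) - 0.7)
round(calibration_check(p_shifted, y), 2)
```

For the race-normalized probabilities, p would be the c_prob column and y the win label on the same rows.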

Below is the two-step process that I used for the conditional logit.
1) First, I fit a standard logistic regression on two variables:
model <- lrm(label ~ val1 + val2, data = traindata)

This gives me the following results:
           Coef     S.E.     Wald Z   P
Intercept  1.8065   0.05137  35.16    0
val1       0.8105   0.02567  31.57    0
val2       0.5218   0.04308  12.11    0
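Written out, the fitted model from step 1 and the race-normalized probability computed in step 2 are as follows, with beta_0 = 1.8065, beta_1 = 0.8105 and beta_2 = 0.5218 taken from the table, r indexing races, and j, k indexing horses within a race. Keeping the intercept in the step-2 ratio shows why it can be dropped: the factor e^{beta_0} appears in every term of the numerator and denominator and cancels.

```latex
\begin{aligned}
P(\text{label}_{rj} = 1)
  &= \frac{1}{1 + \exp\{-(\beta_0 + \beta_1\,\text{val1}_{rj} + \beta_2\,\text{val2}_{rj})\}}, \\[4pt]
p_{rj}
  &= \frac{e^{\beta_0}\exp(\beta_1\,\text{val1}_{rj} + \beta_2\,\text{val2}_{rj})}
          {\sum_{k \in r} e^{\beta_0}\exp(\beta_1\,\text{val1}_{rk} + \beta_2\,\text{val2}_{rk})}
   = \frac{\exp(\beta_1\,\text{val1}_{rj} + \beta_2\,\text{val2}_{rj})}
          {\sum_{k \in r} \exp(\beta_1\,\text{val1}_{rk} + \beta_2\,\text{val2}_{rk})}.
\end{aligned}
```

Note that this plugs coefficients from an ordinary (unconditional) logistic fit into the conditional logit formula. A conditional logit fitted directly maximizes the within-race likelihood, and its coefficients will generally differ from the ones in the table.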

2) I then calculate a conditional logit:

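# Exponentiated linear predictor for the test data; the intercept is
# omitted because it cancels when normalizing within each race
testdata$log_int <- exp(coef(model)["val1"] * testdata$val1 +
                        coef(model)["val2"] * testdata$val2)
# Divide each horse's score by the total score of its own race
testdata$c_prob <- NA
for (race in unique(testdata$race)) {
    in_race <- testdata$race == race
    testdata$c_prob[in_race] <- testdata$log_int[in_race] /
        sum(testdata$log_int[in_race])
}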
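The same within-race normalization can also be written without an explicit loop using base R's ave(), which applies a function to each group and returns the results in the original row order. This is a minimal sketch on made-up toy data; the data frame toy and the names strength, b1 and b2 are hypothetical, and only the two coefficients are copied from the lrm table above.

```r
# Hypothetical toy data: race 1 has two runners, race 2 has three
toy <- data.frame(
  race = c(1, 1, 2, 2, 2),
  val1 = c(0.2, -0.4, 1.1, 0.3, -0.5),
  val2 = c(1.0, 0.5, -0.2, 0.8, 0.1)
)
b1 <- 0.8105  # val1 coefficient from the lrm table
b2 <- 0.5218  # val2 coefficient from the lrm table

# Exponentiated linear predictor (intercept omitted: it cancels within a race)
strength <- exp(b1 * toy$val1 + b2 * toy$val2)

# Divide each runner's strength by the total strength of its own race
toy$c_prob <- ave(strength, toy$race, FUN = function(s) s / sum(s))

# Sanity check: the probabilities in each race should sum to 1
tapply(toy$c_prob, toy$race, sum)
```

On the real data, the same ave() call would take testdata$log_int and testdata$race. Because ave() groups by the race column directly, it avoids mixing up column or data frame names inside a loop.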

Do you have any idea why this might be happening?  Did I miss something 
in my calculation?

Additionally, please notice the "Logistic Calibration" line on graph1: 
it looks almost perfect. My guess is that whatever transformation 
val.prob applies to my predictions is helping. How would I store or 
access those transformed values?
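One common way to obtain recalibrated probabilities is logistic recalibration: fit a logistic regression of the observed outcome on the log-odds of the predictions, then use that fit's predicted values. This sketch assumes that val.prob's "Logistic Calibration" curve corresponds to such a fit, which this post does not confirm. The helper recalibrate is hypothetical and uses base R glm(), not val.prob. The recalibration is fitted on one half of simulated data and applied to the other half, since fitting and evaluating on the same rows would flatter the result.

```r
# Hypothetical helper: logistic recalibration of predicted probabilities.
# p_cal, y_cal: predictions and 0/1 outcomes used to fit the recalibration.
# p_new: predictions to be recalibrated.
recalibrate <- function(p_cal, y_cal, p_new) {
  lp <- qlogis(p_cal)
  fit <- glm(y_cal ~ lp, family = binomial)
  # Apply the fitted intercept and slope to the new predictions' log-odds
  predict(fit, newdata = data.frame(lp = qlogis(p_new)), type = "response")
}

# Simulated illustration: predictions that are systematically too low
set.seed(2)
p_true <- runif(3000, 0.05, 0.5)
y <- rbinom(3000, 1, p_true)
p_off <- plogis(qlogis(p_true) - 0.7)
cal <- 1:1500
hold <- 1501:3000
p_fixed <- recalibrate(p_off[cal], y[cal], p_off[hold])
c(observed = mean(y[hold]), before = mean(p_off[hold]), after = mean(p_fixed))
```

If this is applied to race-normalized probabilities, the recalibrated values will generally no longer sum to 1 within a race, so they would need to be normalized again.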

Once I can finalize the probability predictions, I can focus on 
applying them to a betting model. Having a high level of confidence 
in my model's predictions is obviously the first step.

I really appreciate it.



-------------- next part --------------
A non-text attachment was scrubbed...
Name: graph2.pdf
Type: application/pdf
Size: 290782 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090825/157cd8e1/attachment-0004.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graph1.pdf
Type: application/pdf
Size: 289181 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090825/157cd8e1/attachment-0005.pdf>

More information about the R-help mailing list