[R] optim() for ordered logit model with parallel regression assumption
Xu Jun
junxu.r at gmail.com
Wed Aug 1 23:34:07 CEST 2012
Thanks Michael. Now I switched my approach after doing some google.
Following are my new codes:
###################################################
library(foreign)
readin <- read.dta("ordfile.dta", convert.factors=FALSE)
myvars <- c("depvar", "x1", "x2", "x3")
mydta <- readin[myvars]
# remove all missings
mydta <- na.omit(mydta)
ologit.lf <- function(theta, y, X) {
X <- as.matrix(X)
y <- as.matrix(y)
n <- nrow(X)
k <- ncol(X)
b <- theta[1:k]
tau1 <- as.numeric(theta [k+1])
tau2 <- as.numeric(theta [k+2])
tau3 <- as.numeric(theta [k+3])
p1 <- (1/(1+exp( - tau1 + X %*% b)))
p2 <- (1/(1+exp( - tau2 + X %*% b))) - (1/(1+exp( - tau1 + X %*% b)))
p3 <- (1/(1+exp( - tau3 + X %*% b))) - (1/(1+exp( - tau2 + X %*% b)))
p4 <- 1 - (1/(1+exp( - tau3 + X %*% b)))
# create indicator variables
i <- rep(1, nrow(X))
i1 <- as.numeric( i == y)
i2 <- as.numeric(2*i == y)
i3 <- as.numeric(3*i == y)
i4 <- as.numeric(4*i == y)
llik <- sum(i1*log(p1) + i2*log(p2) + i3*log(p3) + i4*log(p4))
return(-llik)
}
y <- (mydta$depvar)
X <- cbind(mydta$x1, mydta$x2, mydta$x3)
if (FALSE) { # providing start values in optim
xint <- cbind(rep(1,nrow(X)),X)
bols <- (solve(t(xint) %*% xint)) %*% (t(xint) %*% y)
startval <- rbind(as.matrix(bols[2:nrow(bols)]),0,0,0)
}
runopt <- optim(rep(0, ncol(X)+3), ologit.lf, method="BFGS",
hessian=T, y=y, X=X)
#########################################################################
But then I got the following error message;
Error in optim(rep(0, ncol(X) + 3), ologit.lf, method = "BFGS", hessian = T, :
initial value in 'vmmin' is not finite
I know there is something wrong with the way that I set up my initial
values, but just couldn't figure out how. Any help would be greatly
appreciated!
Jun
On Tue, Jul 31, 2012 at 10:07 PM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
> On Tue, Jul 31, 2012 at 7:57 PM, Xu Jun <junxu.r at gmail.com> wrote:
>> Dear R listers,
>>
>> I am learning the MLE utility optim() in R to program ordered logit
>> models just as an exercise. See below I have three independent
>> variables, x1, x2, and x3. Y is coded as ordinal from 1 to 4. Y is not
>> yet a factor variable here. The ordered logit model satisfies the
>> parallel regression assumption. The following codes can run through,
>> but results were totally different from what I got using the polr
>> function from the MASS package. I think it might be due to the way how
>> the p is constructed in the ologit.lf function. I am relatively new to
>> R, and here I would guess probably something related to the class type
>> (numeric vs. matrix) or something along that line among those if
>> conditions. Thanks in advance for any suggestion.
>>
>> Jun Xu, PhD
>> Assistant Professor
>> Department of Sociology
>> Ball State University
>> Muncie, IN 47306
>>
>>
>> ####################################################################
>>
>> library(foreign)
>> readin <- read.dta("ordfile.dta", convert.factors=FALSE)
>> myvars <- c("depvar", "x1", "x2", "x3")
>> mydta <- readin[myvars]
>> # remove all missings
>> mydta <- na.omit(mydta)
>>
>> # theta is the parameter vector
>> ologit.lf <- function(theta, y, X) {
>> n <- nrow(X)
>> k <- ncol(X)
>> # b is the coefficient vector for independent variables
>> b <- theta[1:k]
>> # tau1 is cut-point 1
>> tau1 <- theta [k+1]
>> # tau2 is cut-point 2
>> tau2 <- theta [k+2]
>> # tau3 is cut-point 1
>> tau3 <- theta [k+3]
>>
>> if (y == 1){
>> p <- (1/(1+exp( - tau1 + X %*% b)))
>> }
>> if (y == 2) {
>> p <- (1/(1+exp( - tau2 + X %*% b))) - (1/(1+exp( - tau1 + X %*% b)))
>> }
>> if (y == 3) {
>> p <- (1/(1+exp( - tau3 + X %*% b))) - (1/(1+exp( - tau2 + X %*% b)))
>> }
>> if (y == 4) {
>> p <- 1 - (1/(1+exp( - tau3 + X %*% b)))
>> }
>> - sum(p)
>> }
>>
>> y <- as.numeric(mydta$depvar)
>> X <- cbind (mydta$x1, mydta$x2, mydta$x3)
>> runopt <- optim(rep(0, ncol(X)+4), ologit.lf, method="BFGS",
>> hessian=T, y=y, X=X)
>>
>>
>> There were 50 or more warnings (use warnings() to see the first 50)
>>> warnings()
>>
>> 1: In if (y == 1) { ... :
>> the condition has length > 1 and only the first element will be used
>> 2: In if (y == 2) { ... :
>> the condition has length > 1 and only the first element will be used
>
> It looks like you've got a fundamental problem in your if/else
> statements. if and else are control structures and so they operate on
> the whole program flow -- I think you want the ifelse() function here
> instead.
>
> Take a look at this example:
>
> x <- c(1, 5, 9)
>
> if(x < 3) {y <- x^2} else {y <- 2}
>
> z <- ifelse(x < 3, x^2, 2)
>
> print(x)
> print(y)
> print(z)
>
> See also ?ifelse
>
> Best,
> Michael
>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list