[R] Estimating regression with constraints in model coefficients
Christofer Bogaso
bogaso.christofer at gmail.com
Tue Apr 29 22:51:43 CEST 2025
Hi Gregg,
I am just wondering whether you have had any time to think about this problem.
As I understand it, typically for this type of problem we have different
intercepts for different classes, while the slope (beta) coefficients
remain the same across classes.
But in my case, since we do not have any intercept term, the slope
coefficients need to be estimated separately for different classes.
Therefore, since we have 3 classes in the response variable (i.e.
'apply'), there will be 3 different sets of beta-coefficients for the
independent variables.
Under this situation, I wonder how I can create the likelihood
function subject to all applicable constraints.
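To make the question concrete, below is a very rough sketch of the kind of
likelihood I have in mind (illustration only; the softmax form and the
function name are just placeholders for one way of writing class-specific
coefficients without intercepts):

# Sketch: one coefficient vector per class, no intercepts.
# theta stacks the K per-class coefficient vectors, column by column.
negLL_classwise <- function(theta, X, y, K) {
  B <- matrix(theta, ncol = K)           # p x K; column k holds beta_k
  eta <- X %*% B                         # n x K matrix of linear predictors
  logp <- eta - log(rowSums(exp(eta)))   # row-wise log-softmax
  -sum(logp[cbind(seq_along(y), y)])     # negative log P(Y = y_i)
}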
Any insight would be highly appreciated.
Thanks and regards,
On Fri, Apr 25, 2025 at 12:31 AM Gregg Powell <g.a.powell at protonmail.com> wrote:
>
>
> Christofer,
> That was a detailed follow-up; you clarified the requirements precisely enough to give me a position to proceed from.
>
>
> To implement this, we need a constrained optimization procedure that estimates an ordinal logistic regression model (cumulative logit), based on:
>
> 1. Estimated Cutpoints
> – No intercept in the linear predictor
> – Cutpoints (thresholds) will be estimated directly from the data
>
> 2. Coefficient Constraints
> – Box constraints on each coefficient
> – For now:
> lower = c(1, -1, 0)
> upper = c(2, 1, 1)
> – These apply to: pared, public, gpa (in that order)
>
> 3. Sum Constraint
> – The sum of coefficients must equal 1.60
>
> 4. Data to use...
> – Use the IDRE ologit.dta dataset from UCLA (for now)
>
>
> Technical Approach
>
> • Attempt to write a custom negative log-likelihood function using the cumulative logit formulation:
>
> P(Y ≤ k | X) = 1 / (1 + exp[-(θ_k - Xβ)])
>
>
> and derive P(Y = k) from adjacent differences of these.
>
> • Cutpoints θ_k will be estimated as separate parameters, with constraints to ensure they’re strictly increasing for identifiability.
>
> • The optimization will use nloptr::nloptr(), which supports:
> - Lower/upper bounds (box constraints)
> - Equality constraints (for sum of β)
> - Nonlinear inequality constraints (to keep cutpoints ordered)
>
>
> I’ll have some time in the next day or two to attempt a script with the pieces below (a rough sketch of the constraint functions follows this list):
> - Custom negative log-likelihood
> - Parameter vector split into β and cutpoints
> - Constraint functions: sum(β) = 1.60 and increasing cutpoints
> - Optimization via nloptr()
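>
> As a rough sketch of those two constraint pieces (untested; it assumes the
> parameter vector is laid out as theta = c(beta, cutpoints), with 3 betas
> and K - 1 = 2 cutpoints):
>
> p <- 3  # number of beta coefficients
>
> # Equality constraint for nloptr: must return 0 at a feasible point
> eq_constraint <- function(theta) {
>   sum(theta[1:p]) - 1.60
> }
>
> # Inequality constraints for nloptr: every component must be <= 0.
> # cp[k] - cp[k + 1] + eps <= 0 forces strictly increasing cutpoints.
> ineq_constraint <- function(theta) {
>   cp <- theta[-(1:p)]
>   eps <- 1e-6
>   head(cp, -1) - tail(cp, -1) + eps
> }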
>
> Hopefully, you’ll be able to run it locally with only the VGAM, foreign, and nloptr packages.
>
> I’ll send the .R file along with the next email. A best attempt, anyway.
>
>
> r/
> Gregg
>
>
> “Oh, what fun it is!”
> —Alice, Alice’s Adventures in Wonderland by Lewis Carroll
>
>
>
>
>
> On Thursday, April 24th, 2025 at 1:56 AM, Christofer Bogaso <bogaso.christofer at gmail.com> wrote:
>
> >
>
> >
>
> > Hi Gregg,
> >
>
> > Many thanks for your detailed feedback; those points are really super helpful.
> >
>
> > I have ordered a copy of Agresti from our local library; however, it
> > appears that I will need to wait a few days.
> >
>
> > Regarding my queries, it would be super helpful if we could build an
> > optimization algorithm based on the criteria below:
> >
>
> > 1. Whether you want the cutpoints fixed (and to what values), or if
> > you want them estimated separately (with identifiability managed some
> > other way); I would like to have the cut-points estimated directly from
> > the data
> > 2. What your bounds on the coefficients are (lower/upper vectors); For
> > this question, I have different bounds for each of the coefficients, so
> > I would have one vector of lower bounds and another vector of upper
> > bounds. To start with, let us take lower = c(1, -1, 0) and
> > upper = c(2, 1, 1) for the variables pared, public, and gpa. In my
> > model there would not be any intercept, hence no question about its
> > bounds
> > 3. What value the sum of coefficients should equal (e.g., sum(β) = 1,
> > or something else); Let the sum be 1.60
> > 4. And whether you're working with the IDRE example data, or something
> > else. My original data actually resides on a computer without any
> > internet access (for data security). However, we can start with any
> > suitable dataset for this experiment, for example
> > https://stats.idre.ucla.edu/stat/data/ologit.dta. I am still exploring
> > whether it is possible to extract a randomized copy of the actual
> > data; if I can, I will share it once available
> >
>
> > Again, many thanks for your time and insights.
> >
>
> > Thanks and regards,
> >
>
> > On Wed, Apr 23, 2025 at 9:54 PM Gregg Powell g.a.powell at protonmail.com wrote:
> >
>
> > > Hello again Christofer,
> > > Thanks for your thoughtful note — I’m glad the outline was helpful. Apologies for the long delay getting back to you. Had to do a small bit of research…
> > >
>
> > > Recommended Text on Ordinal Log-Likelihoods:
> > > You're right that most online sources jump straight to code or canned functions. For a solid theoretical treatment of ordinal models and their likelihoods, consider:
> > > "Categorical Data Analysis" by Alan Agresti
> > > – Especially Chapters 7 and 8 on ordinal logistic regression.
> > > – Covers proportional odds models, cumulative logits, adjacent-category logits, and the derivation of likelihood functions.
> > > – Provides not only equations but also intuition behind the model structure.
> > > It’s a standard reference in the field and explains how to build log-likelihoods from first principles — including how the cumulative probabilities arise and why the difference of CDFs represents a category-specific probability.
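> > > (Concretely, the identity is P(Y = k | X) = P(Y ≤ k | X) - P(Y ≤ k - 1 | X),
> > > with the conventions P(Y ≤ 0 | X) = 0 and P(Y ≤ K | X) = 1.)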
> > > Also helpful:
> > > "An Introduction to Categorical Data Analysis" by Agresti (2nd ed) – A bit more accessible, and still covers the basics of ordinal models.
> > > ________________________________________
> > >
>
> > > If You Want to Proceed Practically:
> > > In parallel with theory, if you'd like a working R example coded from scratch — with:
> > > • a custom likelihood for an ordinal (cumulative logit) model,
> > > • fixed thresholds / no intercept,
> > > • coefficient bounds,
> > > • and a sum constraint on β
> > >
>
> > > I’d be happy to attempt that using nloptr() or constrOptim(). You’d be able to walk through it and tweak it as necessary (no guarantee that I will get it right 😊)
> > >
>
> > > Just let me know:
> > > 1. Whether you want the cutpoints fixed (and to what values), or if you want them estimated separately (with identifiability managed some other way);
> > > 2. What your bounds on the coefficients are (lower/upper vectors),
> > > 3. What value the sum of coefficients should equal (e.g., sum(β) = 1, or something else);
> > > 4. And whether you're working with the IDRE example data, or something else.
> > >
>
> > > I can use the UCLA ologit.dta dataset as a basis if that's easiest to demo on, or if you have another dataset you’d prefer – again, let me know.
> > >
>
> > > All the best!
> > >
>
> > > gregg
> > >
>
> > > On Monday, April 21st, 2025 at 11:25 AM, Christofer Bogaso bogaso.christofer at gmail.com wrote:
> > >
>
> > > > Hi Gregg,
> > >
>
> > > > I am sincerely thankful for this workout.
> > >
>
> > > > Could you please suggest a textbook on how to create the log-likelihood
> > > > for an ordinal model like this? Most of my online searches point me
> > > > directly to some R function, but a theoretical discussion of the
> > > > subject would be really helpful for constructing it myself.
> > >
>
> > > > Thanks and regards,
> > >
>
> > > > On Mon, Apr 21, 2025 at 9:55 PM Gregg Powell g.a.powell at protonmail.com wrote:
> > >
>
> > > > > Christofer,
> > >
>
> > > > > Given the constraints you mentioned—bounded parameters, no intercept, and a sum constraint—you're outside the capabilities of most off-the-shelf ordinal logistic regression functions in R like vglm or polr.
> > >
>
> > > > > The most flexible recommendation at this point is to implement custom likelihood optimization using constrOptim() or nloptr::nloptr() with a manually coded log-likelihood function for the cumulative logit model.
> > >
>
> > > > > Since you need:
> > >
>
> > > > > Coefficient bounds (e.g., lb ≤ β ≤ ub),
> > >
>
> > > > > No intercept,
> > >
>
> > > > > And a constraint like sum(β) = C,
> > >
>
> > > > > …you'll need to code your own objective function. Here's something of a high-level outline of the approach:
> > >
>
> > > > > A. Model Setup
> > > > > Let your design matrix X be n x p, and let Y take ordered values 1, 2, ..., K.
> > >
>
> > > > > B. Parameters
> > > > > Assume the thresholds (θ_k) are fixed (or removed entirely), and you’re estimating only the coefficient vector β. Your constraints are:
> > >
>
> > > > > Box constraints: lb ≤ β ≤ ub
> > >
>
> > > > > Equality constraint: sum(β) = C
> > >
>
> > > > > C. Likelihood
> > > > > The probability of category k is given by:
> > >
>
> > > > > P(Y = k) = logistic(θ_k - Xβ) - logistic(θ_{k-1} - Xβ)
> > >
>
> > > > > Without intercepts, this becomes more like:
> > >
>
> > > > > P(Y ≤ k) = 1 / (1 + exp(-(c_k - Xβ)))
> > >
>
> > > > > …where c_k are fixed thresholds.
> > >
>
> > > > > Implementation using nloptr
> > > > > This example shows a rough sketch using nloptr, which allows both equality and bound constraints:
> > >
>
> > > > > > library(nloptr)
> > >
>
> > > > > > # Custom negative log-likelihood for the cumulative logit model.
> > > > > > # cutpoints has length K - 1; pad it with -Inf and Inf so the
> > > > > > # k = 1 and k = K terms are well-defined (P(Y <= 0) = 0, P(Y <= K) = 1).
> > > > > > negLL <- function(beta, X, y, K, cutpoints) {
> > > > > >   eta <- drop(X %*% beta)
> > > > > >   cp <- c(-Inf, cutpoints, Inf)
> > > > > >   loglik <- 0
> > > > > >   for (k in 1:K) {
> > > > > >     pk <- plogis(cp[k + 1] - eta) - plogis(cp[k] - eta)
> > > > > >     loglik <- loglik + sum(log(pk[y == k]))
> > > > > >   }
> > > > > >   return(-loglik)
> > > > > > }
> > >
>
> > > > > > # Equality constraint: sum(beta) = C, written as sum(beta) - C = 0
> > > > > > sum_constraint <- function(beta, C) {
> > > > > >   return(sum(beta) - C)
> > > > > > }
> > >
>
> > > > > > # Define objective and constraint wrappers
> > > > > > objective <- function(beta) negLL(beta, X, y, K, cutpoints)
> > > > > > eq_constraint <- function(beta) sum_constraint(beta, C = 2) # example value of C
> > >
>
> > > > > > # Run optimization. SLSQP is gradient-based and would require an
> > > > > > # analytic gradient from eval_f, so use a derivative-free
> > > > > > # augmented-Lagrangian wrapper around COBYLA instead.
> > > > > > res <- nloptr(
> > > > > >   x0 = (lower_bounds + upper_bounds) / 2,  # start inside the box
> > > > > >   eval_f = objective,
> > > > > >   lb = lower_bounds,
> > > > > >   ub = upper_bounds,
> > > > > >   eval_g_eq = eq_constraint,
> > > > > >   opts = list(algorithm = "NLOPT_LN_AUGLAG", xtol_rel = 1e-8,
> > > > > >               maxeval = 5000,
> > > > > >               local_opts = list(algorithm = "NLOPT_LN_COBYLA",
> > > > > >                                 xtol_rel = 1e-8))
> > > > > > )
> > >
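>
> > > > > For the sketch above to actually run, X, y, K, cutpoints, and the
> > > > > bounds must exist first; for example (untested, with placeholder
> > > > > threshold and bound values, using the UCLA ologit data):
>
> > > > > > library(foreign)
> > > > > >
> > > > > > dat <- read.dta("https://stats.idre.ucla.edu/stat/data/ologit.dta")
> > > > > > X <- as.matrix(dat[, c("pared", "public", "gpa")])
> > > > > > y <- as.integer(dat$apply)     # ordered factor -> codes 1..K
> > > > > > K <- 3
> > > > > > cutpoints <- c(-1, 1)          # placeholder fixed thresholds (K - 1 values)
> > > > > > lower_bounds <- c(-2, -2, -2)  # placeholder box constraints
> > > > > > upper_bounds <- c(2, 2, 2)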
>
> > > > > The next step would be writing the actual log-likelihood for your specific problem and verifying the constraint implementation.
> > >
>
> > > > > Manually coding the log-likelihood for an ordinal model is nontrivial... so a bit of a challenge :)
> > >
>
> > > > > All the best,
> > > > > gregg powell
> > > > > Sierra Vista, AZ