[R] Automatic Knot selection in Piecewise linear splines

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Tue Jul 16 11:22:46 CEST 2024


>>>>> Anupam Tyagi 
>>>>>     on Tue, 9 Jul 2024 16:16:43 +0530 writes:

    > How can I do automatic knot selection while fitting piecewise linear
    > splines to two variables x and y? Which package to use to do it simply? I
    > also want to visualize the splines (and the scatter plot) with a graph.

    > Anupam

NB: linear splines, i.e. piecewise linear continuous functions.
Given the knots, use  approx() or approxfun() however, the
automatic knots selection does not happen in the base R packages.

I'm sure there are several R packages doing this.
The best such package in my opinion is "earth" which does a
re-implementation (and extensive  *generalization*) of the
famous  MARS algorithm of Friedman.
==> https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines

Note that their strengths and power is that  they do their work
for multivariate x (MARS := Multivariate Adaptive Regression
Splines), but indeed do work for the simple 1D case.

In the following example, we always get 11 final knots,
but I'm sure one can tweak the many tuning paramters of earth()
to get more:

## Can we do  knot-selection  for simple (x,y) splines?  === Yes, via  earth() {using MARS}!

x <- (0:800)/8

f <- function(x) 7 * sin(pi/8*x) * abs((x-50)/20)^1.25 - (x-40)*(12-x)/64
curve(f(x), 0, 100, n = 1000, col=2, lwd=2)

set.seed(11)
y <- f(x) + 10*rnorm(x)

m.sspl <- smooth.spline(x,y) # base line "standard smoother"

require(earth)
fm1 <- earth(x, y) # default settings
summary(fm1, style = "pmax") #-- got  10 knots (x = 44 "used twice") below
## Call: earth(x=x, y=y)

## y =
##   175.9612
##   -   10.6744 * pmax(0,      x -  4.625)
##   +  9.928496 * pmax(0,      x - 10.875)
##   -  5.940857 * pmax(0,      x -  20.25)
##   +  3.438948 * pmax(0,      x - 27.125)
##   -  3.828159 * pmax(0,     44 -      x)
##   +  4.207046 * pmax(0,      x -     44)
##   +  2.573822 * pmax(0,      x -   76.5)
##   -  10.99073 * pmax(0,      x - 87.125)
##   +  10.97592 * pmax(0,      x - 90.875)
##   +  9.331949 * pmax(0,      x -     94)
##   -   8.48575 * pmax(0,      x -   96.5)

## Selected 12 of 12 terms, and 1 of 1 predictors
## Termination condition: Reached nk 21
## Importance: x
## Number of terms at each degree of interaction: 1 11 (additive model)
## GCV 108.6592    RSS 82109.44    GRSq 0.861423    RSq 0.86894


fm2 <- earth(x, y, fast.k = 0) # (more extensive forward pass)
summary(fm2)
all.equal(fm1, fm2)# they are identical (apart from 'call'):
fm3 <- earth(x, y, fast.k = 0, pmethod = "none", trace = 3) # extensive forward pass; *no* pruning
## still no change: fm3 "==" fm1
all.equal(predict(fm1, xx), predict(fm3, xx))

## BTW: The chosen knots and coefficients are
mat <- with(fm1, cbind(dirs, cuts=c(cuts), coef = c(coefficients)))

## Plots : fine grid for visualization: instead of   xx <- seq(x[1], x[length(x)], length.out = 1024)
rnx <- extendrange(x) ## to extrapolate a bit
xx <- do.call(seq.int, c(rnx, list(length.out = 1200)))

cbind(f = f(xx),
      sspl = predict(m.sspl, xx)$y,
      mars = predict(fm1, xx)) -> fits

plot(x,y, xlim=rnx, cex = 1/4, col = adjustcolor(1, 1/2))
cols <- c(adjustcolor(2, 1/3),
          adjustcolor(4, 2/3),
          adjustcolor("orange4", 2/3))
lwds <- c(3, 2, 2)
matlines(xx, fits, col = cols, lwd = lwds, lty=1)
legend("topleft", c("true f(x)", "smooth.spline()", "earth()"),
       col=cols, lwd=lwds, bty = "n")
title(paste("earth() linear spline vs. smooth.spline();  n =", length(x)))
mtext(substitute(f(x) == FDEF, list(FDEF = body(f))))



More information about the R-help mailing list