[R] Automatic Knot selection in Piecewise linear splines

Vito Muggeo v|to@muggeo @end|ng |rom un|p@@|t
Fri Jul 26 16:46:36 CEST 2024


dear all,
I apologize for my delay in replying you. Here my contribution, maybe 
just for completeness:

Similar to "earth", "segmented" also fits piecewise linear relationships 
with the number of breakpoints being selected by the AIC or BIC 
(recommended).

#code (example and code from Martin Maechler previous email)

library(segmented)
o<-selgmented(y, ~x, Kmax=20, type="bic", msg=TRUE)
plot(o, add=TRUE)
lines(o, col=2) #the approx CI for the breakpoints

confint(o) #the estimated breakpoints (with CI's)
slope(o) #the estimated slopes (with CI's)


However segmented appears to be less efficient than earth (although with 
reasonable running times), it does NOT work with multivariate responses 
neither products between piecewise linear terms.

kind regards,
Vito



Il 16/07/2024 11:22, Martin Maechler ha scritto:
>>>>>> Anupam Tyagi
>>>>>>      on Tue, 9 Jul 2024 16:16:43 +0530 writes:
> 
>      > How can I do automatic knot selection while fitting piecewise linear
>      > splines to two variables x and y? Which package to use to do it simply? I
>      > also want to visualize the splines (and the scatter plot) with a graph.
> 
>      > Anupam
> 
> NB: linear splines, i.e. piecewise linear continuous functions.
> Given the knots, use  approx() or approxfun() however, the
> automatic knots selection does not happen in the base R packages.
> 
> I'm sure there are several R packages doing this.
> The best such package in my opinion is "earth" which does a
> re-implementation (and extensive  *generalization*) of the
> famous  MARS algorithm of Friedman.
> ==> https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines
> 
> Note that their strengths and power is that  they do their work
> for multivariate x (MARS := Multivariate Adaptive Regression
> Splines), but indeed do work for the simple 1D case.
> 
> In the following example, we always get 11 final knots,
> but I'm sure one can tweak the many tuning paramters of earth()
> to get more:
> 
> ## Can we do  knot-selection  for simple (x,y) splines?  === Yes, via  earth() {using MARS}!
> 
> x <- (0:800)/8
> 
> f <- function(x) 7 * sin(pi/8*x) * abs((x-50)/20)^1.25 - (x-40)*(12-x)/64
> curve(f(x), 0, 100, n = 1000, col=2, lwd=2)
> 
> set.seed(11)
> y <- f(x) + 10*rnorm(x)
> 
> m.sspl <- smooth.spline(x,y) # base line "standard smoother"
> 
> require(earth)
> fm1 <- earth(x, y) # default settings
> summary(fm1, style = "pmax") #-- got  10 knots (x = 44 "used twice") below
> ## Call: earth(x=x, y=y)
> 
> ## y =
> ##   175.9612
> ##   -   10.6744 * pmax(0,      x -  4.625)
> ##   +  9.928496 * pmax(0,      x - 10.875)
> ##   -  5.940857 * pmax(0,      x -  20.25)
> ##   +  3.438948 * pmax(0,      x - 27.125)
> ##   -  3.828159 * pmax(0,     44 -      x)
> ##   +  4.207046 * pmax(0,      x -     44)
> ##   +  2.573822 * pmax(0,      x -   76.5)
> ##   -  10.99073 * pmax(0,      x - 87.125)
> ##   +  10.97592 * pmax(0,      x - 90.875)
> ##   +  9.331949 * pmax(0,      x -     94)
> ##   -   8.48575 * pmax(0,      x -   96.5)
> 
> ## Selected 12 of 12 terms, and 1 of 1 predictors
> ## Termination condition: Reached nk 21
> ## Importance: x
> ## Number of terms at each degree of interaction: 1 11 (additive model)
> ## GCV 108.6592    RSS 82109.44    GRSq 0.861423    RSq 0.86894
> 
> 
> fm2 <- earth(x, y, fast.k = 0) # (more extensive forward pass)
> summary(fm2)
> all.equal(fm1, fm2)# they are identical (apart from 'call'):
> fm3 <- earth(x, y, fast.k = 0, pmethod = "none", trace = 3) # extensive forward pass; *no* pruning
> ## still no change: fm3 "==" fm1
> all.equal(predict(fm1, xx), predict(fm3, xx))
> 
> ## BTW: The chosen knots and coefficients are
> mat <- with(fm1, cbind(dirs, cuts=c(cuts), coef = c(coefficients)))
> 
> ## Plots : fine grid for visualization: instead of   xx <- seq(x[1], x[length(x)], length.out = 1024)
> rnx <- extendrange(x) ## to extrapolate a bit
> xx <- do.call(seq.int, c(rnx, list(length.out = 1200)))
> 
> cbind(f = f(xx),
>        sspl = predict(m.sspl, xx)$y,
>        mars = predict(fm1, xx)) -> fits
> 
> plot(x,y, xlim=rnx, cex = 1/4, col = adjustcolor(1, 1/2))
> cols <- c(adjustcolor(2, 1/3),
>            adjustcolor(4, 2/3),
>            adjustcolor("orange4", 2/3))
> lwds <- c(3, 2, 2)
> matlines(xx, fits, col = cols, lwd = lwds, lty=1)
> legend("topleft", c("true f(x)", "smooth.spline()", "earth()"),
>         col=cols, lwd=lwds, bty = "n")
> title(paste("earth() linear spline vs. smooth.spline();  n =", length(x)))
> mtext(substitute(f(x) == FDEF, list(FDEF = body(f))))
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
=================================================
Vito M.R. Muggeo, PhD
Professor of Statistics
Dip.to Sc Econom, Az e Statistiche
Università di Palermo
viale delle Scienze, edificio 13
90128 Palermo - ITALY
tel: 091 23895240; fax: 091 485726
http://www.unipa.it/persone/docenti/m/vito.muggeo
Associate Editor: Statistical Modelling
past chair, Statistical Modelling Society
coordinator, PhD Program in Econ, Businss, Statist



More information about the R-help mailing list