[R] Automatic Knot selection in Piecewise linear splines

Anupam Tyagi @nupty@g| @end|ng |rom gm@||@com
Sat Jul 27 15:04:38 CEST 2024


Thanks!

For some reason I am getting an error when I run your code with my
variables. It works fine with Martin's x and y variables.
So far as I know variable lengths are equal.
> o <-selgmented(lnCpc, ~lnGdpc, Kmax=20, type="bic", msg=TRUE)
Error in model.frame.default(formula = y ~ x, drop.unused.levels = TRUE) :
        variable lengths differ (found for 'x')
> length(lnCpc)
[1] 2726
> length(lnGdpc)
[1] 2726

On Fri, 26 Jul 2024 at 20:16, Vito Muggeo <vito.muggeo using unipa.it> wrote:

> dear all,
> I apologize for my delay in replying you. Here my contribution, maybe
> just for completeness:
>
> Similar to "earth", "segmented" also fits piecewise linear relationships
> with the number of breakpoints being selected by the AIC or BIC
> (recommended).
>
> #code (example and code from Martin Maechler previous email)
>
> library(segmented)
> o<-selgmented(y, ~x, Kmax=20, type="bic", msg=TRUE)
> plot(o, add=TRUE)
> lines(o, col=2) #the approx CI for the breakpoints
>
> confint(o) #the estimated breakpoints (with CI's)
> slope(o) #the estimated slopes (with CI's)
>
>
> However segmented appears to be less efficient than earth (although with
> reasonable running times), it does NOT work with multivariate responses
> neither products between piecewise linear terms.
>
> kind regards,
> Vito
>
>
>
> Il 16/07/2024 11:22, Martin Maechler ha scritto:
> >>>>>> Anupam Tyagi
> >>>>>>      on Tue, 9 Jul 2024 16:16:43 +0530 writes:
> >
> >      > How can I do automatic knot selection while fitting piecewise
> linear
> >      > splines to two variables x and y? Which package to use to do it
> simply? I
> >      > also want to visualize the splines (and the scatter plot) with a
> graph.
> >
> >      > Anupam
> >
> > NB: linear splines, i.e. piecewise linear continuous functions.
> > Given the knots, use  approx() or approxfun() however, the
> > automatic knots selection does not happen in the base R packages.
> >
> > I'm sure there are several R packages doing this.
> > The best such package in my opinion is "earth" which does a
> > re-implementation (and extensive  *generalization*) of the
> > famous  MARS algorithm of Friedman.
> > ==>
> https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines
> >
> > Note that their strengths and power is that  they do their work
> > for multivariate x (MARS := Multivariate Adaptive Regression
> > Splines), but indeed do work for the simple 1D case.
> >
> > In the following example, we always get 11 final knots,
> > but I'm sure one can tweak the many tuning paramters of earth()
> > to get more:
> >
> > ## Can we do  knot-selection  for simple (x,y) splines?  === Yes, via
> earth() {using MARS}!
> >
> > x <- (0:800)/8
> >
> > f <- function(x) 7 * sin(pi/8*x) * abs((x-50)/20)^1.25 - (x-40)*(12-x)/64
> > curve(f(x), 0, 100, n = 1000, col=2, lwd=2)
> >
> > set.seed(11)
> > y <- f(x) + 10*rnorm(x)
> >
> > m.sspl <- smooth.spline(x,y) # base line "standard smoother"
> >
> > require(earth)
> > fm1 <- earth(x, y) # default settings
> > summary(fm1, style = "pmax") #-- got  10 knots (x = 44 "used twice")
> below
> > ## Call: earth(x=x, y=y)
> >
> > ## y =
> > ##   175.9612
> > ##   -   10.6744 * pmax(0,      x -  4.625)
> > ##   +  9.928496 * pmax(0,      x - 10.875)
> > ##   -  5.940857 * pmax(0,      x -  20.25)
> > ##   +  3.438948 * pmax(0,      x - 27.125)
> > ##   -  3.828159 * pmax(0,     44 -      x)
> > ##   +  4.207046 * pmax(0,      x -     44)
> > ##   +  2.573822 * pmax(0,      x -   76.5)
> > ##   -  10.99073 * pmax(0,      x - 87.125)
> > ##   +  10.97592 * pmax(0,      x - 90.875)
> > ##   +  9.331949 * pmax(0,      x -     94)
> > ##   -   8.48575 * pmax(0,      x -   96.5)
> >
> > ## Selected 12 of 12 terms, and 1 of 1 predictors
> > ## Termination condition: Reached nk 21
> > ## Importance: x
> > ## Number of terms at each degree of interaction: 1 11 (additive model)
> > ## GCV 108.6592    RSS 82109.44    GRSq 0.861423    RSq 0.86894
> >
> >
> > fm2 <- earth(x, y, fast.k = 0) # (more extensive forward pass)
> > summary(fm2)
> > all.equal(fm1, fm2)# they are identical (apart from 'call'):
> > fm3 <- earth(x, y, fast.k = 0, pmethod = "none", trace = 3) # extensive
> forward pass; *no* pruning
> > ## still no change: fm3 "==" fm1
> > all.equal(predict(fm1, xx), predict(fm3, xx))
> >
> > ## BTW: The chosen knots and coefficients are
> > mat <- with(fm1, cbind(dirs, cuts=c(cuts), coef = c(coefficients)))
> >
> > ## Plots : fine grid for visualization: instead of   xx <- seq(x[1],
> x[length(x)], length.out = 1024)
> > rnx <- extendrange(x) ## to extrapolate a bit
> > xx <- do.call(seq.int, c(rnx, list(length.out = 1200)))
> >
> > cbind(f = f(xx),
> >        sspl = predict(m.sspl, xx)$y,
> >        mars = predict(fm1, xx)) -> fits
> >
> > plot(x,y, xlim=rnx, cex = 1/4, col = adjustcolor(1, 1/2))
> > cols <- c(adjustcolor(2, 1/3),
> >            adjustcolor(4, 2/3),
> >            adjustcolor("orange4", 2/3))
> > lwds <- c(3, 2, 2)
> > matlines(xx, fits, col = cols, lwd = lwds, lty=1)
> > legend("topleft", c("true f(x)", "smooth.spline()", "earth()"),
> >         col=cols, lwd=lwds, bty = "n")
> > title(paste("earth() linear spline vs. smooth.spline();  n =",
> length(x)))
> > mtext(substitute(f(x) == FDEF, list(FDEF = body(f))))
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> =================================================
> Vito M.R. Muggeo, PhD
> Professor of Statistics
> Dip.to Sc Econom, Az e Statistiche
> Università di Palermo
> viale delle Scienze, edificio 13
> 90128 Palermo - ITALY
> tel: 091 23895240; fax: 091 485726
> http://www.unipa.it/persone/docenti/m/vito.muggeo
> Associate Editor: Statistical Modelling
> past chair, Statistical Modelling Society
> coordinator, PhD Program in Econ, Businss, Statist
> ==================================================
>


-- 
Anupam.

	[[alternative HTML version deleted]]



More information about the R-help mailing list