[R] Automatic Knot selection in Piecewise linear splines
Anupam Tyagi
@nupty@g| @end|ng |rom gm@||@com
Sat Jul 27 15:04:38 CEST 2024
Thanks!
For some reason I am getting an error when I run your code with my
variables. It works fine with Martin's x and y variables.
So far as I know variable lengths are equal.
> o <-selgmented(lnCpc, ~lnGdpc, Kmax=20, type="bic", msg=TRUE)
Error in model.frame.default(formula = y ~ x, drop.unused.levels = TRUE) :
variable lengths differ (found for 'x')
> length(lnCpc)
[1] 2726
> length(lnGdpc)
[1] 2726
On Fri, 26 Jul 2024 at 20:16, Vito Muggeo <vito.muggeo using unipa.it> wrote:
> dear all,
> I apologize for my delay in replying you. Here my contribution, maybe
> just for completeness:
>
> Similar to "earth", "segmented" also fits piecewise linear relationships
> with the number of breakpoints being selected by the AIC or BIC
> (recommended).
>
> #code (example and code from Martin Maechler previous email)
>
> library(segmented)
> o<-selgmented(y, ~x, Kmax=20, type="bic", msg=TRUE)
> plot(o, add=TRUE)
> lines(o, col=2) #the approx CI for the breakpoints
>
> confint(o) #the estimated breakpoints (with CI's)
> slope(o) #the estimated slopes (with CI's)
>
>
> However segmented appears to be less efficient than earth (although with
> reasonable running times), it does NOT work with multivariate responses
> neither products between piecewise linear terms.
>
> kind regards,
> Vito
>
>
>
> Il 16/07/2024 11:22, Martin Maechler ha scritto:
> >>>>>> Anupam Tyagi
> >>>>>> on Tue, 9 Jul 2024 16:16:43 +0530 writes:
> >
> > > How can I do automatic knot selection while fitting piecewise
> linear
> > > splines to two variables x and y? Which package to use to do it
> simply? I
> > > also want to visualize the splines (and the scatter plot) with a
> graph.
> >
> > > Anupam
> >
> > NB: linear splines, i.e. piecewise linear continuous functions.
> > Given the knots, use approx() or approxfun() however, the
> > automatic knots selection does not happen in the base R packages.
> >
> > I'm sure there are several R packages doing this.
> > The best such package in my opinion is "earth" which does a
> > re-implementation (and extensive *generalization*) of the
> > famous MARS algorithm of Friedman.
> > ==>
> https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines
> >
> > Note that their strengths and power is that they do their work
> > for multivariate x (MARS := Multivariate Adaptive Regression
> > Splines), but indeed do work for the simple 1D case.
> >
> > In the following example, we always get 11 final knots,
> > but I'm sure one can tweak the many tuning paramters of earth()
> > to get more:
> >
> > ## Can we do knot-selection for simple (x,y) splines? === Yes, via
> earth() {using MARS}!
> >
> > x <- (0:800)/8
> >
> > f <- function(x) 7 * sin(pi/8*x) * abs((x-50)/20)^1.25 - (x-40)*(12-x)/64
> > curve(f(x), 0, 100, n = 1000, col=2, lwd=2)
> >
> > set.seed(11)
> > y <- f(x) + 10*rnorm(x)
> >
> > m.sspl <- smooth.spline(x,y) # base line "standard smoother"
> >
> > require(earth)
> > fm1 <- earth(x, y) # default settings
> > summary(fm1, style = "pmax") #-- got 10 knots (x = 44 "used twice")
> below
> > ## Call: earth(x=x, y=y)
> >
> > ## y =
> > ## 175.9612
> > ## - 10.6744 * pmax(0, x - 4.625)
> > ## + 9.928496 * pmax(0, x - 10.875)
> > ## - 5.940857 * pmax(0, x - 20.25)
> > ## + 3.438948 * pmax(0, x - 27.125)
> > ## - 3.828159 * pmax(0, 44 - x)
> > ## + 4.207046 * pmax(0, x - 44)
> > ## + 2.573822 * pmax(0, x - 76.5)
> > ## - 10.99073 * pmax(0, x - 87.125)
> > ## + 10.97592 * pmax(0, x - 90.875)
> > ## + 9.331949 * pmax(0, x - 94)
> > ## - 8.48575 * pmax(0, x - 96.5)
> >
> > ## Selected 12 of 12 terms, and 1 of 1 predictors
> > ## Termination condition: Reached nk 21
> > ## Importance: x
> > ## Number of terms at each degree of interaction: 1 11 (additive model)
> > ## GCV 108.6592 RSS 82109.44 GRSq 0.861423 RSq 0.86894
> >
> >
> > fm2 <- earth(x, y, fast.k = 0) # (more extensive forward pass)
> > summary(fm2)
> > all.equal(fm1, fm2)# they are identical (apart from 'call'):
> > fm3 <- earth(x, y, fast.k = 0, pmethod = "none", trace = 3) # extensive
> forward pass; *no* pruning
> > ## still no change: fm3 "==" fm1
> > all.equal(predict(fm1, xx), predict(fm3, xx))
> >
> > ## BTW: The chosen knots and coefficients are
> > mat <- with(fm1, cbind(dirs, cuts=c(cuts), coef = c(coefficients)))
> >
> > ## Plots : fine grid for visualization: instead of xx <- seq(x[1],
> x[length(x)], length.out = 1024)
> > rnx <- extendrange(x) ## to extrapolate a bit
> > xx <- do.call(seq.int, c(rnx, list(length.out = 1200)))
> >
> > cbind(f = f(xx),
> > sspl = predict(m.sspl, xx)$y,
> > mars = predict(fm1, xx)) -> fits
> >
> > plot(x,y, xlim=rnx, cex = 1/4, col = adjustcolor(1, 1/2))
> > cols <- c(adjustcolor(2, 1/3),
> > adjustcolor(4, 2/3),
> > adjustcolor("orange4", 2/3))
> > lwds <- c(3, 2, 2)
> > matlines(xx, fits, col = cols, lwd = lwds, lty=1)
> > legend("topleft", c("true f(x)", "smooth.spline()", "earth()"),
> > col=cols, lwd=lwds, bty = "n")
> > title(paste("earth() linear spline vs. smooth.spline(); n =",
> length(x)))
> > mtext(substitute(f(x) == FDEF, list(FDEF = body(f))))
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> =================================================
> Vito M.R. Muggeo, PhD
> Professor of Statistics
> Dip.to Sc Econom, Az e Statistiche
> Università di Palermo
> viale delle Scienze, edificio 13
> 90128 Palermo - ITALY
> tel: 091 23895240; fax: 091 485726
> http://www.unipa.it/persone/docenti/m/vito.muggeo
> Associate Editor: Statistical Modelling
> past chair, Statistical Modelling Society
> coordinator, PhD Program in Econ, Businss, Statist
> ==================================================
>
--
Anupam.
[[alternative HTML version deleted]]
More information about the R-help
mailing list