[R] Automatic Knot selection in Piecewise linear splines
Anupam Tyagi
@nupty@g| @end|ng |rom gm@||@com
Tue Jul 16 12:43:26 CEST 2024
Thanks, Martin. This is very helpful.
On Tue, 16 Jul 2024 at 14:52, Martin Maechler <maechler using stat.math.ethz.ch>
wrote:
> >>>>> Anupam Tyagi
> >>>>> on Tue, 9 Jul 2024 16:16:43 +0530 writes:
>
> > How can I do automatic knot selection while fitting piecewise linear
> > splines to two variables x and y? Which package to use to do it
> simply? I
> > also want to visualize the splines (and the scatter plot) with a
> graph.
>
> > Anupam
>
> NB: linear splines, i.e. piecewise linear continuous functions.
> Given the knots, use approx() or approxfun() however, the
> automatic knots selection does not happen in the base R packages.
>
> I'm sure there are several R packages doing this.
> The best such package in my opinion is "earth" which does a
> re-implementation (and extensive *generalization*) of the
> famous MARS algorithm of Friedman.
> ==> https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines
>
> Note that their strengths and power is that they do their work
> for multivariate x (MARS := Multivariate Adaptive Regression
> Splines), but indeed do work for the simple 1D case.
>
> In the following example, we always get 11 final knots,
> but I'm sure one can tweak the many tuning paramters of earth()
> to get more:
>
> ## Can we do knot-selection for simple (x,y) splines? === Yes, via
> earth() {using MARS}!
>
> x <- (0:800)/8
>
> f <- function(x) 7 * sin(pi/8*x) * abs((x-50)/20)^1.25 - (x-40)*(12-x)/64
> curve(f(x), 0, 100, n = 1000, col=2, lwd=2)
>
> set.seed(11)
> y <- f(x) + 10*rnorm(x)
>
> m.sspl <- smooth.spline(x,y) # base line "standard smoother"
>
> require(earth)
> fm1 <- earth(x, y) # default settings
> summary(fm1, style = "pmax") #-- got 10 knots (x = 44 "used twice") below
> ## Call: earth(x=x, y=y)
>
> ## y =
> ## 175.9612
> ## - 10.6744 * pmax(0, x - 4.625)
> ## + 9.928496 * pmax(0, x - 10.875)
> ## - 5.940857 * pmax(0, x - 20.25)
> ## + 3.438948 * pmax(0, x - 27.125)
> ## - 3.828159 * pmax(0, 44 - x)
> ## + 4.207046 * pmax(0, x - 44)
> ## + 2.573822 * pmax(0, x - 76.5)
> ## - 10.99073 * pmax(0, x - 87.125)
> ## + 10.97592 * pmax(0, x - 90.875)
> ## + 9.331949 * pmax(0, x - 94)
> ## - 8.48575 * pmax(0, x - 96.5)
>
> ## Selected 12 of 12 terms, and 1 of 1 predictors
> ## Termination condition: Reached nk 21
> ## Importance: x
> ## Number of terms at each degree of interaction: 1 11 (additive model)
> ## GCV 108.6592 RSS 82109.44 GRSq 0.861423 RSq 0.86894
>
>
> fm2 <- earth(x, y, fast.k = 0) # (more extensive forward pass)
> summary(fm2)
> all.equal(fm1, fm2)# they are identical (apart from 'call'):
> fm3 <- earth(x, y, fast.k = 0, pmethod = "none", trace = 3) # extensive
> forward pass; *no* pruning
> ## still no change: fm3 "==" fm1
> all.equal(predict(fm1, xx), predict(fm3, xx))
>
> ## BTW: The chosen knots and coefficients are
> mat <- with(fm1, cbind(dirs, cuts=c(cuts), coef = c(coefficients)))
>
> ## Plots : fine grid for visualization: instead of xx <- seq(x[1],
> x[length(x)], length.out = 1024)
> rnx <- extendrange(x) ## to extrapolate a bit
> xx <- do.call(seq.int, c(rnx, list(length.out = 1200)))
>
> cbind(f = f(xx),
> sspl = predict(m.sspl, xx)$y,
> mars = predict(fm1, xx)) -> fits
>
> plot(x,y, xlim=rnx, cex = 1/4, col = adjustcolor(1, 1/2))
> cols <- c(adjustcolor(2, 1/3),
> adjustcolor(4, 2/3),
> adjustcolor("orange4", 2/3))
> lwds <- c(3, 2, 2)
> matlines(xx, fits, col = cols, lwd = lwds, lty=1)
> legend("topleft", c("true f(x)", "smooth.spline()", "earth()"),
> col=cols, lwd=lwds, bty = "n")
> title(paste("earth() linear spline vs. smooth.spline(); n =", length(x)))
> mtext(substitute(f(x) == FDEF, list(FDEF = body(f))))
>
--
Anupam.
[[alternative HTML version deleted]]
More information about the R-help
mailing list