[R] Savitzky-Golay Smoother

Paul Kienzle pkienzle at gmail.com
Fri Sep 26 16:07:35 CEST 2014

You can perform the equivalent filtering by hand using something like:

    k = (window_length - 1) / 2
    for i in k to len(x) - k
        xh,yh = x[points-k:points+k], y[points-k:points+k]
        idx = isfinite(xh) and isfinite(yh)
        xh,yh = xh[idx],yh[idx]
        p = polyfit(xh,yh, degree=filter_order)
        filtered_y[i] = polyval(p,x[i])

Points at the beginning use xh,yh from 1 to window_length, and at the end
from n-window_length to n.

Savitsky-Golay uses the fact that the x points are equally spaced to
precompute the weights on an FIR filter, so that

    filtered_y[i] = yh dot filter_coefficients

The FIR filter coefficients for order p on window 2k+1 are determined from
the top row of the pseudo inverse of the overdetermined polynomial equation
C whose elements, Cij are (-k+i)^(p-j) for i in 0 to 2k, j in 0 to p.  For
an order 2 solver with 5 points, this would be:

4 -2 1
1 -1  1
0  0  1
1  1  1
4  2  1

Computing the pseudo-inverse, this gives filter coefficients:

[ 0.14285714, -0.07142857, -0.14285714, -0.07142857,  0.14285714]

If the central point were NA, the coefficients would have to be:

[ 0.16666667, -0.16666667, NA, -0.16666667,  0.16666667]

Sliding the window forward one, the coefficients become:

[ 0.11363636, NA, -0.18181818, -0.09090909,  0.15909091]

Clearly, there is not going to be a trivial operation on the data stream or
filter coefficients which will handle the missing data.

You have a few options.

(1) if your data set is small, just do the filter manually.
(2) if your data set is large but the missing data is sparse, replace
missing data by infinity, then run the filter as normal.  For each
nonfinite number in the result, compute the filtered value manually.
(3) if you are not too concerned about full precision (this is, after all,
a smoothing operation), you could try replacing each missing value with the
average of the surrounding values, and hopefully this will not disturb the
polynomial approximations for the surrounding points too much.

- Paul

On Fri, Sep 26, 2014 at 3:32 AM, Erick Okuto <erickokuto at gmail.com> wrote:

> Dear Paul and Henrik,
> I have a time series with some missing data points that i need smoothed
> using Savitzky-Golay filter. Related question was asked  here
> http://thr3ads.net/r-help/2012/11/2121748-Savitzky-Golay-filtering-with-missing-data
> but no straight forward answer was posted. However, Henrik (cc'd here) did
> ask related question on smoothing for reflectance here
> http://thr3ads.net/r-help/2004/02/835137-Savitzky-Golay-smoothing-for-reflectance-data
> which i have as well been unable to follow up. I will be glad if you could
> assist me with some insights on the way forward or point to a relevant
> source of help.
> I have attached a simple test data plus the R code which only work with
> non missing data. I will appreciate any possible assistance.
> Thanks,
> Erick.
> --
> *Erick Okuto, Ph.D.*
> * Candidate*
> *School of Mathematics & Actuarial Science*
> *Jaramogi Oginga Odinga University of Science & Technology (JOOUST)/ World
> Agroforestry Centre (ICRAF), **Climate Change Unit, Nairobi-Kenya. *
> Voice:  +254207224154​    Mobile: +254725005276    Skype id:  erickokuto
> Email: erickokuto at gmail.com

	[[alternative HTML version deleted]]

More information about the R-help mailing list