Smoothing cubic spline (least squares)


 
  • If your data are contaminated by noise, it makes little sense to represent them by an interpolating function. Some smoothing is necessary, and here we use a least squares approach for it.
  • If you compute an interpolating cubic spline S with the natural boundary conditions S''(a)=S''(b)=0, you get a function which minimizes
    $\int_a^b (g''(x))^2 \, dx$
    over the set of all twice continuously differentiable functions g which satisfy these boundary conditions and the interpolation conditions. The integral can be taken (for small slopes) as the total curvature of the function.
  • In the smoothing process employed here, a linear combination of this same integral and the sum of the squared residuals $y_i - S(x_i)$ is minimized, where the grid of the spline is the given x-data grid. That means that the computed spline S minimizes
    $\int_a^b (S''(x))^2 \, dx + w \sum_{i=1}^{n} (y_i - S(x_i))^2$
    for some weight w. The weight w is chosen by the criterion of "generalized cross validation" (Hutchinson and de Hoog, Numer. Math. 47, 99-106, 1985). This also yields an estimate of the variance of the data $y_i$.
  • This method allows a good reconstruction of data subject to noise.
  • The numerical code behind this is netlib/toms/642.
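
The minimization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the netlib/toms/642 code: the helper names solve, smooth_values and gcv_score are hypothetical, the linear algebra is dense (the matrices are banded, which an efficient implementation would exploit), and the weight w is picked from a coarse logarithmic grid instead of by a proper one-dimensional minimization. With alpha = 1/w, the interior second derivatives gamma of the minimizing natural spline solve (R + alpha Q^T Q) gamma = Q^T y, and the smoothed ordinates are s = y - alpha Q gamma. The variance estimate is the usual residual-based one, RSS / (n - trace A), where A is the influence matrix mapping y to s.

```python
import math
import random


def solve(A, b):
    """Solve the dense system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):
        z[k] = (M[k][n] - sum(M[k][c] * z[c] for c in range(k + 1, n))) / M[k][k]
    return z


def smooth_values(x, y, w):
    """Values S(x_i) of the natural cubic spline minimizing
    integral (S'')^2 dx + w * sum (y_i - S(x_i))^2 for a fixed weight w > 0."""
    n, m = len(x), len(x) - 2              # m = number of interior knots
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    Q = [[0.0] * m for _ in range(n)]      # second divided difference stencils
    R = [[0.0] * m for _ in range(m)]      # symmetric tridiagonal matrix
    for j in range(m):
        Q[j][j] = 1.0 / h[j]
        Q[j + 1][j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2][j] = 1.0 / h[j + 1]
        R[j][j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < m:
            R[j][j + 1] = R[j + 1][j] = h[j + 1] / 6.0
    alpha = 1.0 / w
    # (R + alpha Q^T Q) gamma = Q^T y gives the interior second derivatives.
    A = [[R[i][j] + alpha * sum(Q[k][i] * Q[k][j] for k in range(n))
          for j in range(m)] for i in range(m)]
    rhs = [sum(Q[k][i] * y[k] for k in range(n)) for i in range(m)]
    gamma = solve(A, rhs)
    # Smoothed ordinates s = y - alpha Q gamma.
    return [y[k] - alpha * sum(Q[k][j] * gamma[j] for j in range(m)) for k in range(n)]


def gcv_score(x, y, w):
    """Return (GCV score, variance estimate) for the weight w."""
    n = len(x)
    s = smooth_values(x, y, w)
    rss = sum((yi - si) ** 2 for yi, si in zip(y, s))
    # Trace of the influence matrix, obtained by smoothing each unit vector.
    trace = sum(smooth_values(x, [1.0 if k == i else 0.0 for k in range(n)], w)[i]
                for i in range(n))
    return (rss / n) / (1.0 - trace / n) ** 2, rss / (n - trace)


# Demo on assumed test data: a sine curve with uniform noise.
rng = random.Random(0)
x = [i / 10.0 for i in range(21)]
y = [math.sin(3.0 * t) + 0.1 * (2.0 * rng.random() - 1.0) for t in x]
candidates = [10.0 ** e for e in range(-2, 7)]   # coarse grid, an assumption
best_w = min(candidates, key=lambda w: gcv_score(x, y, w)[0])
score, variance = gcv_score(x, y, best_w)
print("chosen w:", best_w, "estimated variance:", variance)
```

A very large w drives the result towards the interpolating natural spline, while a very small w drives it towards the least squares straight line, so data lying exactly on a line are reproduced for every w.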
 

Input

 
  • You can either enter your own set of (x,y)-data or generate these data artificially from a function on an interval, together with an error level for the random error generation. For the function you can choose between 5 predefined ones or specify one yourself. If you enter your own data, you must also specify a rough estimate of the variance of the y-data. If you specify this "too small", then the interpolating natural spline will be returned! With your own data you can also request a printed formula for the spline representation.
  • With artificially generated data you experiment with data produced from the function of your choice.
  • You specify the number of data points n, which is restricted to 4 <= n <= 200.
  • In the case of synthetic data you also specify the interval [a,b] from which the x-data are drawn. The x_i are equidistant in this case, but your own data need not be equidistant.
  • Moreover, you specify an error level for the error generation in this experiment. It is interpreted as a relative error and hence must lie in [0,1]. For example, level=0.05 means 5 percent error relative to the largest absolute y value. The error is computed from
    level*y_abs_max*(2*r-1)
    where r is a uniformly distributed pseudo random number in [0,1] and y_abs_max=max(abs(yi)).
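
The error model above translates directly into code. The following Python sketch is an illustration, not the generator actually used by this page: the function name make_noisy_data and its parameters are hypothetical, the grid is assumed to include both endpoints a and b, and y_abs_max is assumed to be taken over the exact function values. Because 2*r-1 is uniformly distributed on [-1,1], the generated errors have variance (level*y_abs_max)^2/3, which is the number to compare with the variance estimate returned by the smoothing method.

```python
import math
import random


def make_noisy_data(f, a, b, n, level, seed=None):
    """Sample f on n equidistant points in [a, b] and add uniform relative noise."""
    if not 4 <= n <= 200:
        raise ValueError("n must satisfy 4 <= n <= 200")
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must lie in [0, 1]")
    rng = random.Random(seed)
    xs = [a + i * (b - a) / (n - 1) for i in range(n)]
    exact = [f(x) for x in xs]
    y_abs_max = max(abs(v) for v in exact)
    # Error term from the text: level * y_abs_max * (2*r - 1), r uniform in [0, 1).
    noisy = [v + level * y_abs_max * (2.0 * rng.random() - 1.0) for v in exact]
    return xs, exact, noisy


xs, exact, noisy = make_noisy_data(math.sin, 0.0, math.pi, 20, 0.05, seed=42)
bound = 0.05 * max(abs(v) for v in exact)
print("largest possible error:", bound)
print("variance of the generated error:", bound * bound / 3.0)
```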
 

Output

 
  • If you entered your own data, you get a plot of the data and the smoothing spline and, if you requested it, a printed formula which evaluates this spline.
  • Otherwise you get a plot of the generated data and the smoothing spline, and a text which states the chosen function, the number of data points and the error level. You also get back the variance of the y-data as estimated by the method, which you can compare with the generated error.
 

Questions?!

 
  • What happens if the number of data points is quite small (e.g. 4)?
  • How does the method behave near points where the function is not smooth?
  • What happens if the noise becomes very large?
 


18.02.2015