Smoothing cubic spline (least squares)

- If your data are contaminated by noise, it makes little sense to represent
them by an interpolating function. In this case some smoothing is necessary.
Here we use a least squares approach.
- If you compute the interpolating cubic spline S with the natural boundary conditions
S''(a)=S''(b)=0, you get the function which minimizes
$\int_a^b (g''(x))^2\,dx$
over the set of all twice continuously differentiable functions g which
satisfy these boundary conditions and the interpolation conditions.
For small slopes, this integral can be taken as a measure of the total curvature of the function.
- The smoothing process employed here minimizes a linear combination of this same integral and
the sum of the squared residuals y_i - S(x_i), where the knots of the spline
are the given x-data.
That means that the computed spline S minimizes
$\int_a^b (S''(x))^2\,dx + w \sum_{i=1}^{n} (y_i - S(x_i))^2$
for some weight w > 0. The weight w is chosen by the criterion of "generalized cross validation" (GCV)
(Hutchinson and de Hoog, Numer. Math. 47, 99-106, 1985). As a by-product, this yields an estimate
of the variance of the data y_i.
- With w chosen this way, the method usually gives a good reconstruction of the underlying function from noisy data.
- The numerical code behind this is netlib/toms/642.
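
The minimization problem above can be sketched in plain Python. This is not the TOMS 642 code: it is a minimal dense-matrix sketch, assuming the Reinsch formulation of the natural cubic smoothing spline (a band matrix Q of second divided differences and a tridiagonal matrix R), and it replaces the efficient GCV minimization of Hutchinson and de Hoog by a simple grid search over w. The function names, the grid of weights and the variance estimate RSS / trace(I - A) are illustrative assumptions.

```python
import math


def spline_matrices(x):
    """Reinsch matrices Q (n x m) and R (m x m), m = n - 2, for increasing knots x."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    m = n - 2
    Q = [[0.0] * m for _ in range(n)]
    R = [[0.0] * m for _ in range(m)]
    for j in range(m):  # column j belongs to the interior knot x[j + 1]
        Q[j][j] = 1.0 / h[j]
        Q[j + 1][j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2][j] = 1.0 / h[j + 1]
        R[j][j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < m:
            R[j][j + 1] = R[j + 1][j] = h[j + 1] / 6.0
    return Q, R


def cholesky(M):
    """Lower-triangular L with M = L L^T (M symmetric positive definite)."""
    m = len(M)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve L L^T z = b by forward and back substitution."""
    m = len(L)
    u = [0.0] * m
    for i in range(m):
        u[i] = (b[i] - sum(L[i][k] * u[k] for k in range(i))) / L[i][i]
    z = [0.0] * m
    for i in reversed(range(m)):
        z[i] = (u[i] - sum(L[k][i] * z[k] for k in range(i + 1, m))) / L[i][i]
    return z


def smooth(x, y, w):
    """Values S(x_i) of the natural cubic spline minimizing
    int (S'')^2 dx + w * sum (y_i - S(x_i))^2, and trace(I - A),
    where A is the influence matrix mapping y to the fitted values."""
    n, m = len(x), len(x) - 2
    alpha = 1.0 / w  # same minimizer as: sum of squared residuals + alpha * int (S'')^2
    Q, R = spline_matrices(x)
    QtQ = [[sum(Q[k][i] * Q[k][j] for k in range(n)) for j in range(m)] for i in range(m)]
    L = cholesky([[R[i][j] + alpha * QtQ[i][j] for j in range(m)] for i in range(m)])
    # gamma holds S'' at the interior knots; S'' = 0 at both end knots
    gamma = chol_solve(L, [sum(Q[k][j] * y[k] for k in range(n)) for j in range(m)])
    fit = [y[k] - alpha * sum(Q[k][j] * gamma[j] for j in range(m)) for k in range(n)]
    # trace(I - A) = alpha * trace(M^{-1} Q^T Q) with M = R + alpha * Q^T Q
    tr = alpha * sum(chol_solve(L, [QtQ[i][c] for i in range(m)])[c] for c in range(m))
    return fit, tr


def smooth_gcv(x, y, exponents=range(-24, 25)):
    """Pick w = 10**(e/4) on a grid minimizing the GCV score n*RSS / trace(I - A)^2.
    Returns (w, fitted values, variance estimate RSS / trace(I - A))."""
    n = len(x)
    best = None
    for e in exponents:
        w = 10.0 ** (e / 4.0)
        fit, tr = smooth(x, y, w)
        rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fit))
        score = n * rss / tr ** 2
        if best is None or score < best[0]:
            best = (score, w, fit, rss / tr)
    return best[1], best[2], best[3]
```

Dividing the functional by w shows that it has the same minimizer as the sum of squared residuals plus alpha = 1/w times the curvature integral, which is the form the sketch solves. Linear data pass through unchanged for every w, since a straight line has zero curvature. Because all solves here are dense, the cost is O(n^3) per trial weight: acceptable for n <= 200, but much slower than the O(n) banded algorithm of Hutchinson and de Hoog.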

Input

- You can either enter your own set of (x,y)-data or have the data generated
artificially from a function on an interval, with random errors of a given level added.
For the function you can choose between 5 predefined ones or a function you specify yourself.
For your own data you must also specify a rough estimate of the variance
of the y-data. If this estimate is "too small", the interpolating natural spline
is returned!
For your own data you can also request a printed formula for the spline representation.
- In the second case you experiment with data generated from the function of your
choice.
- You specify the number n of data points, which is restricted to 4 <= n <= 200.
- For synthetic data you also specify the interval [a,b] from which the x-data
are taken. In this case the x_i are equidistant; your own data need not
be equidistant.
- Moreover, you specify an error level for the error generation in this experiment. It is taken as a relative
error and hence must lie in [0,1]. For example, level=0.05 means an error of 5 percent relative to the
largest absolute y value.
The error is computed as
level*y_abs_max*(2*r-1)
where r is a uniformly distributed pseudo-random number in [0,1] and
y_abs_max=max(abs(y_i)).
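
The data generation described above can be sketched as follows. This is a hypothetical helper, not the tool's own code: it assumes that y_abs_max is taken over the exact function values f(x_i) before noise is added, and it uses Python's random.random(), which is uniform on the half-open interval [0, 1).

```python
import random


def make_noisy_data(f, a, b, n, level, seed=None):
    """Equidistant x_i in [a, b] and y_i = f(x_i) + level*y_abs_max*(2r - 1).

    y_abs_max is taken over the exact values f(x_i) (an assumption), and r
    comes from random.random(), which is uniform on [0, 1)."""
    if not 4 <= n <= 200:
        raise ValueError("n must satisfy 4 <= n <= 200")
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must lie in [0, 1]")
    rng = random.Random(seed)  # a fixed seed makes the experiment repeatable
    xs = [a + (b - a) * i / (n - 1) for i in range(n)]
    exact = [f(x) for x in xs]
    y_abs_max = max(abs(v) for v in exact)
    ys = [v + level * y_abs_max * (2.0 * rng.random() - 1.0) for v in exact]
    return xs, ys
```

For example, `make_noisy_data(lambda t: t * t - 2.0, -1.0, 3.0, 21, 0.05, seed=7)` returns 21 equidistant points whose errors are bounded in absolute value by 5 percent of max |f(x_i)|.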

Output

- For your own data you get a plot of the data and the smoothing spline and, if
you requested it, a printed formula which evaluates this spline.
- Otherwise you get a plot of the generated data and the smoothing spline,
and a text stating the chosen function, the number of data points and the error level.
You also get back the variance of the y-data as estimated by the method, which you can
compare with the variance of the generated error.
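
For this comparison it helps to know the variance of the generated error. Assuming r is uniformly distributed on [0,1], the error e = c(2r-1) with c = level*y_abs_max is uniformly distributed on [-c,c], so

```latex
\mathbb{E}[e] = 0, \qquad
\operatorname{Var}(e) = \frac{1}{2c}\int_{-c}^{c} t^2\,dt = \frac{c^2}{3},
\qquad c = \mathrm{level}\cdot y_{\mathrm{abs\_max}} .
```

For example, level=0.05 and y_abs_max=2 give c=0.1 and Var(e) = 0.01/3 ≈ 0.0033; the variance estimated by the method can be checked against this value.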

Questions?!

- What happens if the number of data points is very small (e.g. 4)?
- How does the method behave near points where the function is not smooth?
- What happens if the noise becomes very large?