
scipy.interpolate.rbf: how is "smooth" defined?


scipy.interpolate.rbf: how is "smooth" defined?

Mischa Schirmer
Hello,

I'm doing a 2D fit using scipy.interpolate.rbf.
I have no problem with the fit itself, it works fine.

Data points are randomly scattered in an x-y plane,
and each has a z-value. The data is fairly well
behaved, in the sense that variations across the
x-y plane are slow. The data is somewhat noisy, and
thus I want to smooth it.

rbf has a 'smooth' parameter which is working well,
but it is extremely poorly documented. Basically,
for smooth=0 no smoothing takes place, and for
smooth>0 some smoothing takes place.

Does anybody know how large the smoothing length
is? Is it internally modified depending on the
number of nodes entering the fit?

I'm asking because the number of nodes in my data
can vary from a few tens to more than 10000.
Clearly, in the latter case I can do with a much
smaller smoothing scale, but I'd like to determine
it in some automatic fashion.

It appears that the smoothing length is
somehow normalised. At the moment I set 'smooth'
to 10% of the average distance between nodes
(in data units), resulting in a (numerically)
much larger effective smoothing scale.
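
For concreteness, this is roughly what I do at the moment (a minimal
sketch; I take "average distance" to mean the mean nearest-neighbour
distance, with the scattered data in 1-D arrays x, y, z):

    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.interpolate import Rbf

    pts = np.column_stack((x, y))
    dist, _ = cKDTree(pts).query(pts, k=2)     # column 0 is each point itself
    avgdist = dist[:, 1].mean()                # mean nearest-neighbour distance
    rbf = Rbf(x, y, z, smooth=0.1 * avgdist)   # 10% of it, in data units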

Any insight is much appreciated!

Mischa Schirmer

Re: scipy.interpolate.rbf: how is "smooth" defined?

josef.pktd
On Mon, Aug 30, 2010 at 7:10 AM, Mischa Schirmer
<[hidden email]> wrote:

> Hello,
>
> I'm doing a 2D fit using scipy.interpolate.rbf.
> I have no problem with the fit itself, it works fine.
>
> Data points are randomly scattered in an x-y plane,
> and each has a z-value. The data is fairly well
> behaved, in the sense that variations across the
> x-y plane are slow. The data is somewhat noisy, and
> thus I want to smooth it.
>
> rbf has a 'smooth' parameter which is working well,
> but it is extremely poorly documented. Basically,
> for smooth=0 no smoothing takes place, and for
> smooth>0 some smoothing takes place.

smooth<0 also works, and, I think, is the correct sign for the Gaussian.
>
> Does anybody know how large the smoothing length
> is? Is it internally modified depending on the
> number of nodes entering the fit?

No, the smoothing factor is used directly.
My intuition, mainly for the Gaussian case: the estimate depends on the
data and on a prior that biases it towards zero. The matrix in the normal
equations is a weighted average (except for the negative sign) of the data
(kernel matrix) with weight 1 and an identity matrix with weight
smooth.

It's a bit similar to Ridge Regression, where the relative weights can be
selected depending on the data (but are often just given as a parameter
that reflects the "prior beliefs").
I think it's the usual bandwidth selection problem, and scipy offers
no support for this (until someone adds it).
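
For illustration only (this is the ridge analogy, not scipy's code): a
penalty weight lam enters the normal equations the same way, as a
diagonal offset on the matrix being inverted.

    import numpy as np

    def ridge_fit(X, y, lam):
        """Solve the penalized normal equations (X'X + lam*I) b = X'y."""
        p = X.shape[1]
        # the identity enters with weight lam, much like 'smooth' enters
        # Rbf's kernel matrix as a diagonal offset
        return np.linalg.solve(np.dot(X.T, X) + lam * np.eye(p),
                               np.dot(X.T, y))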

Also, I don't know the RBF literature, so I have no idea whether there are
any easy or standard solutions for choosing the smoothing parameter,
but there should be bandwidth selection criteria analogous to other
smoothers.

>
> I'm asking because the number of nodes in my data
> can vary from a few tens to more than 10000.
> Clearly, in the latter case I can do with a much
> smaller smoothing scale, but I'd like to determine
> it in some automatic fashion.

Is this true for all radial basis functions?
If I remember correctly, I had to smooth more with many points; maybe
it depends on the density of points and the amount of noise.

Josef


Re: scipy.interpolate.rbf: how is "smooth" defined?

denis-bz-gg
In reply to this post by Mischa Schirmer
On Aug 30, 1:10 pm, Mischa Schirmer <[hidden email]> wrote:
> Hello,
...
> rbf has a 'smooth' parameter which is working well,
> but it is extremely poorly documented. Basically,
> for smooth=0 no smoothing takes place, and for
> smooth>0 some smoothing takes place

nope, it's just an offset for the matrix A that RBF solves:
interpolate/rbf.py line 191 (scipy 0.8.0rc3)
    self.A = self._init_function(r) - eye(self.N)*self.smooth
    self.nodes = linalg.solve(self.A, self.di)

Josef, we went back and forth on this back in February, and I thought
we agreed:
-- the -= should be a +=  (but no ticket)
-- for gaussian, use smooth = -1e-6 to nudge the eigenvalues of A > 0;
other functions have eigenvalues both < 0 and > 0, so leave smooth=0.
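
A quick way to see this (illustrative only; it uses scipy 0.8's sign
convention A = kernel(r) - smooth*I, so smooth = -1e-6 adds 1e-6 to the
diagonal and shifts the whole spectrum up by that amount):

    import numpy as np
    from scipy.spatial.distance import cdist

    np.random.seed(0)
    pts = np.random.rand(50, 2)                   # 50 scattered points
    r0 = 0.2                                      # epsilon, picked arbitrarily here
    A = np.exp(-(cdist(pts, pts) / r0) ** 2)      # Gaussian kernel matrix
    for smooth in (0.0, -1e-6):
        w = np.linalg.eigvalsh(A - np.eye(len(pts)) * smooth)
        print("smooth %g: min eigenvalue %.3g" % (smooth, w.min()))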

Mischa, are you using the Gaussian?
You could then try varying epsilon, aka r0 in exp( - (r / r0)^2 ),
which has a big effect on the fit.
Its default, "approximate average distance between [all] nodes", is
not so hot; better is the average distance to nearby nodes,
i.e. smaller Gaussian peaks.

cheers
  -- denis

Re: scipy.interpolate.rbf: how is "smooth" defined?

josef.pktd
On Mon, Aug 30, 2010 at 11:18 AM, denis <[hidden email]> wrote:

> On Aug 30, 1:10 pm, Mischa Schirmer <[hidden email]> wrote:
>> Hello,
> ...
>> rbf has a 'smooth' parameter which is working well,
>> but it is extremely poorly documented. Basically,
>> for smooth=0 no smoothing takes place, and for
>> smooth>0 some smoothing takes place
>
> nope, it's just an offset for the matrix A that RBF solves:
> interpolate/rbf.py line 191 (scipy 0.8.0rc3)
>    self.A = self._init_function(r) - eye(self.N)*self.smooth
>    self.nodes = linalg.solve(self.A, self.di)
>
> Josef, we went back and forth on this back in February, and I thought
> we agreed:
> -- the -= should be a +=  (but no ticket)

Yes, do you want to open a ticket?

> -- for gaussian, use smooth = -1e-6 to nudge the eigenvalues of A > 0;

1e-6 might be more like a minimum; in many cases it might need to be
much larger. But it would be useful to have better heuristics than
just trial and error.
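
One such heuristic (just a sketch, nothing scipy provides; choose_smooth
and its arguments are made up for illustration): pick smooth by simple
hold-out cross-validation over a grid of candidate values.

    import numpy as np
    from scipy.interpolate import Rbf

    def choose_smooth(x, y, z, candidates, frac=0.8, seed=0):
        """Return the candidate smooth value with the lowest hold-out error."""
        rng = np.random.RandomState(seed)
        idx = rng.permutation(len(x))
        ntrain = int(frac * len(x))
        tr, te = idx[:ntrain], idx[ntrain:]
        errs = []
        for s in candidates:
            rbf = Rbf(x[tr], y[tr], z[tr], smooth=s)
            errs.append(np.mean((rbf(x[te], y[te]) - z[te]) ** 2))
        return candidates[int(np.argmin(errs))]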

Josef


Re: scipy.interpolate.rbf: how is "smooth" defined?

Gael Varoquaux
On Mon, Aug 30, 2010 at 11:57:47AM -0400, [hidden email] wrote:
> 1e-6 might be more like a minimum, in many cases it might need to be
> much larger. But it would be useful to have better heuristics than
> just trial and error.

I have not really been following the conversation (sorry), but are you
not talking about Tikhonov regularization of a covariance matrix? In
which case, I have had good experience using the Ledoit-Wolf approach to
setting the regularization factor.

I can provide numpy code for that, if needed.

Gaël

Re: scipy.interpolate.rbf: how is "smooth" defined?

josef.pktd
On Mon, Aug 30, 2010 at 12:01 PM, Gael Varoquaux
<[hidden email]> wrote:
> On Mon, Aug 30, 2010 at 11:57:47AM -0400, [hidden email] wrote:
>> 1e-6 might be more like a minimum, in many cases it might need to be
>> much larger. But it would be useful to have better heuristics than
>> just trial and error.
>
> I have not really been following the conversation (sorry), but are you
> not talking about Tikhonov regularization of a covariance matrix? In
> which case, I have had good experience using the Ledoit-Wolf approach to
> setting the regularization factor.

It's the same idea, but instead of the covariance matrix, RBF uses the
kernel matrix. It will be worth a try, but I don't know whether the
choice of the regularization factor will apply in the same way here.

>
> I can provide numpy code for that, if needed.

Please do; I can also use it for other things. (I never read the small
print in Ledoit-Wolf.)

Josef


Re: scipy.interpolate.rbf: how is "smooth" defined?

Gael Varoquaux
On Mon, Aug 30, 2010 at 12:18:58PM -0400, [hidden email] wrote:
> Please do, I can also use it for other things. (I never read the small
> print in Ledoit-Wolf.)

There you go (BSD licensed). This will land in scikit learn at some point
because we are going to need it in different places, but I haven't found
the time to polish it.

Gael

import numpy as np


def ledoit_wolf(x, return_factor=False):
    """ Estimates the shrunk Ledoit-Wolf covariance matrix.

        Parameters
        ----------
        x: 2D ndarray, shape (n, p)
            The data matrix, with p features and n samples.
        return_factor: boolean, optional
            If return_factor is True, the regularisation_factor is
            returned.

        Returns
        -------
        regularised_cov: 2D ndarray
            Regularized covariance
        regularisation_factor: float
            Regularisation factor

        Notes
        -----
        The regularised covariance is::

            (1 - regularisation_factor)*cov
                    + regularisation_factor*mu*np.identity(n_features)

        where mu = np.trace(cov)/n_features.
    """
    n_samples, n_features = x.shape
    if n_features == 1:
        if return_factor:
            return np.atleast_2d(x.std()), 0
        return np.atleast_2d(x.std())
    cov = np.dot(x.T, x)/n_samples
    i = np.identity(n_features)
    mu = np.trace(cov)/n_features
    delta = ((cov - mu*i)**2).sum()/n_features
    x2 = x**2
    beta_ = 1./(n_features*n_samples) * np.sum(
                            np.dot(x2.T, x2)/n_samples - cov**2
                )

    beta = min(beta_, delta)
    alpha = delta - beta
    if not return_factor:
        return beta/delta * mu * i + alpha/delta * cov
    else:
        return beta/delta * mu * i + alpha/delta * cov, beta/delta
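
A minimal usage sketch (my assumption: the data should already be centred,
since the function uses dot(x.T, x)/n_samples as the covariance):

    import numpy as np

    np.random.seed(0)
    x = np.random.randn(30, 5)        # 30 samples, 5 features
    x -= x.mean(axis=0)               # centre the data first
    cov, factor = ledoit_wolf(x, return_factor=True)
    print(factor)                     # shrinkage weight on the identity, in [0, 1]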




Re: scipy.interpolate.rbf: how is "smooth" defined?

josef.pktd
On Mon, Aug 30, 2010 at 12:22 PM, Gael Varoquaux
<[hidden email]> wrote:

> On Mon, Aug 30, 2010 at 12:18:58PM -0400, [hidden email] wrote:
>> Please do, I can also use it for other things. (I never read the small
>> print in Ledoit-Wolf.)
>
> There you go (BSD licensed). This will land in scikit learn at some point
> because we are going to need it in different places, but I haven't found
> the time to polish it.
>
> Gael
>
> [ledoit_wolf code snipped]
>

Thanks,
If you use this with the moment/normal equations for penalized
least-squares, do you also have to multiply `dot(x,y)/n_samples` by
`alpha/delta`?
In the normalization of the RBF, smooth would be `beta/delta * mu /
(alpha/delta)`, or maybe additionally divided by n_samples or
something like this.

Josef

Re: scipy.interpolate.rbf: how is "smooth" defined?

Gael Varoquaux
On Mon, Aug 30, 2010 at 12:45:27PM -0400, [hidden email] wrote:
> If you use this with moment/normal equations for penalized
> least-squares, do you have to multiply also `dot(x,y)/n_samples` by
> `alpha/delta` ?

I do not know, and I do not want to reply with something stupid. If you
find out the answer, I would be interested.

> In the normalization of RBF, smooth would be `beta/delta * mu /
> (alpha/delta)`  or maybe additionally also divided by n_samples or
> something like this.

I believe it would be 'beta/delta * mu / (alpha/delta)', but I am not
sure whether the multiplication by n_samples should be applied,
as I haven't really looked at the exact formulation of the problem in the
context of RBFs (sorry, no time). All I know is that the trace of the
covariance matrix estimate should be kept as constant as possible.

Gaël

Re: scipy.interpolate.rbf: how is "smooth" defined?

josef.pktd
On Mon, Aug 30, 2010 at 2:15 PM, Gael Varoquaux
<[hidden email]> wrote:
> On Mon, Aug 30, 2010 at 12:45:27PM -0400, [hidden email] wrote:
>> If you use this with moment/normal equations for penalized
>> least-squares, do you have to multiply also `dot(x,y)/n_samples` by
>> `alpha/delta` ?
>
> I do not know, and I do not want to reply with something stupid. If you
> find out the answer, I would be interested.

Maybe you'll find out before me; with all the other things I don't find
time for, Ridge, informative Bayesian priors, and Tikhonov will have to
wait.

Josef


Re: scipy.interpolate.rbf: how is "smooth" defined?

denis-bz-gg
In reply to this post by Mischa Schirmer

On Aug 30, 1:10 pm, Mischa Schirmer <[hidden email]> wrote:

> Hello,
>
> I'm doing a 2D fit using scipy.interpolate.rbf.
> I have no problem with the fit itself, it works fine.
>
> Data points are randomly scattered in an x-y plane,
> and each has a z-value. The data is fairly well
> behaved, in the sense that variations across the
> x-y plane are slow. The data is somewhat noisy, and
> thus I want to smooth it.
>
> rbf has a 'smooth' parameter which is working well,
> but it is extremely poorly documented. Basically,
> for smooth=0 no smoothing takes place, and for
> smooth>0 some smoothing takes place.

Mischa,
  are you seeing smoothing with smooth > 0? Measured how?

For varying epsilon aka r0 in exp( - (r / r0)^2 ),

    r0 = side * sqrt( 2 / N )

looks to be of the right scale:
on a checkerboard of N points in a side^2 square,
a circle of that radius will enclose its 8 neighbours.
Well, the points are scattered, but.
Can you try epsilon = this r0, *2, *3,
keeping smooth=0?

Of course any interpolation depends heavily on what you're fitting.

(Opinion:
You really want smaller Gaussian hills / smaller r0
where points are clustered, bigger where sparse.
RBF is not at all adaptive -- all hills are of size r0.
griddata (Delaunay triangulation + Natural Neighbour) is adaptive,
Invdisttree (kd tree + inverse-distance weighting) is adaptive.
)
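
(A minimal sketch of the Delaunay route, assuming a scipy new enough to
have scipy.interpolate.griddata -- added after 0.8, and offering
'linear'/'cubic' rather than natural neighbour, but adaptive to the local
point density; x, y, z and the grid xm, ym are as in the loop below:)

    import numpy as np
    from scipy.interpolate import griddata

    zi = griddata(np.column_stack((x, y)), z, (xm, ym), method='linear')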

cheers
  -- denis


# assumes x, y, z (scattered data), xm, ym (grid), side, N, ngrid, func defined
import numpy as np
from scipy.interpolate import Rbf

r0 = side * np.sqrt( 2.0 / N )    # 2.0, not 2: avoid integer division
for smooth in ( 0, -1e-6 ):
    print "smooth %.2g" % smooth
    for c in range( 1, 5+1 ):
        rbf = Rbf( x, y, z, function=func, epsilon=c*r0, smooth=smooth )
        zi = rbf( xm.flatten(), ym.flatten() ).reshape((ngrid, ngrid))