[SciPy-User] calculating the jacobian for a least-squares problem


[SciPy-User] calculating the jacobian for a least-squares problem

Andrew Nelson
I would like to calculate the Jacobian for a least-squares problem, followed by a Hessian estimate, and then the covariance matrix from that Hessian.

With my current approach I sometimes experience issues with the covariance matrix, in that it's not positive semi-definite. I am using the covariance matrix to seed an MCMC sampling process by supplying it to `np.random.multivariate_normal` to get initial positions for the MC chain. I am using the following code:

```
import numpy as np
from scipy.optimize._numdiff import approx_derivative

jac = approx_derivative(residuals_func, x0)   # Jacobian of the residuals at x0
hess = np.matmul(jac.T, jac)                  # Gauss-Newton approximation: J^T J
covar = np.linalg.inv(hess)                   # covariance estimate
```

Note that x0 may not be at a minimum.

- would this be the usual way of estimating the Hessian, and is there anything incorrect with the approach?
- what is the recommended way (i.e. numerically stable) of inverting the Hessian in such a situation?
- does `optimize.leastsq` do anything different?
- if `x0` is not at a minimum should the covariance matrix be expected to be positive semi-definite anyway?


Re: calculating the jacobian for a least-squares problem

Gregor Thalhammer


> Am 27.03.2018 um 01:57 schrieb Andrew Nelson <[hidden email]>:
>
> - would this be the usual way of estimating the Hessian, and is there anything incorrect with the approach?
Your straightforward approach is OK, especially since you don’t require the highest precision. An alternative would be to use automatic differentiation to calculate the derivatives accurately, e.g. using algopy, Theano or TensorFlow.

> - what is the recommended way (i.e. numerically stable) of inverting the Hessian in such a situation?

If your Hessian matrix is close to singular, you can gain some precision by using the QR decomposition of the Jacobian. In general, to solve a linear system it is recommended to avoid calculating the inverse matrix explicitly.
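A minimal sketch of the QR route (not Gregor's exact recipe, just one way to write it), assuming `jac` is the m-by-n Jacobian of the residuals with m >= n:

```
import numpy as np
from scipy.linalg import qr, solve_triangular

def covariance_from_jacobian(jac):
    # Economy-size QR of the Jacobian: jac = Q @ R, with R upper triangular.
    _, R = qr(jac, mode='economic')
    # (J^T J)^{-1} = (R^T R)^{-1} = R^{-1} R^{-T}; get R^{-1} with a
    # triangular solve instead of forming and inverting J^T J.
    R_inv = solve_triangular(R, np.eye(R.shape[1]))
    return R_inv @ R_inv.T
```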

> - does `optimize.leastsq` do anything different?

leastsq wraps the MINPACK library, which brings its own carefully tuned numeric differentiation routines, and it uses QR decomposition.

> - if `x0` is not at a minimum should the covariance matrix be expected to be positive semi-definite anyway?
If x0 is not a minimum, then there is no guarantee. Even if x0 is a minimum, this might be violated due to numerical errors.

best
Gregor


Re: calculating the jacobian for a least-squares problem

Evgeni Burovski
Hi,

In addition to what Gregor said:

- finite-difference estimation of the derivatives should really be a last resort; best is pencil-and-paper, or algorithmic differentiation (algopy et al.). If that is not possible, I'd try some higher-order finite differences, e.g. approx_derivative with method '3-point' or 'cs' (if that works).

- approx_derivative is actually more sophisticated than MINPACK. IIUC, MINPACK only does the simplest two-point forward scheme, https://github.com/scipy/scipy/blob/master/scipy/optimize/minpack/fdjac2.f

- linalg.inv(matrix) is generally better spelled as solve(matrix, identity_matrix)

- in this case, it's indeed best to use QR or SVD. curve_fit does a pseudoinverse (see the sketch after this list): https://github.com/scipy/scipy/blob/v1.0.0/scipy/optimize/minpack.py#L502-L790

(IIRC this was written by Nikolay, and he cited Ceres or some other industry-class optimization software.)
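A sketch combining the '3-point' Jacobian and an SVD-based pseudoinverse for the covariance. The residuals function, data and starting point below are made-up stand-ins for the ones in the original post, and the thresholding is only similar in spirit to what curve_fit does:

```
import numpy as np
from scipy.optimize._numdiff import approx_derivative

# Made-up example problem standing in for residuals_func/x0 from the post.
xdata = np.linspace(0.0, 10.0, 50)
ydata = 3.0 * np.exp(-0.5 * xdata) + 0.1

def residuals_func(p):
    return p[0] * np.exp(-p[1] * xdata) + p[2] - ydata

x0 = np.array([2.5, 0.4, 0.0])

# Higher-order finite differences for the Jacobian.
jac = approx_derivative(residuals_func, x0, method='3-point')

# Covariance from an SVD-based pseudoinverse of J^T J: discard tiny
# singular values instead of inverting a near-singular matrix.
_, s, VT = np.linalg.svd(jac, full_matrices=False)
threshold = np.finfo(float).eps * max(jac.shape) * s[0]
s = s[s > threshold]
VT = VT[:s.size]
covar = (VT.T / s**2) @ VT
```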

Cheers,

Evgeni



Re: calculating the jacobian for a least-squares problem

Mark Alexander Mikofski
Hi Andrew,

The covariance can be calculated from the Jacobian as C = J S J^T, where S is the covariance of the inputs, C is the covariance of the outputs, and J is the Jacobian of the outputs x with respect to the inputs y, with entries J_ij = dx_i/dy_j.
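For example (a toy illustration, not code from this thread), with three outputs and two inputs the propagation is an ordinary matrix product:

```
import numpy as np

# J[i, j] = d(output_i)/d(input_j); toy numbers for illustration.
J = np.array([[1.0, 2.0],
              [0.5, 1.5],
              [3.0, 0.0]])
S = np.diag([0.1, 0.2])   # covariance of the inputs
C = J @ S @ J.T           # covariance of the outputs (3 x 3, symmetric)
```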

Sorry if this is redundant or unhelpful.

Best,
Mark


Re: calculating the jacobian for a least-squares problem

josef.pktd
In reply to this post by Evgeni Burovski

Depending on the use of the Hessian, I would regularize it if it is not positive definite or positive semi-definite.

The simplest options are to clip the singular values to a threshold at or above zero, or to add a ridge factor. (I added a ridge factor to help the homemade Newton method in statsmodels avoid at least some invertibility or singularity problems.)

statsmodels also has a function to find the nearest positive (semi-)definite matrix.
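A minimal sketch of both ideas (this is not the statsmodels code; the floor and ridge sizes are arbitrary illustration values), assuming `hess` is the J^T J matrix from the original post:

```
import numpy as np

def clip_spectrum(hess, floor=1e-10):
    # Symmetric eigendecomposition; raise any eigenvalue below `floor` so
    # the reconstructed matrix is positive definite.
    w, V = np.linalg.eigh(hess)
    w = np.clip(w, floor, None)
    return (V * w) @ V.T

def add_ridge(hess, ridge=1e-6):
    # Ridge (Tikhonov) factor: add a small multiple of the identity.
    return hess + ridge * np.eye(hess.shape[0])
```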

(We don't regularize the final Hessian used for the covariance of the parameter estimates in nonlinear maximum likelihood models, because if that is not positive definite, then there are more serious problems with the model or the data.)

Josef




Re: calculating the jacobian for a least-squares problem

josef.pktd
In reply to this post by Andrew Nelson
On Mon, Mar 26, 2018 at 7:57 PM, Andrew Nelson <[hidden email]> wrote:
> I would like to calculate the Jacobian for a least squares problem, followed
> by a Hessian estimation, then the covariance matrix from that Hessian.
>
> With my current approach I sometimes experience issues with the covariance
> matrix in that it's sometimes not positive semi-definite. I am using the
> covariance matrix to seed a MCMC sampling process by supplying it to
> `np.random.multivariate_normal` to get initial positions for the MC chain.

I never looked much at the details of MCMC. But if your data or starting point doesn't provide good information about the Hessian, then, I think, you could shrink the Hessian toward, or combine it with, the prior covariance matrix, e.g. using a weighted average.
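A sketch of the weighted-average idea; `prior_cov` and the weight are hypothetical, and `covar` is the matrix from the original post:

```
import numpy as np

def shrink_covariance(covar, prior_cov, weight=0.5):
    # Blend the data-derived covariance with the prior covariance;
    # weight=1.0 keeps only the data covariance, weight=0.0 only the prior.
    return weight * np.asarray(covar) + (1.0 - weight) * np.asarray(prior_cov)
```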

Josef



Re: calculating the jacobian for a least-squares problem

Andrew Nelson
I'm using the Hessian to calculate the covariance matrix for parameter estimates in least squares, i.e. the equivalent of `pcov` in `curve_fit` (I don't want to do a fit, I just want the covariance around the current location).


--
_____________________________________
Dr. Andrew Nelson


_____________________________________


Re: calculating the jacobian for a least-squares problem

josef.pktd

But if you are just using it to initialize a Bayesian estimator, then you don't need the covariance from the data by itself.

What I guess:
You are using the cross-product of the Jacobian (outer product of gradient).

This should in general be positive definite. If it is not positive definite, then locally at least one of the parameters is not identified, i.e. your X, in analogy to linear regression, is singular. You can fiddle with numerical precision to maybe get it noisily PSD, and then you have just an almost singular Jacobian with a very noisy inverse.

pinv(jac)
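A sketch of that suggestion: build the covariance from the pseudoinverse of the Jacobian itself instead of inverting J^T J (np.linalg.pinv discards small singular values, so near-singular directions don't blow up the result):

```
import numpy as np

def covariance_from_pinv(jac):
    # (J^T J)^+ equals J^+ (J^+)^T, so this matches pinv(jac.T @ jac)
    # while working with J directly rather than the squared-condition J^T J.
    jac_pinv = np.linalg.pinv(jac)
    return jac_pinv @ jac_pinv.T
```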

Josef


Re: calculating the jacobian for a least-squares problem

Andrew Nelson
> You are using the cross-product of the Jacobian (outer product of gradient).

This is correct.

I discovered a bug in the way I was calculating the covariance matrix. I was initially scaling all parameters to unity and unwinding that scaling after inverting the Hessian, but I was multiplying by the incorrect values when I did so. The diagonal terms in the covariance matrix were fine, but the off-diagonal terms were incorrect, leading to problems when using np.random.multivariate_normal.

