[SciPy-User] ODR multiresponse multidimensional


[SciPy-User] ODR multiresponse multidimensional

Thomas Howells
Dear Pythonistas,

I'm new to the mailing list, and my question concerns SciPy's Orthogonal Distance Regression (ODR) wrapper module. My apologies if you've received this before; I had some trouble with the mail delivery system.

From perusal of the documentation for the wrapper and the underlying Fortran routines, it seems the Fortran code can handle a dataset that is both multiresponse and multidimensional. I have such a dataset that I would like to try and use the algorithm for; however as far as I can tell the ODR wrapper doesn't have the machinery to support this usage.

It's possible I've missed something, so I'd be interested to hear whether anyone has experience with this sort of problem. If not, I may dig into the wrapper a bit more and see whether the functionality I want could be added.

Thanks,
Thomas Howells

_______________________________________________
SciPy-User mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/scipy-user

Re: ODR multiresponse multidimensional

Robert Kern-2
On Tue, Sep 29, 2015 at 1:43 PM, Thomas Howells <[hidden email]> wrote:
> From perusal of the documentation for the wrapper and the underlying Fortran routines, it seems the Fortran code can handle a dataset that is both multiresponse and multidimensional. I have such a dataset that I would like to try and use the algorithm for; however as far as I can tell the ODR wrapper doesn't have the machinery to support this usage.

What do you mean by "both multiresponse and multidimensional"? That the model is a function `f(x; beta) -> y` such that x and y are each vectors? Yes, it certainly supports this, and I think the docstrings are pretty clear about it. What did you read that makes you think otherwise?


--
Robert Kern
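A minimal sketch of the usage Robert describes, assuming the shape conventions in the `scipy.odr` docstrings: x has shape (m, n) for m input variables and n observations, and the model function returns an array of shape (q, n) for q responses. The model `two_response_model`, its parameters, and the data are made up purely for illustration.

```python
import numpy as np
from scipy import odr


def two_response_model(beta, x):
    """Hypothetical model: m = 2 inputs in, q = 2 responses out."""
    u, v = x                                # x has shape (2, n)
    first = beta[0] * u + beta[1] * v       # first response, shape (n,)
    second = beta[2] * u * v                # second response, shape (n,)
    return np.vstack([first, second])       # shape (q, n) = (2, n)


rng = np.random.default_rng(0)
n = 56
x = rng.uniform(1.0, 2.0, size=(2, n))      # inputs, shape (m, n)
true_beta = np.array([1.5, -0.5, 2.0])
y = two_response_model(true_beta, x)        # responses, shape (q, n), noise-free

data = odr.Data(x, y)
fit = odr.ODR(data, odr.Model(two_response_model), beta0=[1.0, 1.0, 1.0]).run()
print(fit.beta)  # should be close to true_beta for this noise-free data
```

With real measurements you would normally also supply uncertainties (for example through `odr.RealData`) so that the x and y errors are weighted appropriately.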


Re: ODR multiresponse multidimensional

Thomas Howells
On Tue, Sep 29, 2015 at 2:00 PM Robert Kern <[hidden email]> wrote:
> What do you mean by "both multiresponse and multidimensional"? That the model is a function `f(x; beta) -> y` such that x and y are each vectors? Yes, it certainly supports this, and I think the docstrings are pretty clear about it. What did you read that makes you think otherwise?


I'll elucidate a little more: I have data with two control variables, theta and E, for angle and energy. Each energy is measured at each angle, which makes the data multidimensional (having checked the ODR reference guide, http://docs.scipy.org/doc/external/odrpack_guide.pdf, I see this is also referred to as multivariate; sorry if this caused confusion).

For each combination of theta and E, I get two linked readings (or responses), alpha and beta, that I want to fit simultaneously. This makes it multi-response, as well as multivariate, a situation described on page 6 of the ODR reference guide.

Ideally I need to find the best fit to all angles and both responses simultaneously, to reduce correlation between parameters. The odr.Model documentation explains how to handle a multidimensional input x with a corresponding multidimensional response y, but not what to do when the input is multidimensional and the data are also multiresponse.

My input array x has shape [m,n], where m is the dimensionality of the input and n is the number of observations (in my case [3,56]).
My response array y is then in fact [2,3,56], as I have two responses for each x. I arranged it this way after inspecting test_odr.py, in which test_multi matches a one-dimensional array x with a two-dimensional, or multiresponse, array y.

Unfortunately, generalising in this way results in an error when the odr module checks my array shapes. Even after inspecting the source code, I could not find any way to tell the code that my y array is multiresponse. I hope this explanation makes things clearer!

Thanks, Tom
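For reference, a sketch of the array layout `scipy.odr` expects for this kind of data: x with shape (m, n) and y with shape (q, n), so that each column holds one (theta, E) observation together with its two responses, rather than a y of shape [2,3,56]. With the two control variables named above, x would be (2, 56). The 7 x 8 angle/energy grid and the response values are hypothetical placeholders; only the reshaping is the point.

```python
import numpy as np

# Hypothetical measurement grid: 7 angles x 8 energies = 56 observations.
theta = np.linspace(0.0, 60.0, 7)      # degrees
energy = np.linspace(10.0, 45.0, 8)
T, E = np.meshgrid(theta, energy, indexing="ij")     # each has shape (7, 8)

# Inputs: one row per control variable, one column per observation -> (m, n).
x = np.vstack([T.ravel(), E.ravel()])                # (2, 56)

# Placeholder responses on the same grid (replace with measured alpha, beta).
alpha_obs = np.cos(np.radians(T)) * E
beta_obs = np.sin(np.radians(T)) * E

# Responses: one row per response, columns in the same order as x -> (q, n).
y = np.vstack([alpha_obs.ravel(), beta_obs.ravel()])  # (2, 56)

print(x.shape, y.shape)  # (2, 56) (2, 56)
```

Because every array is flattened with the same (C) ordering, column k of x and column k of y always refer to the same (theta, E) measurement.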


Re: ODR multiresponse multidimensional

Robert Kern-2
On Tue, Sep 29, 2015 at 3:16 PM, Thomas Howells <[hidden email]> wrote:

> My input array x is [m,n] where m is the dimensionality of the input and n is the number of observations. (In my case [3,56])

What's the third one? You mentioned only two: the angle and the energy.

--
Robert Kern


Re: ODR multiresponse multidimensional

Matt Newville
In reply to this post by Thomas Howells


On Tue, Sep 29, 2015 at 9:16 AM, Thomas Howells <[hidden email]> wrote:
> I'll elucidate a little more: I have data with two control variables, theta and E, for angle and energy. Each energy is measured at each angle; this makes the data multidimensional (after checking the ODR reference guide, http://docs.scipy.org/doc/external/odrpack_guide.pdf, this is also referred to as multivariate; sorry if this caused confusion).
>
> For each combination of theta and E, I get two linked readings (or responses), alpha and beta, that I want to fit simultaneously. This makes it multi-response, as well as multivariate, a situation described on page 6 of the ODR reference guide.
>
> Ideally I need to find the best fit to all angles & both responses simultaneously to reduce correlation between parameters. The odr.Model object has instructions to handle multidimensional input x, and corresponding multidimensional response y, but not what to do if you have both a multidimensional input and multiresponse.

ODRPACK is a little strange in its support for multi-dimensional data. The simplest thing to do (with the added benefit that it will also let you use other optimization methods) is to always change the problem to a single dimension. This is not at all hard, just a slight change in perspective.

To be clear, the term multivariate means "more than one variable parameter", not the number or shape of the observations.

In fact, for (nearly?) all optimization problems, the algorithms seek a set of parameter values that make the model most closely match the data. What makes ODRPACK special is its definition of "most closely match", not really that it is multi-dimensional. That's sort of a distraction.

The fact that you have two signals (alpha, beta) at each value of (angle, energy) is completely unimportant to the fitting algorithm. It doesn't care what the independent variables are, or even that there *are* independent variables. It has (only) parameters and the result of the objective function.

Within your objective function, you can do anything you want. You can concatenate multiple arrays of data, and/or reduce your multi-dimensional arrays of data to one dimension with flatten() or whatever else you need to do. Of course, if you are modelling data, your *model* might care about the independent variables, and you'll need to make sure the data and model have the same shape and that the observations are aligned, so you might have something like [alpha_0, beta_0, alpha_1, beta_1, ...], but (of course) the algorithm doesn't care about that order.

Hope that helps,

--Matt
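A sketch of Matt's suggestion, assuming an ordinary (not orthogonal-distance) least-squares fit is acceptable: both responses are flattened and concatenated into one 1-D residual vector, which a general-purpose optimizer such as `scipy.optimize.least_squares` can minimize. The functions `model` and `residuals`, the parameters `a` and `b`, and the grid are hypothetical. Note that this treats the inputs as exact, so it gives up the errors-in-x treatment that ODR provides.

```python
import numpy as np
from scipy.optimize import least_squares


def model(params, theta, energy):
    """Hypothetical two-response model; replace with the real physics."""
    a, b = params
    alpha = a * np.cos(np.radians(theta)) * energy
    beta = b * np.sin(np.radians(theta)) * energy
    return alpha, beta


def residuals(params, theta, energy, alpha_obs, beta_obs):
    """Flatten both 2-D residual arrays and join them into one 1-D vector."""
    alpha_mod, beta_mod = model(params, theta, energy)
    return np.concatenate([(alpha_mod - alpha_obs).ravel(),
                           (beta_mod - beta_obs).ravel()])


# Hypothetical 7 x 8 (angle, energy) grid and noise-free synthetic data.
theta, energy = np.meshgrid(np.linspace(0.0, 60.0, 7),
                            np.linspace(10.0, 45.0, 8), indexing="ij")
alpha_obs, beta_obs = model([2.0, 0.5], theta, energy)

result = least_squares(residuals, x0=[1.0, 1.0],
                       args=(theta, energy, alpha_obs, beta_obs))
print(result.x)  # close to [2.0, 0.5]
```

Concatenating [all alpha residuals, all beta residuals] or interleaving them as [alpha_0, beta_0, ...] gives the same sum of squares; what matters is that model and data use the same ordering. If alpha and beta have different uncertainties, dividing each block by its standard deviation before concatenating keeps the two responses on a comparable scale.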



Re: ODR multiresponse multidimensional

Thomas Howells
Thanks Matt,

I guess I can start by reformatting this into a single-dimensional problem. I admit that part of my motivation for trying to treat this in full comes from this passage in the user reference, where q is the number of responses:
"Note that when q > 1, the responses of a multiresponse orthogonal distance regression problem cannot simply be treated as q separate observations as can be done for ordinary least squares when the q responses are uncorrelated. This is because ODRPACK would then treat the variables associated with these q observations as unrelated, and thus not constrain the errors δ_i in x_i to be the same for each of the q occurrences of the ith observation. The user must therefore indicate to ODRPACK when the observations are multiresponse, so that ODRPACK can make the appropriate adjustments to the estimation procedure. (See §2.B.ii, subroutine argument NQ.)" (Page 7, ODR reference manual)

I can flatten the input variables, though, and see whether I can get it to work as a multiresponse problem from there; it would then match the form of the multiresponse test in the bindings while keeping the association between the two response observations... I think. I should be able to try it tonight or tomorrow.
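Schematically, and assuming diagonal weights on the response errors, the multiresponse objective described in the manual passage quoted above can be written as follows (a paraphrase of the guide's formulation, not a quotation). The key feature is that each observation i carries a single input error δ_i shared by all q responses; stacking the responses as q separate observations would instead give every copy its own δ.

```latex
\min_{\beta,\,\delta}\;\sum_{i=1}^{n}\left[\sum_{l=1}^{q} w_{\epsilon,il}\,\bigl(f_l(x_i+\delta_i;\beta)-y_{il}\bigr)^2+\delta_i^{\top}W_{\delta,i}\,\delta_i\right]
```

Here f_l is the l-th response of the model, y_il is the l-th measured response for observation i, w_ε,il is its weight, and W_δ,i is the weight matrix on the input errors for observation i.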

P.S. [3,56] was a mistake; it should of course have been [2,56]. Sorry about that, but it doesn't actually matter much for posing the problem whether there are two or three control variables.
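A sketch of what that multiresponse setup might look like in `scipy.odr`, using made-up data: x has shape (2, n) for (theta, E) and y has shape (2, n) for (alpha, beta). In the fit output, `delta` holds the estimated input errors and has the shape of x (one set per observation, not one per response, which is the constraint the manual describes), while `eps` holds the response errors and has the shape of y. The model `angle_energy_model`, the noise levels, and the uncertainties `sx` and `sy` are hypothetical.

```python
import numpy as np
from scipy import odr


def angle_energy_model(p, x):
    """Hypothetical model mapping (theta, E) to two responses (alpha, beta)."""
    theta, energy = x                                # x has shape (2, n)
    alpha = p[0] * np.cos(np.radians(theta)) * energy
    beta = p[1] * np.sin(np.radians(theta)) * energy
    return np.vstack([alpha, beta])                  # shape (2, n)


rng = np.random.default_rng(1)
n = 56
x_true = np.vstack([rng.uniform(5.0, 60.0, n),       # theta in degrees
                    rng.uniform(10.0, 45.0, n)])     # energy
y_true = angle_energy_model([2.0, 0.5], x_true)

# Assumed per-point standard deviations, shaped like x and y respectively.
sx = np.vstack([np.full(n, 0.5), np.full(n, 0.2)])
sy = np.full((2, n), 0.1)
x_obs = x_true + rng.normal(scale=sx)
y_obs = y_true + rng.normal(scale=sy)

data = odr.RealData(x_obs, y_obs, sx=sx, sy=sy)
fit = odr.ODR(data, odr.Model(angle_energy_model), beta0=[1.0, 1.0]).run()

print(fit.beta)         # estimated parameters
print(fit.delta.shape)  # one input error per variable per observation
print(fit.eps.shape)    # one error per response per observation
```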

On Tue, Sep 29, 2015 at 3:51 PM Matt Newville <[hidden email]> wrote:

Re: ODR multiresponse multidimensional

Matt Newville


On Tue, Sep 29, 2015 at 10:24 AM, Thomas Howells <[hidden email]> wrote:
> I guess I can start by reformatting this into a single dimensional problem. I admit that part of my motivation for trying to treat this in full comes from this quote in the user reference. Here q is the number of responses:
> "Note that when q > 1, the responses of a multiresponse orthogonal distance regression problem cannot simply be treated as q separate observations as can be done for ordinary least squares when the q responses are uncorrelated. This is because ODRPACK would then treat the variables associated with these q observations as unrelated, and thus not constrain the errors δ_i in x_i to be the same for each of the q occurrences of the ith observation. The user must therefore indicate to ODRPACK when the observations are multiresponse, so that ODRPACK can make the appropriate adjustments to the estimation procedure. (See §2.B.ii, subroutine argument NQ.)" (Page 7, ODR reference manual)


Yeah, OK, it's true that ODRPACK does actually use the multi-dimensionality of the data, and what I was suggesting would remove information that ODRPACK can use. But I also suspect you might be hitting the limits of what "orthogonal distance" means.


--
--Matt Newville <newville at cars.uchicago.edu> 630-252-0431


Re: ODR multiresponse multidimensional

josef.pktd
In reply to this post by Thomas Howells


On Tue, Sep 29, 2015 at 10:16 AM, Thomas Howells <[hidden email]> wrote:
> My input array x is [m,n] where m is the dimensionality of the input and n is the number of observations. (In my case [3,56])
> My response array y is then in fact [2,3,56] as I have two responses for each x. I arranged it this way after inspecting the test_odr.py function, in which a single-dimensional array x is matched with a two-dimensional, or multi-response, return array y in test_multi.

I would expect a multivariate y to be 2-dimensional, not 3-dimensional. I don't see how the covariance matrices would work with a 3-D response; it might be possible, but I have never seen it.

My guess would be that it needs a reshape to [6, 56], but I don't really understand the problem or odr.

Josef
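For what it's worth, the reshape Josef suggests is a one-liner in NumPy; the sketch below, with placeholder values, shows how the rows are ordered afterwards. Whether a [6, 56] response is the right description of this problem is a separate question, and per Thomas's P.S. the input is [2, 56] rather than [3, 56].

```python
import numpy as np

n = 56
# Placeholder array laid out as in the original post: (responses, dims, observations).
y3 = np.arange(2 * 3 * n, dtype=float).reshape(2, 3, n)

# Merge the first two axes. With NumPy's default C order,
# row r of the result is y3[r // 3, r % 3].
y6 = y3.reshape(6, n)

print(y6.shape)  # (6, 56)
```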

 
