Dear Pythonistas,
I'm new to the mailing list; my question is about scipy's Orthogonal Distance Regression (ODR) wrapper module. My apologies if you've received this before; I had some trouble with the mail delivery system.

From perusal of the documentation for the wrapper and the underlying Fortran routines, it seems the Fortran code can handle a dataset that is both multiresponse and multidimensional. I have such a dataset that I would like to try the algorithm on; however, as far as I can tell, the ODR wrapper doesn't have the machinery to support this usage.

It's possible I've missed something, so I'm interested to hear whether anyone has experience with this sort of problem. If not, I may dig into the wrapper a bit more and see if the functionality I want could be added.

Thanks,
Thomas Howells
_______________________________________________
SciPy-User mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/scipy-user
On Tue, Sep 29, 2015 at 1:43 PM, Thomas Howells <[hidden email]> wrote:
> From perusal of the documentation for the wrapper and the underlying Fortran routines, it seems the Fortran code can handle a dataset that is both multiresponse and multidimensional. I have such a dataset that I would like to try and use the algorithm for; however as far as I can tell the ODR wrapper doesn't have the machinery to support this usage.

What do you mean by "both multiresponse and multidimensional"? That the model is a function `f(x; beta) -> y` such that x and y are each vectors? Yes, it certainly supports this, and I think the docstrings are pretty clear about it. What did you read that makes you think otherwise?

--
Robert Kern
On Tue, Sep 29, 2015 at 2:00 PM Robert Kern <[hidden email]> wrote:
I'll elucidate a little more: I have data with two control variables, theta and E, for angle and energy. Each energy is measured at each angle; this makes the data multidimensional (after checking the ODR reference guide, http://docs.scipy.org/doc/external/odrpack_guide.pdf, this is also referred to as multivariate; sorry if this caused confusion).

For each combination of theta and E, I get two linked readings (or responses), alpha and beta, that I want to fit simultaneously. This makes it multiresponse as well as multivariate, a situation described on page 6 of the ODR reference guide.

Ideally I need to find the best fit to all angles and both responses simultaneously, to reduce correlation between parameters. The odr.Model object has instructions for handling a multidimensional input x and a corresponding multidimensional response y, but not for what to do if you have both a multidimensional input and a multiresponse output.

My input array x is [m,n], where m is the dimensionality of the input and n is the number of observations. (In my case [3,56].) My response array y is then in fact [2,3,56], as I have two responses for each x. I arranged it this way after inspecting test_odr.py, in which a single-dimensional array x is matched with a two-dimensional, or multiresponse, return array y in test_multi.

Unfortunately, attempting to generalise in this way results in an error when the odr module analyses my array shapes. I could not find any way to tell the code that my y array is multiresponse, even after inspecting the source code.

I hope this explanation makes things clearer!

Thanks,
Tom
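[Editor's sketch] For readers following along, here is a minimal sketch of the array layout that the scipy.odr.Model docstring appears to describe: x of shape (m, n) and y of shape (q, n), with the model function returning a (q, n) array. It assumes two inputs (theta, E) and 56 observations. The model function, the grid values, and the parameter values are all made up for illustration and are not the poster's actual model.

```python
import numpy as np
from scipy import odr


def fcn(params, x):
    """Hypothetical two-response model: x has shape (2, n), result has shape (2, n)."""
    theta, energy = x
    b0, b1, b2 = params
    alpha = b0 + b1 * theta        # first response
    beta_resp = b1 * energy + b2   # second response, sharing parameter b1
    return np.vstack([alpha, beta_resp])


# Made-up grid: 8 angles x 7 energies = 56 observations.
theta_grid, energy_grid = np.meshgrid(np.linspace(0.0, 1.5, 8),
                                      np.linspace(1.0, 7.0, 7))
x = np.vstack([theta_grid.ravel(), energy_grid.ravel()])  # shape (2, 56)

# Synthetic, noise-free responses generated from assumed "true" parameters.
true_params = np.array([1.0, 2.0, 3.0])
y = fcn(true_params, x)                                   # shape (2, 56)

data = odr.Data(x, y)
model = odr.Model(fcn)
fit = odr.ODR(data, model, beta0=[0.5, 1.5, 2.5]).run()
print(fit.beta)
```

The point of the sketch is only the shapes: the observations are flattened along the last axis, so y is two-dimensional (q, n) rather than three-dimensional.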
On Tue, Sep 29, 2015 at 3:16 PM, Thomas Howells <[hidden email]> wrote:
> My input array x is [m,n] where m is the dimensionality of the input and n is the number of observations. (In my case [3,56])

What's the third one? You mentioned only two: the angle and the energy.

--
Robert Kern
On Tue, Sep 29, 2015 at 9:16 AM, Thomas Howells <[hidden email]> wrote:
ODRPACK is a little strange in its support for multi-dimensional data. The simplest thing to do (with the added benefit that it will also allow you to use other optimization methods) is to always change the problem to a single dimension. Actually, this is not at all hard, just a slight change in perspective.

To be clear, the term multivariate means "more than one variable parameter", not the number or shape of the observations. In fact, for (nearly?) all optimization problems, the algorithms seek a set of values for parameters that make the model most closely match the data. What makes ODRPACK special is its definition of "most closely match", not really that it is multi-dimensional. That's sort of a distraction.

The fact that you have two signals (alpha, beta) at each value of (angle, energy) is completely unimportant to the fitting algorithm. It doesn't care what the independent variables are, or even that there *are* independent variables. It has (only) parameters and the result of the objective function. Within your objective function, you can do anything you want. You can concatenate multiple arrays of data, and/or reduce your multi-dimensional arrays of data to one dimension with flatten() or whatever else you need to do.

Of course, if you are modelling data, your *model* might care about the independent variables, and you'll need to make sure the data and model are the same shape and align the observations, so you might have something like [alpha_0, beta_0, alpha_1, beta_1, ....], but (of course) the algorithm doesn't care about that order.

Hope that helps,
--Matt
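[Editor's sketch] The flattening idea above might look something like the following. The model, the parameter names, and the data values are hypothetical; only the interleaving into [alpha_0, beta_0, alpha_1, beta_1, ...] comes from the message.

```python
import numpy as np


def model(params, theta, energy):
    """Hypothetical model returning both responses for each observation."""
    b0, b1, b2 = params
    alpha = b0 + b1 * theta
    beta = b1 * energy + b2
    return alpha, beta


def residuals(params, theta, energy, alpha_obs, beta_obs):
    """Return one 1-D residual vector ordered [alpha_0, beta_0, alpha_1, beta_1, ...]."""
    alpha_fit, beta_fit = model(params, theta, energy)
    # column_stack pairs the two residuals per observation; ravel interleaves them.
    return np.column_stack([alpha_fit - alpha_obs, beta_fit - beta_obs]).ravel()


# Made-up data: three observations generated from assumed parameters.
theta = np.array([0.0, 0.5, 1.0])
energy = np.array([1.0, 2.0, 3.0])
alpha_obs, beta_obs = model((1.0, 2.0, 3.0), theta, energy)

# Evaluate the residuals at a slightly wrong guess for the last parameter.
r = residuals((1.0, 2.0, 2.5), theta, energy, alpha_obs, beta_obs)
print(r.shape)
```

A function of this form could be handed to a general least-squares routine such as scipy.optimize.least_squares. Note that it measures misfit in the responses only; it carries no information about errors in theta or energy.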
Thanks Matt,

I guess I can start by reformatting this into a single-dimensional problem. I admit that part of my motivation for trying to treat this in full comes from this quote in the user reference. Here q is the number of responses:

"Note that when q > 1, the responses of a multiresponse orthogonal distance regression problem cannot simply be treated as q separate observations as can be done for ordinary least squares when the q responses are uncorrelated. This is because ODRPACK would then treat the variables associated with these q observations as unrelated, and thus not constrain the errors δ_i in x_i to be the same for each of the q occurrences of the ith observation. The user must therefore indicate to ODRPACK when the observations are multiresponse, so that ODRPACK can make the appropriate adjustments to the estimation procedure. (See §2.B.ii, subroutine argument NQ.)" (Page 7, ODR reference manual)

I can flatten the input variables, though, and see if I can get it to work as a multiresponse problem from that; then it would match the form of the multiresponse test of the bindings while maintaining the association between the two response observations... I think. I should be able to try it tonight or tomorrow.

P.S. [3,56] was a mistake; it should of course have been [2,56]. Sorry about that, but it actually doesn't matter much for the posing of the problem whether there are two or three control variables.

On Tue, Sep 29, 2015 at 3:51 PM Matt Newville <[hidden email]> wrote:
On Tue, Sep 29, 2015 at 10:24 AM, Thomas Howells <[hidden email]> wrote:
Yeah, OK it's true that ODRPACK does actually use the multi-dimensionality of the data, and what I was suggesting would remove such information that ODRPACK can use. But I also guess you might be hitting the limitations of what "orthogonal distance" means.
-- --Matt Newville <newville at cars.uchicago.edu> 630-252-0431
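[Editor's note] To make the point about multiresponse data concrete, the weighted orthogonal distance problem can be written roughly as below. This is a paraphrase of the general formulation in the ODRPACK reference guide as the editor understands it, and the notation may differ from the guide's in detail. Here n is the number of observations, each x_i and δ_i has m components, and each y_i and ε_i has q components.

```latex
\min_{\beta,\,\delta,\,\varepsilon} \;
\sum_{i=1}^{n} \left(
    \varepsilon_i^{T} \, w_{\varepsilon_i} \, \varepsilon_i
  + \delta_i^{T} \, w_{\delta_i} \, \delta_i
\right)
\quad \text{subject to} \quad
y_i = f_i(x_i + \delta_i;\, \beta) - \varepsilon_i ,
\qquad i = 1, \dots, n .
```

Each observation i has a single δ_i that enters all q components of f_i. Splitting the q responses into q separate observations would give each its own δ, which is the loss of information described in the passage quoted from the guide.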
On Tue, Sep 29, 2015 at 10:16 AM, Thomas Howells <[hidden email]> wrote:
I would expect multivariate y to mean 2-dimensional, not 3-dimensional. I don't see how covariance matrices would work with a 3-D response; it might be possible, but I have never seen it.

My guess would be that it needs a reshape to [6, 56], but I don't really understand the problem, nor odr.

Josef