Dear all,
During optimization, it is often helpful to normalize the input parameters so that they are all on the same order of magnitude, which can greatly improve convergence. For example, suppose we want to minimize f(x) and a reasonable initial guess is x0=[1e3, 1e-4]; it might help to normalize x0[0] and x0[1] to roughly the same order of magnitude (often O(1)).

My question is: I have been using scipy.optimize, and specifically the L-BFGS-B algorithm. Do I need to do that normalization manually by writing a wrapper function, or does the algorithm already do it for me?

Thank you!

-Shawn

--
Yuxiang "Shawn" Wang
Gerling Research Lab
University of Virginia
[hidden email]
+1 (434) 284-0836
https://sites.google.com/a/virginia.edu/yw5aj/
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Shawn,
I extensively use scipy.optimize.fmin_l_bfgs_b, and I always explicitly normalize the input before passing it to the optimizer. I usually do it by writing my function as g(x, norm) == f([x[0]*norm[0], x[1]*norm[1], ...]) and passing it to the optimizer as fmin_l_bfgs_b(func=g, x0=[1., 1.], args=(norm,)) (note the trailing comma: args must be a tuple). You can achieve rigorous convergence by multiplying norm by the result of the optimization and iterating, but convergence behavior far from a minimum may depend strongly on both your initial guess and your initial normalization factor.

Eric

On 1/26/2014 3:01 PM, Yuxiang Wang wrote:
> During optimization, it is often helpful to normalize the input
> parameters [...]
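A minimal, runnable sketch of this wrapper. The quadratic objective and the scale factors below are invented for illustration; only the wrapping pattern itself is Eric's suggestion.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Illustrative badly-scaled objective with its minimum at x = [1e3, 1e-4].
def f(x):
    return (x[0] / 1e3 - 1.0) ** 2 + (x[1] / 1e-4 - 1.0) ** 2

# Wrapper: optimize in normalized coordinates y, where x = y * norm.
def g(y, norm):
    return f(np.asarray(y) * norm)

norm = np.array([1e3, 1e-4])   # rough guess of each parameter's magnitude
y_opt, fval, info = fmin_l_bfgs_b(func=g, x0=[0.5, 2.0], args=(norm,),
                                  approx_grad=True)
x_opt = y_opt * norm           # map the solution back to the original units
```

In the normalized coordinates the optimizer sees a well-conditioned problem near y = [1, 1]; the trailing comma in `args=(norm,)` matters, since `args` is unpacked as extra positional arguments to `g`.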
Hi Eric,
Great! Thanks a lot for the confirmation. That is really helpful.

-Shawn

On Sun, Jan 26, 2014 at 7:01 PM, Eric Hermes <[hidden email]> wrote:
> I extensively use scipy.optimize.fmin_l_bfgs_b, and I always explicitly
> normalize the input before passing it to the optimizer. [...]
Hi, Eric.
I have a problem with my Jacobian matrices being non-homogeneous, since they are derived from variables with different units, such as when combining rotation and translation parameters. Besides scaling the initial value by a factor, do you have an example of how to scale the corresponding Jacobian matrices?

Thanks in advance!

Carlos
Well, I guess I am not Eric, but I did start this whole conversation :)
Question: if your Jacobian is computed numerically (as the minimize() function does automatically), shouldn't it already be normalized, provided we have already normalized the independent variables being passed in?

Shawn

On Sat, Apr 25, 2015 at 2:02 AM, cjaramillo <[hidden email]> wrote:
> I have a problem with my Jacobian matrices being non-homogeneous due to
> [...]
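A quick numerical check supports this intuition: finite differences taken on the wrapped (normalized) function see the scaled problem directly, so no separate rescaling of the Jacobian is needed. The objective and scale factors below are made up for illustration.

```python
import numpy as np

s = np.array([1e3, 1e-4])      # illustrative per-parameter scale factors

def f(x):                      # objective in physical coordinates
    return (x[0] / 1e3 - 1.0) ** 2 + (x[1] / 1e-4 - 1.0) ** 2

def g(y):                      # normalized coordinates, x = s * y
    return f(s * y)

def fd_grad(func, y, h=1e-6):
    """Central finite differences, as a numerical optimizer would compute them."""
    y = np.asarray(y, dtype=float)
    grad = np.empty_like(y)
    for i in range(y.size):
        step = np.zeros_like(y)
        step[i] = h
        grad[i] = (func(y + step) - func(y - step)) / (2.0 * h)
    return grad

# g(y) = (y0 - 1)^2 + (y1 - 1)^2, so the exact gradient at [0.5, 2.0] is
# [-1.0, 2.0]; the finite-difference estimate agrees without any extra
# treatment of the Jacobian.
print(fd_grad(g, [0.5, 2.0]))
```

The scaling is absorbed automatically because the differencing happens in the already-normalized coordinates.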
I'm providing the Jacobian functions to be used in the optimization; it's faster that way. I found a couple of articles that apply some normalization to Jacobian matrices of non-homogeneous parameters. Any examples with scipy would be appreciated. I'm trying to grasp the normalization methods for the Jacobian, and they seem to amount to a simple diagonal matrix multiplication.
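That diagonal multiplication is just the chain rule: if the physical parameters are x = S y with S = diag(s), then the gradient in the scaled coordinates is the physical gradient times that diagonal matrix. A hedged sketch with scipy — the objective, its gradient, and the scale factors are invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize

s = np.array([1e3, 1e-4])      # illustrative per-parameter scale factors

def f(x):                      # objective in physical coordinates
    return (x[0] / 1e3 - 1.0) ** 2 + (x[1] / 1e-4 - 1.0) ** 2

def grad_f(x):                 # analytic gradient in physical coordinates
    return np.array([2.0 * (x[0] / 1e3 - 1.0) / 1e3,
                     2.0 * (x[1] / 1e-4 - 1.0) / 1e-4])

def g(y):                      # scaled coordinates: x = s * y
    return f(s * y)

def grad_g(y):                 # chain rule: J_g(y) = J_f(s * y) @ diag(s)
    return grad_f(s * y) * s

res = minimize(g, x0=[0.5, 2.0], jac=grad_g, method="L-BFGS-B")
x_opt = s * res.x              # solution mapped back to physical units
```

Because S is diagonal, the matrix product collapses to an elementwise multiply, which is why the papers' normalization looks like "a simple diagonal matrix multiplication".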
On 25 April 2015 at 08:02, cjaramillo <[hidden email]> wrote:
> I have a problem with my Jacobian matrices being non-homogeneous due to
> [...]

Scaling factors can be error-prone when you have to propagate them. Usually, the safest way is to blame it on the units: in your case, define the rotation angles in radians and choose units in which the typical displacement is 1 (assuming the rotations are big, so around 1 rad). This technique also works for other mathematical problems, like differential equations or linear systems.

/David.
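For the rotation-plus-translation case, David's suggestion amounts to picking a characteristic length and working with dimensionless parameters. A small sketch; the characteristic length of 100 mm and the parameter layout [theta, tx, ty] are assumptions for illustration:

```python
import numpy as np

L_CHAR = 100.0   # assumed characteristic displacement, in mm

def to_natural(p):
    """Map [theta (rad), tx (mm), ty (mm)] to dimensionless parameters."""
    theta, tx, ty = p
    return np.array([theta, tx / L_CHAR, ty / L_CHAR])

def from_natural(q):
    """Inverse map back to physical units."""
    theta, tx, ty = q
    return np.array([theta, tx * L_CHAR, ty * L_CHAR])

# Angles are already O(1) in radians; after dividing by L_CHAR the
# translations are O(1) as well, so all parameters are comparable.
p = np.array([0.3, 250.0, -40.0])
q = to_natural(p)
```

The optimizer then works entirely on q, and the result is converted back with from_natural at the end.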
Thanks, David. However, you can still run into biased convergence even when all the variables are in the same units, for example, when one parameter translates in the range from 0 to 10 [mm] and others translate in the hundreds of [mm]. I'm searching the web for more about this, but I would appreciate it if you could refer me to a source that employs unit-based scaling (normalization) of Jacobian matrices for use within optimization.
On 27 April 2015 at 14:40, cjaramillo <[hidden email]> wrote:
> Thanks, David. However, you can still run into biased convergences even when
> [...]

That is still not a problem: 10 mm and 500 mm are the same amount (more or less). The whole purpose of scaling is to minimise numerical inaccuracies when adding or subtracting numbers, but since the resolution of a double is about 1e-15, you need roughly nine orders of magnitude of difference before you get inaccuracies comparable to those of a perfectly scaled float (1e-6) (waving my hands violently here).

Note that, due to the nature of optimisation problems, the numerical noise in most "simple" functions will be swamped by the procedure, unless you are interested in an extremely accurate result on a finely crafted function. You can check your case and compare the results with and without the scaling, without providing the gradient for simplicity.

> I'm searching on the web more about this, [...]

This is usually called "natural units" or "nondimensionalisation". You will perhaps find more relevant hits in the context of differential equations. I can't think of a specific reference off the top of my head, but this should be covered in most numerical analysis texts.

/David.
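David's resolution argument can be checked directly: a double keeps roughly 16 significant digits and a single-precision float only about 7, so a small contribution is absorbed without a trace much sooner in single precision. The particular magnitudes below are chosen only to illustrate the point:

```python
import numpy as np

big, small = 1e3, 1e-5   # eight orders of magnitude apart

# In double precision the small contribution survives the addition...
assert np.float64(big) + np.float64(small) != np.float64(big)

# ...but in single precision it is rounded away entirely, which is the
# kind of inaccuracy that good scaling is meant to keep far away.
assert np.float32(big) + np.float32(small) == np.float32(big)
```

With doubles, the parameter magnitudes in this thread (1e3 vs 1e-4, seven orders apart) are still comfortably within the representable spread, which is David's point.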