Dear all,
I tried using the 'scipy.sparse.linalg.eigs' tool for performing principal component analysis on a matrix which is roughly 80% sparse.

First of all, is that a good way to go about it? Second, the operation failed when the function failed to converge to accurate eigenvalues. I noticed the `tol` argument of the function, but how does one define and calculate a reasonable tolerance?
Thanks
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
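For concreteness, a minimal sketch of sparse PCA via ARPACK (all data, sizes, and parameter choices below are invented for illustration; centering is folded into the covariance matrix so the sparse data matrix itself is never densified):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

# Hypothetical example data: a small random matrix, ~80% of entries zero.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
X[rng.random(X.shape) < 0.8] = 0.0
Xs = sparse.csr_matrix(X)

# PCA needs centered data, but subtracting the mean destroys sparsity,
# so form the (dense, but small) covariance matrix explicitly instead:
# cov = E[x x^T] - mean mean^T.
mean = np.asarray(Xs.mean(axis=0))
cov = (Xs.T @ Xs).toarray() / Xs.shape[0] - mean.T @ mean

# Leading k eigenpairs of the symmetric covariance matrix via ARPACK.
k = 5
eigvals, eigvecs = eigsh(cov, k=k, which='LM')
print(eigvals)
```

Since the covariance here is only n_features x n_features, the dense route (`numpy.linalg.eigh` on `cov`) is also an option, as discussed further down the thread.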
Thu, 24 Feb 2011 09:58:00 +0530, Jaidev Deshpande wrote:
> I tried using the 'scipy.sparse.linalg.eigs' tool for performing
> principal component analysis on a matrix which is roughly 80% sparse.
>
> First of all, is that a good way to go about it?

If it doesn't work as a dense matrix, then you don't have much choice other than to rely on an iterative method. 'eigs' uses ARPACK.

> Second, the operation failed when the function failed to converge on
> accurate eigenvalues. I noticed the 'tol' attribute in the function, but
> how does one define a reasonable tolerance and calculate it?

As the docstring states, `tol` is the desired relative tolerance for the eigenvalues. One should expect that the error is approximately

    abs(exact_value - estimate) < tol * abs(exact_value)

You can also try increasing `maxiter` or `ncv` to make the solver try harder to reach convergence. What reasonable values for these parameters are depends on the problem in question. You can consult the ARPACK user guide for more:

http://www.sc.fsu.edu/~burkardt/pdf/arpack.pdf

Check e.g. the section on "Stopping criterion".
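A small, hypothetical illustration of these knobs (`tol`, `maxiter`, `ncv`) on a symmetric matrix with known eigenvalues; every value below is invented for the example:

```python
import numpy as np
from scipy.sparse.linalg import eigsh

# Hypothetical symmetric test matrix with known eigenvalues 1..100.
rng = np.random.default_rng(42)
Q, _ = np.linalg.qr(rng.standard_normal((100, 100)))
A = Q @ np.diag(np.arange(1.0, 101.0)) @ Q.T

# ARPACK stops once the relative eigenvalue error is roughly below `tol`;
# more iterations (`maxiter`) and a larger Krylov subspace (`ncv`) give it
# more room to converge when the defaults fail.
vals, vecs = eigsh(A, k=3, which='LM', tol=1e-6, maxiter=5000, ncv=20)

# Check the error bound quoted above: abs(exact - estimate) < tol * abs(exact).
exact = np.array([98.0, 99.0, 100.0])
print(np.abs(np.sort(vals) - exact) / exact)
```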
On Thu, Feb 24, 2011 at 11:37:18AM +0000, Pauli Virtanen wrote:
> > First of all, is that a good way to go about it?
>
> If it doesn't work as a dense matrix, then you don't have much choice
> other than to rely on an iterative method. 'eigs' uses ARPACK.

For the application of truncated PCA, where errors on the small eigenvalues are not important, and for very large data (much, much larger than the cache size), randomized projection methods can work incredibly well, partly because they render the problem local in memory, and with large data memory bandwidth is a killer.

We have a fast truncated SVD in scikit-learn that is fairly standalone and can be extracted:

https://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/utils/extmath.py

Don't use this for other applications than PCA!

G
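The randomized-projection idea can be sketched in a few lines of numpy. This is a simplified illustration of the general technique, not the scikit-learn implementation linked above, and the test matrix is invented for the example:

```python
import numpy as np

def randomized_svd(A, k, n_oversample=10, n_iter=2, rng=None):
    """Truncated SVD via random projection (a basic sketch of the idea).

    Project onto a small random subspace, refine it with a few power
    iterations, then do an exact SVD of the reduced matrix.
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape
    # Random test matrix; power iterations sharpen the captured subspace.
    Omega = rng.standard_normal((n, k + n_oversample))
    Y = A @ Omega
    for _ in range(n_iter):
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)           # orthonormal basis for approx range(A)
    B = Q.T @ A                      # small (k + oversample) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# Example: a matrix with a rapidly decaying spectrum, where the method shines.
rng = np.random.default_rng(0)
U0, _ = np.linalg.qr(rng.standard_normal((300, 100)))
V0, _ = np.linalg.qr(rng.standard_normal((100, 100)))
s_true = np.exp(-np.arange(100) / 5.0)
A = U0 @ np.diag(s_true) @ V0.T
U, s, Vt = randomized_svd(A, k=10)
```

Note that `A` only ever appears in matrix products here, which is why the method streams well through memory and also works unchanged on scipy sparse matrices.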
On Thu, Feb 24, 2011 at 1:28 PM, Jaidev Deshpande
<[hidden email]> wrote:
> Dear all,
>
> I tried using the 'scipy.sparse.linalg.eigs' tool for performing principal
> component analysis on a matrix which is roughly 80% sparse.
>
> First of all, is that a good way to go about it?

It generally is, but keep in mind that 80% sparse is not that sparse. Indeed, in efficient sparse matrix representations, each item needs, besides its value, two integer values for indexing. If you use single-precision values and 64-bit indexing, you are effectively using as much memory as a dense representation.

Also, dense methods are often much faster than sparse ones for the same matrix size, especially since scipy's sparse matrices are not as well optimized as numpy arrays. You may want to compare both methods if you can.

cheers,

David
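A quick, hypothetical way to check this memory trade-off: scipy's CSR format stores one value plus one column index per nonzero (plus a small row-pointer array), so with the default float64 values and int32 indices an 80%-sparse matrix still wins, while single-precision values with 64-bit indexing would be near parity, as described above.

```python
import numpy as np
from scipy import sparse

# A hypothetical ~80%-sparse matrix in double precision.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 1000))
A[rng.random(A.shape) < 0.8] = 0.0

S = sparse.csr_matrix(A)
dense_bytes = A.nbytes
sparse_bytes = S.data.nbytes + S.indices.nbytes + S.indptr.nbytes
print(dense_bytes, sparse_bytes, sparse_bytes / dense_bytes)
```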