Hi,
I get this or a similar exception (with a different integer than 75724 in the error) when loading a sparse matrix (CSC) saved with savemat, all default options.

>>> m = loadmat('my_large_mat.mat')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/richard/venv3.3/lib/python3.3/site-packages/scipy/io/matlab/mio.py", line 176, in loadmat
    matfile_dict = MR.get_variables(variable_names)
  File "/home/richard/venv3.3/lib/python3.3/site-packages/scipy/io/matlab/mio5.py", line 274, in get_variables
    hdr, next_position = self.read_var_header()
  File "/home/richard/venv3.3/lib/python3.3/site-packages/scipy/io/matlab/mio5.py", line 236, in read_var_header
    raise TypeError('Expecting miMATRIX type here, got %d' % mdtype)
TypeError: Expecting miMATRIX type here, got 75724

Here the matrix was:

> matrix
<400000x4176 sparse matrix of type '<class 'numpy.uint8'>'
        with 934099575 stored elements in Compressed Sparse Column format>

and the matrix looks fine before saving.

It looks as if this only occurs when the saved matrix file size is > 4GB -- at least I haven't seen it with files in the 3GB range.
64-bit Linux.

Not a crisis, as I am chunking anyway, so I can just chunk smaller, but when I get more RAM it would be nice to bump it up to 8GB files or so.
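For context, the chunked save I'm doing looks roughly like this -- a sketch only; the chunk width and file-name scheme here are placeholders, not what I actually use:

from scipy.io import savemat

def save_csc_in_chunks(m, basename, cols_per_chunk=1000):
    """Save a CSC matrix as column chunks, one .mat file per chunk.

    Workaround sketch: keep each file comfortably below the ~4GB
    size where loadmat starts failing.
    """
    n_cols = m.shape[1]
    for i, start in enumerate(range(0, n_cols, cols_per_chunk)):
        chunk = m[:, start:start + cols_per_chunk]  # column slice of CSC stays CSC
        savemat('%s_%03d.mat' % (basename, i), {'mat': chunk})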
Thanks.
Hi,
On Wed, Aug 7, 2013 at 12:15 PM, Richard Llewellyn <[hidden email]> wrote:
> I get this or a similar exception (with a different integer than 75724 in
> the error) when loading a sparse matrix (CSC) saved with savemat, all
> default options.
> [...]
> It looks as if this only occurs when the saved matrix file size is > 4GB --
> at least I haven't seen it with files in the 3GB range.

Ugh. I hesitate to ask, but do you get the same error for a very large
non-sparse matrix?

Thanks,

Matthew
Thanks, Matthew, for the thought.

This may not fully answer your question, but the same values saved as a large sparse matrix (CSC) at 4.9GB fail to load, with the same TypeError. Saved as a numpy 2D array or matrix, which come out at less than half the file size (1.8GB) with savemat, they load without issue.
I also noticed that a sparse (CSC) matrix that saved to 3.9GB loaded without issue, again suggesting that 4GB is the trigger. Again, this is not an immediate problem for me.
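For what it's worth, here is the rough estimate I use to guess whether a save will cross that line. It assumes the MAT5 sparse layout stores row indices and column pointers as int32 and ignores tag/padding overhead -- an estimate, not something I've verified against the writer:

def approx_mat5_sparse_bytes(m):
    """Rough lower bound on the uncompressed size savemat will write
    for a scipy sparse matrix (assumes int32 indices/pointers)."""
    nnz = m.nnz
    return (nnz * m.dtype.itemsize        # nonzero values
            + nnz * 4                     # int32 row indices
            + (m.shape[1] + 1) * 4)       # int32 column pointers

# e.g. before saving:
# if approx_mat5_sparse_bytes(m) > 4 * 1024**3: chunk smaller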
Thanks,
Richard

PS: scipy 0.12
Hi,
On Wed, Aug 7, 2013 at 9:04 PM, Richard Llewellyn <[hidden email]> wrote:
> This may not fully answer your question, but the same values saved as a
> large sparse matrix (CSC) at 4.9GB fail to load, with the same TypeError.
> Saved as a numpy 2D array or matrix [...] they load without issue.

Do the dimensions of the arrays (M, N) make a difference? Or are they
all the same (M, N) shape, with more or less non-zeros?

Can you make a script that will replicate the problem for me?

Thanks a lot,

Matthew
Hi Matthew,

A short script below shows that increasing the density triggers the error on my machine at file sizes over 4GB. Originally I had increased either M or N to trigger the error as well.
I suspect you'll run into a problem with available RAM. I run this on my 32GB machine with 64GB swap, and it swaps, so it takes at least several minutes to process. A pain, I know. Once I get more RAM it would be easier for me to test various permutations, but that will be a while.
Maybe a generator could be used to build the matrix? Still, I think RAM will be an issue.

Richard

####################################
import numpy as np
from scipy import sparse
from scipy.io import loadmat, savemat

no_ones = 1000  # this fails, but 800 yields 3.6GB and passes
filename = "test_csc"

# number of columns corresponds to my original problem, more or less
z = np.zeros(4250)
z[np.arange(no_ones)] += 1

# increasing the number of rows during chunking is where I first ran into the error
m = sparse.csc_matrix(np.array([z] * 400000))

savemat(filename, {'mat': m})
# fails here with TypeError
m = loadmat(filename)['mat']
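PS: to dodge the dense intermediate (and most of the swapping), the same test matrix could be built from the CSC components directly -- a sketch, untested at this scale:

import numpy as np
from scipy import sparse

n_rows, n_cols, no_ones = 400000, 4250, 1000

# Columns 0..no_ones-1 are all ones, the rest are empty.
indptr = np.concatenate([
    np.arange(no_ones + 1, dtype=np.int64) * n_rows,
    np.full(n_cols - no_ones, no_ones * n_rows, dtype=np.int64)])
indices = np.tile(np.arange(n_rows, dtype=np.int32), no_ones)
data = np.ones(no_ones * n_rows)  # float64, matching np.zeros above

m = sparse.csc_matrix((data, indices, indptr), shape=(n_rows, n_cols))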
Hi,
On Thu, Aug 8, 2013 at 4:41 PM, Richard Llewellyn <[hidden email]> wrote:
> A short script below shows that increasing the density triggers the error
> on my machine at file sizes over 4GB. Originally I had increased either M
> or N to trigger the error as well.
> [...]
> Maybe a generator could be used to build the matrix? Still, I think RAM
> will be an issue.

Aha - thanks for tracking that down a little further.

The problem is that the matlab 5-7 file format (non-HDF) uses a uint32 to store the number of bytes that the matrix takes up on disk. Your matrices causing the error are a little larger than 2**32 bytes, hence the error.

Here's a relevant thread:

http://www.mathworks.de/matlabcentral/newsreader/view_thread/307845

It's not hard to reproduce the error for non-sparse matrices (appended script). We certainly need a better error for this - I'll try putting one in.

Cheers,

Matthew

from io import BytesIO

import numpy as np
from scipy.io import loadmat, savemat

fobj = BytesIO()
m = np.empty(2**32, dtype=np.int8)  # just over the uint32 byte-count limit
n = np.arange(10).reshape((2, 5))
savemat(fobj, {'mat': m, 'n': n})
# fails here with TypeError
m = loadmat(fobj)['mat']
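In the meantime, one workaround sketch for matrices past that limit is to bypass the MAT5 writer entirely and store the CSC components in HDF5 via h5py. Note the result is a plain HDF5 file, not something MATLAB will read as a v7.3 .mat unless you recreate that layout yourself:

import h5py
from scipy import sparse

def save_csc_h5(m, path):
    # Store the raw CSC components; HDF5 has no 4GB-per-variable ceiling.
    with h5py.File(path, 'w') as f:
        f.create_dataset('data', data=m.data)
        f.create_dataset('indices', data=m.indices)
        f.create_dataset('indptr', data=m.indptr)
        f['shape'] = m.shape

def load_csc_h5(path):
    with h5py.File(path, 'r') as f:
        return sparse.csc_matrix(
            (f['data'][:], f['indices'][:], f['indptr'][:]),
            shape=tuple(f['shape'][:]))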