[SciPy-User] Maximum file size for savemat?

[SciPy-User] Maximum file size for savemat?

Michal Romaniuk
Hi,

I'm saving a large batch of data using savemat, and although I get no
errors, the files produced are not readable by either Matlab or SciPy.
Is there a limit on file size?

Thanks,
Michal

Re: Maximum file size for savemat?

Matthew Brett
Hi,

On Mon, Aug 19, 2013 at 7:44 AM, Michal Romaniuk
<[hidden email]> wrote:
> Hi,
>
> I'm saving a large batch of data using savemat, and although I get no
> errors, the files produced are not readable by either Matlab or SciPy.
> Is there a limit on file size?

Ah - yes there is - the individual matrices in the mat file cannot be
larger than 4GB.  Is it possible you hit this limit?

Sorry, I only realized this when Richard Llewellyn pointed it out a
couple of weeks ago on the list:

http://scipy-user.10969.n7.nabble.com/SciPy-User-scipy-io-loadmat-throws-TypeError-with-large-files-td18558.html

The current scipy code raises an error for matrices that are too large.
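
Until you can upgrade, a guard along these lines can catch the case
before writing. This is only a sketch with assumptions of mine:
checked_savemat is a made-up name, and it only handles a flat dict of
arrays, not nested lists or structs:

    import numpy as np
    from scipy.io import savemat

    def checked_savemat(filename, mdict):
        # The mat 5 format records each variable's byte length in a
        # uint32 field, so >= 2**32 bytes cannot be written correctly.
        for name, value in mdict.items():
            arr = np.asarray(value)
            if arr.nbytes >= 2**32:
                raise ValueError("variable %r is %d bytes, over the "
                                 "4 GB .mat limit" % (name, arr.nbytes))
        savemat(filename, mdict)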

Cheers,

Matthew

Re: Maximum file size for savemat?

Michal Romaniuk
Hi,

With some further work, I found that the file produced by SciPy isn't
actually correct. Matlab can read it, but from some point in the array
onwards the data is just zeros. I'm surprised that SciPy doesn't throw
an error when writing data that is too big...

Are there any good alternatives to .mat files? (Preferably something
that Matlab could read too.) PyTables?

The data consists of one large array (around 9GB), one list containing a
few smaller arrays, and a few other arrays and scalars.

Thanks,
Michal

> Hi,
>
>> Hi,
>>
>> On Mon, Aug 19, 2013 at 7:44 AM, Michal Romaniuk
>> <[hidden email]> wrote:
>>> Hi,
>>>
>>> I'm saving a large batch of data using savemat, and although I get no
>>> errors, the files produced are not readable by either Matlab or SciPy.
>>> Is there a limit on file size?
>>
>> Ah - yes there is - the individual matrices in the mat file cannot be
>> larger than 4GB.  Is it possible you hit this limit?
>>
>> Sorry, I only realized this when Richard Llewellyn pointed it out a
>> couple of weeks ago on the list:
>>
>> http://scipy-user.10969.n7.nabble.com/SciPy-User-scipy-io-loadmat-throws-TypeError-with-large-files-td18558.html
>>
>> The current scipy code raises an error for matrices that are too large.
>>
>> Cheers,
>>
>> Matthew
>
> Well, I managed to work around the problem to some extent by setting
> do_compression=True. Now Matlab can read those files (so they must be
> at least partly valid) but SciPy can't (even though they were written
> with SciPy).
>
> I get this error:
>
>
> PATH/lib/python2.6/site-packages/scipy/io/matlab/mio.pyc in
> loadmat(file_name, mdict, appendmat, **kwargs)
>     173     variable_names = kwargs.pop('variable_names', None)
>     174     MR = mat_reader_factory(file_name, appendmat, **kwargs)
> --> 175     matfile_dict = MR.get_variables(variable_names)
>     176     if mdict is not None:
>     177         mdict.update(matfile_dict)
>
> PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in
> get_variables(self, variable_names)
>     290                 continue
>     291             try:
> --> 292                 res = self.read_var_array(hdr, process)
>     293             except MatReadError, err:
>     294                 warnings.warn(
>
> PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in
> read_var_array(self, header, process)
>     253            `process`.
>     254         '''
> --> 255         return self._matrix_reader.array_from_header(header, process)
>     256
>     257     def get_variables(self, variable_names=None):
>
> PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5_utils.so in
> scipy.io.matlab.mio5_utils.VarReader5.array_from_header
> (scipy/io/matlab/mio5_utils.c:5401)()
>
> PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5_utils.so in
> scipy.io.matlab.mio5_utils.VarReader5.array_from_header
> (scipy/io/matlab/mio5_utils.c:4849)()
>
> PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5_utils.so in
> scipy.io.matlab.mio5_utils.VarReader5.read_real_complex
> (scipy/io/matlab/mio5_utils.c:5602)()
>
> ValueError: total size of new array must be unchanged
>
>
>
> The size of the main array is about 9 GB before compression, but the
> compressed files are less than 500 MB and closer to 400 MB. There are
> some other arrays in the file too but they are much smaller.
>
> Any ideas on how I could get SciPy to read this data back? Right now I
> can only think of storing the data in single precision format...
>
> Thanks,
> Michal


Re: Maximum file size for savemat?

Pauli Virtanen-3
20.08.2013 20:33, Michal Romaniuk wrote:
[clip]
> Are there any good alternatives to .mat files? (Preferably
> something that Matlab could read too.) PyTables?
>
> The data consists of one large array (around 9GB), one list
> containing a few smaller arrays, and a few other arrays and
> scalars.

Use HDF5 --- both Matlab and Python can work with it. On the Python
side use either h5py or PyTables, depending on which one you like more.
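
For instance, a minimal h5py sketch (the filename and dataset name are
made up for illustration, and the stand-in array is small so it runs):

    import numpy as np
    import h5py

    big = np.random.rand(1000, 1000)   # stand-in for the ~9 GB array

    with h5py.File('data.h5', 'w') as f:
        f.create_dataset('big', data=big, compression='gzip')

On the Matlab side, newer versions can read this directly with
h5read('data.h5', '/big'); older ones have hdf5read.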

As you found out, the .mat file format simply does not support
individual variables larger than 4 GB. The development version of Scipy
(to be 0.13.0) should throw a warning AFAIK.

--
Pauli Virtanen


Re: Maximum file size for savemat?

Matthew Brett
Hi,

On Tue, Aug 20, 2013 at 10:33 AM, Michal Romaniuk
<[hidden email]> wrote:
> Hi,
>
> With some further work, I found that the file produced by SciPy isn't
> actually correct. Matlab can read it, but from some point in the array
> onwards the data is just zeros. I'm surprised that SciPy doesn't throw
> an error when writing data that is too big...

Yes, that was a bug.  It should be fixed in the current code and the
next release.

The problem is that the length-of-array entry in the mat file is a
uint32, so there is no way of storing matrices larger than 4 GB.
Because I hadn't considered the case of very large matrices, this
length value was silently overflowing, so the pointer to the next
matrix in the mat file ends up garbage, and the effect is
unpredictable.
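
To see the scale of the damage, here is the overflow arithmetic for an
array about the size of yours (the 9 GB figure is from your earlier
message; the snippet itself is just illustration):

    nbytes = 9 * 2**30         # ~9 GB of data
    wrapped = nbytes % 2**32   # what a uint32 length field records
    print(nbytes, wrapped)     # 9663676416 -> 1073741824, i.e. 1 GB

so the stored length comes out 8 GB short of the real end of the matrix.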

> Are there any good alternatives to .mat files? (Preferably something
> that Matlab could read too.) PyTables?

I have no experience with Matlab's HDF5-based (v7.3) format, but I
guess that is a reasonable option.

> The data consists of one large array (around 9GB), one list containing a
> few smaller arrays, and a few other arrays and scalars.

I guess you could save everything but the large array in a mat file,
and save the large array as simple binary data?
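
A rough sketch of that split (shapes and filenames are made up, and the
stand-in array is small so the example actually runs):

    import numpy as np
    from scipy.io import savemat

    big = np.random.rand(3000, 4000)   # stand-in for the ~9 GB array

    # everything small goes into the .mat file as usual
    savemat('small.mat', {'scale': 2.5, 'labels': np.arange(10)})
    # the big array goes out as plain little-endian doubles
    big.astype('<f8').tofile('big.bin')

and on the Matlab side something like:

    % fid = fopen('big.bin', 'r', 'ieee-le');
    % big = fread(fid, [4000, 3000], 'double')';
    % fclose(fid);

Note the transpose: numpy writes in C (row-major) order, while Matlab
fills arrays column-major.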

Cheers,

Matthew