[SciPy-User] using numpy.fromfile without knowing the number of elements in the file


Gabriele Brambilla
Hi,

I'm trying to read an unformatted file written in Fortran.
Fortran has the problem that for each write() statement it inserts 4 bytes of "trash" at the beginning and at the end of the file.

I tried to use numpy.fromfile in this way:

import numpy as np
from math import *

f = open('../De0/SN01.dat', 'rb')

for iw in range(5):
    a = np.fromfile(f, dtype=np.int8, count=4)
    x = np.fromfile(f, dtype=np.float64, count=21)
    a = np.fromfile(f, dtype=np.int8, count=4)
    print(iw, x)

It works, but this way I have to fix the total number of iterations of the for loop in advance. How can I repeat the loop until the end of the file?

Thanks

Gabriele


_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user

Re: using numpy.fromfile without knowing the number of elements in the file

Chris Barker - NOAA Federal
On Mon, Jul 13, 2015 at 2:10 PM, Gabriele Brambilla <[hidden email]> wrote:
> I'm trying to read an unformatted file written in Fortran.
> Fortran has the problem that for each write() statement it inserts 4 bytes of "trash" at the beginning and at the end of the file.

yes, isn't that fun!

I can't recall -- do you only get the extra 4 bytes at the beginning and end of the entire file? That shouldn't be too hard to deal with. On the other hand, if there are 4-byte gaps scattered through, then you're a bit stuck.

I recall I've given up on fromfile, and used the stdlib struct module to deal with this in the past. But:

> I tried to use numpy.fromfile in this way:
>
> import numpy as np
> from math import *

why do you need math when you have numpy? -- but nothing to do with the problem at hand...
 
> f = open('../De0/SN01.dat', 'rb')
>
> for iw in range(5):
>     a = np.fromfile(f, dtype=np.int8, count=4)
>     x = np.fromfile(f, dtype=np.float64, count=21)
>     a = np.fromfile(f, dtype=np.int8, count=4)
>     print(iw, x)
>
> It works, but this way I have to fix the total number of iterations of the for loop in advance. How can I repeat the loop until the end of the file?

OK, so that's extra bytes around each record -- kind of ugly, but this works, yes?

What happens if you keep running this loop until the read fails or comes back short at the end of the file? That should work.
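
A minimal sketch of that "loop until the file runs out" idea. It reads the markers with plain f.read() and the stdlib struct module, so the end of the file shows up as an empty read and nothing depends on how fromfile behaves there. The function name read_records is hypothetical, and the code assumes each 4-byte marker is a little-endian int32 holding the record's byte count and that the payload is float64, which is a common compiler default but not guaranteed.

```python
import struct

import numpy as np


def read_records(path):
    """Yield one float64 array per record until the file is exhausted."""
    with open(path, 'rb') as f:
        while True:
            head = f.read(4)              # leading record marker
            if not head:                  # empty read: clean end of file
                break
            if len(head) != 4:
                raise ValueError('truncated record marker')
            # Assumption: the marker is a little-endian int32 byte count.
            (nbytes,) = struct.unpack('<i', head)
            payload = f.read(nbytes)
            tail = f.read(4)              # trailing marker should repeat the leading one
            if len(payload) != nbytes or tail != head:
                raise ValueError('truncated or corrupt record')
            yield np.frombuffer(payload, dtype='<f8')
```

With this, the original loop would become something like `for iw, x in enumerate(read_records('../De0/SN01.dat')): print(iw, x)`, with no iteration count to set.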

But another option is to read the whole thing in as a single byte type with numpy, then slice it up in memory. Something like:

f = open('../De0/SN01.dat', 'rb')

all_data = np.fromfile(f, dtype=np.uint8)

# now you have the whole thing, and can slice it up:

all_data.shape = (-1, 4 + 21*8 + 4)

# now each "row" is a record

# get the "real" data (copy so each row's bytes are contiguous):

data = all_data[:, 4:-4].copy()

# reinterpret the bytes as float64
# (astype would convert each byte value to a float instead):

data = data.view(np.float64)


totally untested, of course.
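
A variant of the same "read everything, then slice" idea, sketched with a structured dtype so numpy does the slicing. The names record_dtype and read_all_records are hypothetical, and the layout is an assumption taken from the question: a 4-byte little-endian marker, 21 float64 values, then another 4-byte marker per record.

```python
import numpy as np

N_VALUES = 21  # float64 values per record, as in the question

# One packed 176-byte record: marker, payload, marker (assumed layout).
record_dtype = np.dtype([
    ('head', '<i4'),
    ('x', '<f8', (N_VALUES,)),
    ('tail', '<i4'),
])


def read_all_records(path):
    """Read every complete record in the file and return an (n, 21) float64 array."""
    records = np.fromfile(path, dtype=record_dtype)  # count defaults to "all"
    expected = N_VALUES * 8
    # Sanity check: both markers should hold the payload size in bytes.
    if np.any(records['head'] != expected) or np.any(records['tail'] != expected):
        raise ValueError('unexpected record markers; the layout assumption is wrong')
    return records['x'].copy()
```

The marker check is there because the record layout is compiler dependent; if it fails, the assumed dtype does not match the file.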

-Chris


--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[hidden email]


Re: using numpy.fromfile without knowing the number of elements in the file

Pauli Virtanen-3
In reply to this post by Gabriele Brambilla
14.07.2015, 00:10, Gabriele Brambilla wrote:
> I'm trying to read an unformatted file written in Fortran.
> Fortran has the problem that for each write() statement it inserts 4 bytes
> of "trash" at the beginning and at the end of the file.

You can try this:

http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.io.FortranFile.html
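
A short sketch of how that class could be used for this file, reading records until the end without knowing their number. The function name read_fortran_records is hypothetical. The code assumes the default 4-byte record markers and float64 payloads, and it relies on FortranEOFError, which exists in recent SciPy releases; on an older install that import would fail.

```python
import numpy as np
from scipy.io import FortranFile, FortranEOFError


def read_fortran_records(path):
    """Return a list with one float64 array per record in the file."""
    records = []
    with FortranFile(path, 'r') as f:
        while True:
            try:
                # Each call consumes one record, markers included.
                records.append(f.read_reals(dtype=np.float64))
            except FortranEOFError:
                break  # no more records
    return records
```

Because the class handles the markers itself, the two 4-byte reads from the original loop disappear.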

|

Re: using numpy.fromfile without knowing the number of elements in the file

Sturla Molden-3
In reply to this post by Gabriele Brambilla
Gabriele Brambilla <[hidden email]> wrote:

> I'm trying to read an unformatted file written in Fortran.
> Fortran has the problem that for each write() statement it inserts 4 bytes
> of "trash" at the beginning and at the end of the file.

Fortran does not specify a binary format for unformatted files. The best
approach IMHO is to read them with Fortran, and use f2py or Cython to call
this Fortran code. But be sure to use the same Fortran compiler as the one
used to compile the code that wrote the files, because otherwise this might
fail too.

An even better solution is to avoid Fortran unformatted files! You can use
APIs like POSIX from Fortran as well, or you can write a tiny piece of C to
handle the i/o.


Sturla
