Record array help

Johann Rohwer
Hi,

Not sure whether to ask here or on the matplotlib list, but since it's
mainly a numpy/scipy issue I thought I'd try here first.

Is there any extended documentation/tutorial on record arrays? The NumPy
book is pretty cryptic about this and I'm still very new to the concept. I'm
using the csv2rec function from matplotlib (pylab) to generate a record
array from data that's in a CSV file. The CSV file basically has column
labels in the first row and numerical data in all subsequent rows. The
issues I'm struggling with are:

1. Is it possible to change the dtype of a field after the record array has
been created?

2. The CSV file has missing data points - how do I turn these into python
'None' elements in the record array? (If I leave that element empty in the
CSV file, then csv2rec complains about not being able to handle the import;
if I put 'None' in the CSV file (without quotes), then the whole field
including the 'None' and all the other float data is converted into a string
dtype, rendering the numerical data useless).

3. Is it possible to obtain a subset of the original data (corresponding to
two or more columns of the CSV file) as a conventional 2D numpy array, or
can I access the data only individually by column (i.e. field in the record
array)?

Any pointers would be appreciated!
Johann

_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user

Re: Record array help

Stéfan van der Walt
Hi Johann

2008/5/19 Johann Rohwer <[hidden email]>:
> Is there any extended documentation/tutorial on record arrays?

There is an introduction here:

http://www.scipy.org/RecordArrays

> 1. Is it possible to change the dtype of a field after the record array has
> been created?

It can be done, but often it is not very useful:

In [3]: dt = np.dtype([('x',np.uint8),('y',np.uint8)])

In [4]: np.array([(1,2),(3,4)],dtype=dt)
Out[4]:
array([(1, 2), (3, 4)],
      dtype=[('x', '|u1'), ('y', '|u1')])

In [5]: _.view(np.uint16)
Out[5]: array([ 513, 1027], dtype=uint16)

I suspect what you want to do is to change one 'column' from, say, int
to float, converting the values rather than reinterpreting the bytes.
For that, you'll need to make a copy.
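
A minimal sketch of the copy-based conversion described above, assuming the
goal is to turn the 'y' field from uint8 into float64 while keeping its
values. The name `new_dt` is illustrative and does not come from the thread.

```python
import numpy as np

dt = np.dtype([('x', np.uint8), ('y', np.uint8)])
arr = np.array([(1, 2), (3, 4)], dtype=dt)

# Same field names, but 'y' is now a float; astype returns a converted copy
new_dt = np.dtype([('x', np.uint8), ('y', np.float64)])
converted = arr.astype(new_dt)

print(converted['y'])    # same values, now stored as float64
print(arr.dtype['y'])    # the original array keeps its uint8 field
```

Unlike the `view` call, this converts each value instead of reinterpreting
the underlying bytes, which is why it has to allocate new memory.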

> 2. The CSV file has missing data points - how do I turn these into python
> 'None' elements in the record array? (If I leave that element empty in the
> CSV file, then csv2rec complains about not being able to handle the import;
> if I put 'None' in the CSV file (without quotes), then the whole field
> including the 'None' and all the other float data is converted into a string
> dtype, rendering the numerical data useless).

Maybe `numpy.loadtxt` could be of some use.
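
As a hedged illustration of reading a CSV file with gaps: the sketch below
uses `numpy.genfromtxt` rather than `numpy.loadtxt`, because it fills empty
numeric cells with NaN by default. This is a swapped-in function, not one
suggested in the thread, and the inline CSV text is made up for the example.

```python
import io
import numpy as np

# Made-up CSV: a header row, then numeric rows with one empty cell
csv_text = "time,temp,flow\n0.0,20.5,1.2\n1.0,,1.4\n2.0,21.0,1.1\n"

# names=True takes the field names from the first row;
# empty cells in float fields are filled with NaN
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

print(data.dtype.names)
print(data["temp"])
missing = np.isnan(data["temp"])   # boolean array marking the gaps
```

The field stays numeric, so the remaining values are still usable; NaN
stands in for the Python None that a float field cannot hold.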

> 3. Is it possible to obtain a subset of the original data (corresponding to
> two or more columns of the CSV file) as a conventional 2D numpy array, or
> can I access the data only individually by column (i.e. field in the record
> array)?

I hope someone comes up with an elegant solution, otherwise you can make a copy:

numpy.array([data['field1'], data['field2']]).T
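
A runnable version of that one-liner, as a sketch: `data`, `field1`,
`field2` and `n` are placeholder names, and the values are invented.

```python
import numpy as np

data = np.array([(1.0, 10.0, 5), (2.0, 20.0, 6), (3.0, 30.0, 7)],
                dtype=[('field1', float), ('field2', float), ('n', int)])

# Stack the two fields as rows, then transpose so each field is a column
sub = np.array([data['field1'], data['field2']]).T

print(sub.shape)   # one row per record, one column per selected field
```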

Regards
Stéfan

Re: Record array help

Pierre GM-2
In reply to this post by Johann Rohwer

> Not sure whether to ask here or on the matplotlib list, but since it's
> mainly a numpy/scipy issue I thought I'd try here first.

Johann,
You'll find some basic information about record arrays at this link:
http://www.scipy.org/RecordArrays

> 1. Is it possible to change the dtype of a field after the record array has
> been created?

I'm afraid you can't. However, you can always create a new dtype afterwards,
and allocate it to your record array.

> 2. The CSV file has missing data points - how do I turn these into python
> 'None' elements in the record array?

You may want to try numpy.ma.mrecords, which lets you mask
specific fields in a record array (instead of masking whole records).
However, the module is still experimental, so expect to do some
tweaking.
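
For illustration only, per-field masking can also be sketched with a plain
`numpy.ma` masked array over a structured dtype. This is not
`numpy.ma.mrecords` itself, and the field names and data are invented.

```python
import numpy as np

arr = np.array([(1, 2.0), (3, 0.0)], dtype=[('x', int), ('y', float)])

# One mask flag per field per record: only 'y' of the second record is masked
marr = np.ma.array(arr, mask=[(False, False), (False, True)])

print(marr['x'].mask)     # no entry of 'x' is masked
print(marr['y'].mean())   # the masked entry is ignored
```

The second record keeps its valid 'x' value even though its 'y' value is
treated as missing.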

> 3. Is it possible to obtain a subset of the original data (corresponding to
> two or more columns of the CSV file) as a conventional 2D numpy array, or
> can I access the data only individually by column (i.e. field in the record
> array)?

Yes, you can get a subset:

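>>> import numpy as np
>>> # Define some fields (all of length 10)
>>> a = np.arange(10, dtype=int)
>>> b = np.arange(10, 0, -1, dtype=int)
>>> c = np.random.rand(10)
>>> ndtype = [('a', int), ('b', int), ('c', float)]
>>> # Define your record array (list() is needed on Python 3, where zip is lazy)
>>> mrec = np.array(list(zip(a, b, c)), dtype=ndtype)
>>> # Get a subset #1: by selecting fields
>>> subset_1 = np.column_stack([mrec['a'], mrec['b']])
>>> # Get a subset #2: by changing the view, then keeping the two int columns
>>> # (assumes 8-byte int and float; column 2 would hold reinterpreted float bits)
>>> subset_2 = mrec.view((int, 3))[:, :2]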

Method #2 is quite useful if your fields have the same dtype: that way, you
can switch from records/fields to lines/columns seamlessly.
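
A small sketch of method #2 when every field really does have the same
dtype, assuming two float64 fields named 'u' and 'v' (both invented). It
also shows that the view shares memory with the record array, whereas the
column_stack result is a copy.

```python
import numpy as np

rec = np.array([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)],
               dtype=[('u', np.float64), ('v', np.float64)])

# Independent copy, taken before anything is modified
copy_2d = np.column_stack([rec['u'], rec['v']])

# Reinterpret each 16-byte record as two float64 values: shape (3, 2)
as_2d = rec.view((np.float64, 2))

as_2d[0, 1] = 99.0     # writing through the view...
print(rec['v'][0])     # ...changes the record array too
print(copy_2d[0, 1])   # the copy still holds the old value
```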





Re: Record array help

Bruce Southey
In reply to this post by Stéfan van der Walt
Hi,
You might also want to check out Andrew Straw's DataFrame class:
 http://www.scipy.org/Cookbook/DataFrame

However, with missing values you probably should investigate using
Masked Arrays. You should be able to modify the DataFrame code to handle
this.


Regards
Bruce

Re: Record array help

Andrew Straw
Bruce Southey wrote:
> Hi,
> You might also want to check out Andrew Straw's DataFrame class:
>  http://www.scipy.org/Cookbook/DataFrame
I should note that the DataFrame idea came from the time before record
arrays. I now use csv2rec. Record arrays are much more flexible than the
DataFrame class.

-Andrew

Re: Record array help

Bruce Southey
Andrew Straw wrote:

> Bruce Southey wrote:
>  
>> Hi,
>> You might also want to check out Andrew Straw's DataFrame class:
>>  http://www.scipy.org/Cookbook/DataFrame
>>    
> I should note that the DataFrame idea came from the time before record
> arrays. I now use csv2rec. Record arrays are much more flexible than the
> DataFrame class.
>
> -Andrew
Hi,
Just for reference, you can get csv2rec as part of matplotlib 0.91.2
http://matplotlib.sourceforge.net/

Bruce