Hi,
Not sure whether to ask here or on the matplotlib list, but since it's mainly a numpy/scipy issue I thought I'd try here first. Is there any extended documentation/tutorial on record arrays? The NumPy book is pretty cryptic about this and I'm still very new to the concept.

I'm using the csv2rec function from matplotlib (pylab) to generate a record array from data that's in a CSV file. The CSV file basically has column labels in the first row and numerical data in all subsequent rows. The issues I'm struggling with are:

1. Is it possible to change the dtype of a field after the record array has been created?

2. The CSV file has missing data points. How do I turn these into Python None elements in the record array? (If I leave that element empty in the CSV file, csv2rec complains that it cannot handle the import; if I put None in the CSV file (without quotes), the whole field, including the None and all the other float data, is converted to a string dtype, rendering the numerical data useless.)

3. Is it possible to obtain a subset of the original data (corresponding to two or more columns of the CSV file) as a conventional 2D numpy array, or can I only access the data column by column (i.e. one record-array field at a time)?

Any pointers would be appreciated!

Johann
_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Hi Johann
2008/5/19 Johann Rohwer <[hidden email]>:
> Is there any extended documentation/tutorial on record arrays?

There is an introduction here: http://www.scipy.org/RecordArrays

> 1. Is it possible to change the dtype of a field after the record array has
> been created?

It can be done, but often it is not very useful:

In [3]: dt = np.dtype([('x',np.uint8),('y',np.uint8)])

In [4]: np.array([(1,2),(3,4)],dtype=dt)
Out[4]:
array([(1, 2), (3, 4)],
      dtype=[('x', '|u1'), ('y', '|u1')])

In [5]: _.view(np.uint16)
Out[5]: array([ 513, 1027], dtype=uint16)

A view only reinterprets the underlying bytes: each pair of uint8 fields is read as a single uint16 (on a little-endian machine, 513 = 1 + 2*256). I suspect what you want to do is change one 'column' from, say, int to float and convert the values. For that, you'll need to make a copy.

> 2. The CSV file has missing data points - how do I turn these into python
> 'None' elements in the record array? (If I leave that element empty in the
> CSV file, then csv2rec complains about not being able to handle the import;
> if I put 'None' in the CSV file (without quotes), then the whole field
> including the 'None' and all the other float data is converted into a string
> dtype, rendering the numerical data useless).

Maybe `numpy.loadtxt` could be of some use.

> 3. Is it possible to obtain a subset of the original data (corresponding to
> two or more columns of the CSV file) as a conventional 2D numpy array, or
> can I access the data only individually by column (i.e. field in the record
> array)?

I hope someone comes up with an elegant solution; otherwise you can make a copy:

numpy.array([data['field1'], data['field2']]).T

Regards
Stéfan
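As a sketch of the copy Stéfan describes (assuming a current NumPy): build a new dtype in which only the target field's type differs, allocate an empty array with it, and copy each field across by name. The field names x and y come from the session above; converting x to float64 is just an illustrative choice.

```python
import numpy as np

# Same two-field uint8 record array as in the IPython session above.
dt = np.dtype([('x', np.uint8), ('y', np.uint8)])
rec = np.array([(1, 2), (3, 4)], dtype=dt)

# New dtype: identical field names, but 'x' becomes float64.
new_dt = np.dtype([('x', np.float64), ('y', np.uint8)])

# Allocate a fresh array and copy field by field; each assignment
# converts the values (not the raw bytes) to the new field type.
converted = np.empty(rec.shape, dtype=new_dt)
for name in rec.dtype.names:
    converted[name] = rec[name]

print(converted['x'])  # [1. 3.]
```

rec.astype(new_dt) should also work here, since the new dtype lists the same fields in the same order; the explicit loop just makes the by-name copy obvious.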
In reply to this post by Johann Rohwer
> Not sure whether to ask here or on the matplotlib list, but since it's
> mainly a numpy/scipy issue I thought I'd try here first.

Johann,
You'll find some basic information about record arrays at this link: http://www.scipy.org/RecordArrays

> 1. Is it possible to change the dtype of a field after the record array has
> been created?

I'm afraid you can't change it in place. However, you can always create a new dtype afterwards and copy your record array into a new array with that dtype.

> 2. The CSV file has missing data points - how do I turn these into python
> 'None' elements in the record array?

You may want to try numpy.ma.mrecords, which lets you mask specific fields in a record array (instead of masking whole records). However, the module is still experimental, and some tweaking may be needed.

> 3. Is it possible to obtain a subset of the original data (corresponding to
> two or more columns of the CSV file) as a conventional 2D numpy array, or
> can I access the data only individually by column (i.e. field in the record
> array)?

Yes, you can get a subset:

>>> import numpy as np
>>> # Define some fields (10 values each)
>>> a = np.arange(10, dtype=int)
>>> b = np.arange(10, 0, -1, dtype=int)
>>> c = np.random.rand(10)
>>> ndtype = [('a', int), ('b', int), ('c', float)]
>>> # Define your record array (list() is needed on Python 3, where zip is lazy)
>>> mrec = np.array(list(zip(a, b, c)), dtype=ndtype)
>>> # Get a subset #1: by selecting fields
>>> subset_1 = np.column_stack([mrec['a'], mrec['b']])
>>> # Get a subset #2: by changing the view (requires int and float to be the same size)
>>> subset_2 = mrec.view((int, 3))[:, :2]

Method #2 is quite useful if your fields have the same dtype: that way, you can switch from records/fields to lines/columns seamlessly. Here c is a float, so its column in the int view is meaningless, which is why only the first two columns are kept.
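For question 3, NumPy releases from 1.16 onward also provide numpy.lib.recfunctions.structured_to_unstructured, which turns selected fields into an ordinary 2D array without relying on the fields sharing a dtype. A sketch comparing it with the column_stack approach; the array is illustrative data built with explicit int64/float64 fields.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Illustrative record array with two integer fields and one float field.
rec = np.empty(10, dtype=[('a', np.int64), ('b', np.int64), ('c', np.float64)])
rec['a'] = np.arange(10)
rec['b'] = np.arange(10, 0, -1)
rec['c'] = np.random.rand(10)

# Copying approach: stack the chosen fields as columns -> shape (10, 2).
subset_1 = np.column_stack([rec['a'], rec['b']])

# Library helper: select the fields, then convert to a plain 2D array.
# Values are cast to a common dtype, so mixing int and float fields is safe.
subset_3 = rfn.structured_to_unstructured(rec[['a', 'b']])

print(subset_3.shape)  # (10, 2)
```

Unlike the view trick, the helper converts values rather than reinterpreting bytes, so selecting ['a', 'c'] would give a float64 array with correct numbers.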
In reply to this post by Stéfan van der Walt
Stéfan van der Walt wrote:
> I hope someone comes up with an elegant solution, otherwise you can make a copy:
>
> numpy.array([data['field1'], data['field2']]).T

You might also want to check out Andrew Straw's DataFrame class: http://www.scipy.org/Cookbook/DataFrame

However, with missing values you should probably investigate using masked arrays. You should be able to modify the DataFrame code to handle them.

Regards
Bruce
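Following the masked-array suggestion, later NumPy versions can do this at load time: numpy.genfromtxt with usemask=True treats empty CSV cells as missing and masks them. A sketch; the inline CSV text and the column names a, b, c are made up for illustration.

```python
import io
import numpy as np

# Hypothetical CSV: header row, then numbers with two empty cells.
csv_text = """a,b,c
1.0,2.0,3.0
4.0,,6.0
7.0,8.0,
"""

# names=True reads field names from the header; usemask=True returns a
# masked record array in which the empty cells are masked.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     names=True, usemask=True)

print(data['b'].mask)    # which cells were missing in column b
print(data['b'].mean())  # 5.0: masked entries are ignored
```

Without usemask=True, the missing float cells would instead be filled with nan, which also keeps the column numeric.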
Bruce Southey wrote:
> Hi,
> You might also want to check out Andrew Straw's DataFrame class:
> http://www.scipy.org/Cookbook/DataFrame

I should note that the DataFrame idea dates from before record arrays existed. I now use csv2rec; record arrays are much more flexible than the DataFrame class.

-Andrew
Andrew Straw wrote:
> I should note that the DataFrame idea came from the time before record
> arrays. I now use csv2rec. Record arrays are much more flexible than the
> DataFrame class.

Just for reference, you can get csv2rec as part of matplotlib 0.91.2: http://matplotlib.sourceforge.net/

Bruce