# unique, sort, sortrows

 Classic List Threaded
8 messages
Reply | Threaded
Open this post in threaded view
|

## unique, sort, sortrows

 Hi, I am fairly new to numpy/scipy (switching from matlab).  I have noticed a few things that left me confused.  I am wondering if I could get some input from others. 1) On the webpage: http://www.scipy.org/NumPy_for_Matlab_Users , where it says matlab sortrows(a,i) is equivalent to a[argsort(a[:,0],i)], I believe this should be a[argsort(a[:,i]),:].  Or better yet: matlab: [b,I] = sortrows(a,i) numpy: I = argsort(a[:,i]), b = a[I,:] I have already changed this on the webpage, but I want to make sure I am not missing something. 2) Is there a simple equivalent of sortrows(a) (i.e., sorting by entire rows)?  Similarly, is there a simple equivalent of the matlab Y = unique(X,'rows')?  I looked online and there appears to have been previous discussion of these, but nothing simple and general seemed to come out. 3) Is there an equivalent of [Y,I,J] = unique(X)?  In this case, I is the indices of the unique elements and J is an index from Y to X (i.e., where the unique elements appear in X.  I can get I with: I,Y = unique1d( X, return_index=True ) but J, which is very useful, is not available.  I suppose I could do: J = array([]) for i,y in enumerate(Y):         J[ nonzero( y == X ) ] = i But it seems like J is useful enough that there should be an easier way (or perhaps it should be integrated into unique1d, at the risk of adding more keyword arguments). 4) What is the best equivalent of meshgrid(x,y,z,...) or ndgrid(x,y,z...)?  I defined my own function, but is there a built in way: def ndgrid( *args ):         s = []         for a in args:                 s.append(len(a))         I = indices(s)         res = []         for a,i in zip(args,I):                 res.append(a[i])         return res This method will also be painful for large matrices as one has to create twice as much data (the index matrix and the resulting matrices). 5) I was trying to do matrix multiplication of matrices with more than 2 axes.  Normally matrix multiplication of an [M,N,P] matrix with a [P,Q,R] matrix should produce something with dimensions [M,N,Q,R], but this produces an error.  What is the appropriate idiom for this? Thanks for the help. Cheers, David -- ********************************** David M. Kaplan Charge de Recherche 1 Institut de Recherche pour le Developpement Centre de Recherche Halieutique Mediterraneenne et Tropicale av. Jean Monnet B.P. 171 34203 Sete cedex France Phone: +33 (0)4 99 57 32 27 Fax: +33 (0)4 99 57 32 95 http://www.ur097.ird.fr/team/dkaplan/index.html********************************** _______________________________________________ SciPy-user mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

## Re: unique, sort, sortrows

 Hi David, It's been a long time since I used Matlab, so I can't really answer all your questions, however I think I can help you with one: On Sat, Jul 26, 2008 at 08:06:51PM +0200, David M. Kaplan wrote: > 4) What is the best equivalent of meshgrid(x,y,z,...) or > ndgrid(x,y,z...)?  I defined my own function, but is there a built in > way: I think you should look at the mgrid and ogrid functions. (See http://www.scipy.org/Numpy_Example_List ). For many operations, prefer ogrid, it saves memory and tends to works as easily as mgrid thanks to broadcasting. HTH, Gaël _______________________________________________ SciPy-user mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

## Re: unique, sort, sortrows

 In reply to this post by David Kaplan-6 Hi David, I can comment on unique1d, as I am the culprit. I am cc'ing to   numpy-discussion as this is a numpy function. Quoting "David M. Kaplan" <[hidden email]>: > 2) Is there a simple equivalent of sortrows(a) (i.e., sorting by entire > rows)?  Similarly, is there a simple equivalent of the matlab Y = have you looked at lexsort? > 3) Is there an equivalent of [Y,I,J] = unique(X)?  In this case, I is > the indices of the unique elements and J is an index from Y to X (i.e., > where the unique elements appear in X.  I can get I with: > > I,Y = unique1d( X, return_index=True ) > > but J, which is very useful, is not available.  I suppose I could do: > > J = array([]) > for i,y in enumerate(Y): > J[ nonzero( y == X ) ] = i > > But it seems like J is useful enough that there should be an easier way > (or perhaps it should be integrated into unique1d, at the risk of adding > more keyword arguments). So basically Y = X[I] and X = Y[J], right? I do not recall matlab that   well to know for sure. It certainly could be done, I could look at it   after I return from Euroscipy (i.e. after Monday). I would replace return_index argument by two arguments: return_direct   (->I) and return_inverse (->J), ok? Does anyone propose better names? Actually most of the methods in arraysetops could return optionally   some index arrays. Would anyone use it? (I do not need it personally :) cheers, r. _______________________________________________ SciPy-user mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

## unique, sort, sortrows

 In reply to this post by David Kaplan-6 Hi, Thanks for the very helpful comments. Regarding Gael's comment, the problem with mgrid and ogrid (at least in the version of numpy I use: 1.1.0-3) is that it currently only accepts standard indexing so you can't use it with non-uniform values.  I use this a lot when I have a model I want to run with a series of parameter values, not all of which are uniformly spaced.  For example, the following fails: mgrid[:5,[1,7,8]] I would like this to give something equivalent to meshgrid(arange(5),[1,7,8]) (up to a transpose operation).  Similarly for more input arguments.  I haven't looked at the numpy source code, but it would seem that it shouldn't be too hard to add this functionality as it already exists for r_ and c_: r_[:5,[1,7,8]] # no problem here In response to Robert's comments, I looked a bit at lexsort and didn't immediately see how it could fix my problem of sorting rows of a matrix because I didn't really understand it.  Finally I figured out that the following appears to do the trick: I = lexsort(a[:,-1::-1].T) b = a[I,:] As this is a bit tricky for someone to figure out, perhaps a helper function called sortrows would be useful in numpy?  Also, an equivalent of unique(a,'rows') would still be very useful. As for [Y,I,J] = unique(X), yes Y=X[I] and X=Y[J].  Your fix would help me out, though it would be nice to specify the call signature in the help of the new version of unique1d (it took me a while to figure out that it was I,Y and not Y,I).  I think it would also be useful to propogate these changes to the other arraysetops functions.  In particular, I use indexes returned by matlab's intersect command often: [C,IA,IB] = intersect(A,B) A use case for this is suppose you have a sparse dataset at x,y points that is "on a grid", but lacking some of the points (e.g., instrument only returns points that had valid data).  A simple way to solve this would be (using a few suggested changes to numpy/scipy): [X,Y] = mgrid[ unique(pts[:,0]), unique(pts[:,1]) ] s = X.shape newData = tile( NaN, s ).flatten() p,IA,IB = intersect( c_[X.flatten(),Y.flatten()], pts, rows=True ) newData[IA] = Data[IB] newData.reshape(s) There may be other concise ways to solve this, but this one seems fairly efficient. Thanks again. Cheers, David -- ********************************** David M. Kaplan Charge de Recherche 1 Institut de Recherche pour le Developpement Centre de Recherche Halieutique Mediterraneenne et Tropicale av. Jean Monnet B.P. 171 34203 Sete cedex France Phone: +33 (0)4 99 57 32 27 Fax: +33 (0)4 99 57 32 95 http://www.ur097.ird.fr/team/dkaplan/index.html********************************** _______________________________________________ SciPy-user mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

## Re: unique, sort, sortrows

 On Sun, Jul 27, 2008 at 02:43:09PM +0200, David M. Kaplan wrote: > I haven't looked at the numpy source code, > but it would seem that it shouldn't be too hard to add this > functionality as it already exists for r_ and c_: > r_[:5,[1,7,8]] # no problem here Well, if you have time to have a look at that, there certainly is value in adding this functionnality. Cheers, Gaël _______________________________________________ SciPy-user mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

## unique, sort, sortrows

 In reply to this post by David Kaplan-6 Hi, Well, as usual there are compromises in everything and the mgrid/ogrid functionality is the way it currently is for some good reasons.  The first reason is that python appears to be fairly sloppy about how it passes indexing arguments to the __getitem__ method.  It passes a tuple containing the arguments in all cases except when it has one argument, in which case it just passes that argument.  This means that it is hard to tell a tuple argument from several non-tuple arguments.  For example, the following two produce exactly the same call to __getitem__ : mgrid[1,2,3] mgrid[(1,2,3)] (__getitem__ receives a single tuple (1,2,3)), but different from: mgrid[[1,2,3]] (__getitem__ receives a single list = [1,2,3]).  This seems like a bug to me, but is probably considered a feature by somebody.  In any case, this is workable, but a bit annoying in that tuple arguments just aren't going to work well. The second problem is that the current implementation is fairly efficient because it forces all arguments to the same type so as to avoid some unnecessary copies (I think).  Once you allow non-slice arguments, this is hard to maintain. That being said, attached is a replacement for index_tricks.py that should implement a reasonable set of features, while only very slightly altering performance.  I have only touched nd_grid.  I haven't fixed the documentation string yet, nor have I added tests to test_index_tricks.py, but will do that if the changes will be accepted into numpy.  With the new version, old stuff should work as usual, except that mgrid now returns a list of arrays instead of an array of arrays (note that this change will cause current test_index_tricks.py to fail).  With the new changes, you can now do: mgrid[-2:5:10j,[4.5,6,7.1],12,['abc','def']] The following will work as expected: mgrid[:5,(1,2,3)] But this will not: mgrid[(1,2,3)]  # same as mgrid[1,2,3], but different from mgrid[[1,2,3]] Given these limitations, this seems like a fairly useful addition.  If this looks usable, I will clean up and add tests if desired.  If not, I recommend adding a ndgrid function to numpy that does the equivalent of matlab [X,Y,Z,...] = ndgrid(x,y,z,...) and then making the current meshgrid just call that changing the order of the first two arguments. Cheers, David -- ********************************** David M. Kaplan Charge de Recherche 1 Institut de Recherche pour le Developpement Centre de Recherche Halieutique Mediterraneenne et Tropicale av. Jean Monnet B.P. 171 34203 Sete cedex France Phone: +33 (0)4 99 57 32 27 Fax: +33 (0)4 99 57 32 95 http://www.ur097.ird.fr/team/dkaplan/index.html********************************** _______________________________________________ SciPy-user mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/scipy-user index_tricks.py (15K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

## Re: unique, sort, sortrows

 David M. Kaplan wrote: > python appears to be fairly sloppy about how it passes > indexing arguments to the __getitem__ method. I do not generally find the word 'sloppy' to be descriptive of Python. > It passes a tuple containing the arguments in all cases > except when it has one argument, in which case it just > passes that argument. Well, not quite.  The bracket syntax is for passing a key (a single object) to __getitem__. > For example, the following two produce exactly the same > call to __getitem__ : > mgrid[1,2,3] > mgrid[(1,2,3)] Well, yes.  Note::      >>> x = 1,2,3      >>> type(x)       In Python it is the commas, not the paretheses, that is determining the tuple type. So perhaps the question you raise could be rephrased to "why does an ndarray (not Python) handle treat a list 'index' differently than a tuple 'index'?" I do not know the history of that decision, but it has been used to provide some additional functionality. Cheers, Alan Isaac _______________________________________________ SciPy-user mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

## unique, sort, sortrows

 In reply to this post by David Kaplan-6 Hi, >From Alan Isaac: Well, yes.  Note::      >>> x = 1,2,3      >>> type(x)       In Python it is the commas, not the paretheses, that is determining the tuple type. ---- Good point.  In any case, this is a workable problem.  Adding a comma after the last argument to mgrid[] assures that it behaves "as expected" (e.g. mgrid[(1,2),]). Attached is a newer much cleaner version of my replacement for index_tricks.py.  I did some optimisation and it looks like this new version beats the old on similar operations for large meshed matrices. For single return arrays or small matrices it looses to the old version, but only slightly.  As large matrices are probably the bottleneck, this seems like a reasonable tradeoff. Cheers, David -- ********************************** David M. Kaplan Charge de Recherche 1 Institut de Recherche pour le Developpement Centre de Recherche Halieutique Mediterraneenne et Tropicale av. Jean Monnet B.P. 171 34203 Sete cedex France Phone: +33 (0)4 99 57 32 27 Fax: +33 (0)4 99 57 32 95 http://www.ur097.ird.fr/team/dkaplan/index.html********************************** _______________________________________________ SciPy-user mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/scipy-user index_tricks.py (14K) Download Attachment