unique, sort, sortrows

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

unique, sort, sortrows

David Kaplan-6
Hi,

I am fairly new to numpy/scipy (switching from matlab).  I have noticed
a few things that left me confused.  I am wondering if I could get some
input from others.

1) On the webpage: http://www.scipy.org/NumPy_for_Matlab_Users , where
it says matlab sortrows(a,i) is equivalent to a[argsort(a[:,0],i)], I
believe this should be a[argsort(a[:,i]),:].  Or better yet:

matlab: [b,I] = sortrows(a,i)
numpy: I = argsort(a[:,i]), b = a[I,:]

I have already changed this on the webpage, but I want to make sure I am
not missing something.

2) Is there a simple equivalent of sortrows(a) (i.e., sorting by entire
rows)?  Similarly, is there a simple equivalent of the matlab Y =
unique(X,'rows')?  I looked online and there appears to have been
previous discussion of these, but nothing simple and general seemed to
come out.

3) Is there an equivalent of [Y,I,J] = unique(X)?  In this case, I is
the indices of the unique elements and J is an index from Y to X (i.e.,
where the unique elements appear in X.  I can get I with:

I,Y = unique1d( X, return_index=True )

but J, which is very useful, is not available.  I suppose I could do:

J = array([])
for i,y in enumerate(Y):
        J[ nonzero( y == X ) ] = i

But it seems like J is useful enough that there should be an easier way
(or perhaps it should be integrated into unique1d, at the risk of adding
more keyword arguments).

4) What is the best equivalent of meshgrid(x,y,z,...) or
ndgrid(x,y,z...)?  I defined my own function, but is there a built in
way:

def ndgrid( *args ):
        s = []
        for a in args:
                s.append(len(a))
        I = indices(s)
        res = []
        for a,i in zip(args,I):
                res.append(a[i])
        return res

This method will also be painful for large matrices as one has to create
twice as much data (the index matrix and the resulting matrices).

5) I was trying to do matrix multiplication of matrices with more than 2
axes.  Normally matrix multiplication of an [M,N,P] matrix with a
[P,Q,R] matrix should produce something with dimensions [M,N,Q,R], but
this produces an error.  What is the appropriate idiom for this?

Thanks for the help.

Cheers,
David

--
**********************************
David M. Kaplan
Charge de Recherche 1
Institut de Recherche pour le Developpement
Centre de Recherche Halieutique Mediterraneenne et Tropicale
av. Jean Monnet
B.P. 171
34203 Sete cedex
France

Phone: +33 (0)4 99 57 32 27
Fax: +33 (0)4 99 57 32 95
http://www.ur097.ird.fr/team/dkaplan/index.html
**********************************


_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: unique, sort, sortrows

Gael Varoquaux
Hi David,

It's been a long time since I used Matlab, so I can't really answer all
your questions, however I think I can help you with one:

On Sat, Jul 26, 2008 at 08:06:51PM +0200, David M. Kaplan wrote:
> 4) What is the best equivalent of meshgrid(x,y,z,...) or
> ndgrid(x,y,z...)?  I defined my own function, but is there a built in
> way:

I think you should look at the mgrid and ogrid functions. (See
http://www.scipy.org/Numpy_Example_List ). For many operations, prefer
ogrid, it saves memory and tends to works as easily as mgrid thanks to
broadcasting.

HTH,

Gaël
_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: unique, sort, sortrows

Robert Cimrman
In reply to this post by David Kaplan-6
Hi David,

I can comment on unique1d, as I am the culprit. I am cc'ing to  
numpy-discussion as this is a numpy function.

Quoting "David M. Kaplan" <[hidden email]>:
> 2) Is there a simple equivalent of sortrows(a) (i.e., sorting by entire
> rows)?  Similarly, is there a simple equivalent of the matlab Y =

have you looked at lexsort?

> 3) Is there an equivalent of [Y,I,J] = unique(X)?  In this case, I is
> the indices of the unique elements and J is an index from Y to X (i.e.,
> where the unique elements appear in X.  I can get I with:
>
> I,Y = unique1d( X, return_index=True )
>
> but J, which is very useful, is not available.  I suppose I could do:
>
> J = array([])
> for i,y in enumerate(Y):
> J[ nonzero( y == X ) ] = i
>
> But it seems like J is useful enough that there should be an easier way
> (or perhaps it should be integrated into unique1d, at the risk of adding
> more keyword arguments).

So basically Y = X[I] and X = Y[J], right? I do not recall matlab that  
well to know for sure. It certainly could be done, I could look at it  
after I return from Euroscipy (i.e. after Monday).

I would replace return_index argument by two arguments: return_direct  
(->I) and return_inverse (->J), ok? Does anyone propose better names?

Actually most of the methods in arraysetops could return optionally  
some index arrays. Would anyone use it? (I do not need it personally :)

cheers,
r.


_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

unique, sort, sortrows

David Kaplan-6
In reply to this post by David Kaplan-6
Hi,

Thanks for the very helpful comments.

Regarding Gael's comment, the problem with mgrid and ogrid (at least in
the version of numpy I use: 1.1.0-3) is that it currently only accepts
standard indexing so you can't use it with non-uniform values.  I use
this a lot when I have a model I want to run with a series of parameter
values, not all of which are uniformly spaced.  For example, the
following fails:

mgrid[:5,[1,7,8]]

I would like this to give something equivalent to
meshgrid(arange(5),[1,7,8]) (up to a transpose operation).  Similarly
for more input arguments.  I haven't looked at the numpy source code,
but it would seem that it shouldn't be too hard to add this
functionality as it already exists for r_ and c_:

r_[:5,[1,7,8]] # no problem here

In response to Robert's comments, I looked a bit at lexsort and didn't
immediately see how it could fix my problem of sorting rows of a matrix
because I didn't really understand it.  Finally I figured out that the
following appears to do the trick:

I = lexsort(a[:,-1::-1].T)
b = a[I,:]

As this is a bit tricky for someone to figure out, perhaps a helper
function called sortrows would be useful in numpy?  Also, an equivalent
of unique(a,'rows') would still be very useful.

As for [Y,I,J] = unique(X), yes Y=X[I] and X=Y[J].  Your fix would help
me out, though it would be nice to specify the call signature in the
help of the new version of unique1d (it took me a while to figure out
that it was I,Y and not Y,I).  I think it would also be useful to
propogate these changes to the other arraysetops functions.  In
particular, I use indexes returned by matlab's intersect command often:

[C,IA,IB] = intersect(A,B)

A use case for this is suppose you have a sparse dataset at x,y points
that is "on a grid", but lacking some of the points (e.g., instrument
only returns points that had valid data).  A simple way to solve this
would be (using a few suggested changes to numpy/scipy):

[X,Y] = mgrid[ unique(pts[:,0]), unique(pts[:,1]) ]
s = X.shape
newData = tile( NaN, s ).flatten()

p,IA,IB = intersect( c_[X.flatten(),Y.flatten()], pts, rows=True )
newData[IA] = Data[IB]
newData.reshape(s)

There may be other concise ways to solve this, but this one seems fairly
efficient.

Thanks again.

Cheers,
David

--
**********************************
David M. Kaplan
Charge de Recherche 1
Institut de Recherche pour le Developpement
Centre de Recherche Halieutique Mediterraneenne et Tropicale
av. Jean Monnet
B.P. 171
34203 Sete cedex
France

Phone: +33 (0)4 99 57 32 27
Fax: +33 (0)4 99 57 32 95
http://www.ur097.ird.fr/team/dkaplan/index.html
**********************************


_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: unique, sort, sortrows

Gael Varoquaux
On Sun, Jul 27, 2008 at 02:43:09PM +0200, David M. Kaplan wrote:
> I haven't looked at the numpy source code,
> but it would seem that it shouldn't be too hard to add this
> functionality as it already exists for r_ and c_:

> r_[:5,[1,7,8]] # no problem here

Well, if you have time to have a look at that, there certainly is value
in adding this functionnality.

Cheers,

Gaël
_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

unique, sort, sortrows

David Kaplan-6
In reply to this post by David Kaplan-6
Hi,

Well, as usual there are compromises in everything and the mgrid/ogrid
functionality is the way it currently is for some good reasons.  The
first reason is that python appears to be fairly sloppy about how it
passes indexing arguments to the __getitem__ method.  It passes a tuple
containing the arguments in all cases except when it has one argument,
in which case it just passes that argument.  This means that it is hard
to tell a tuple argument from several non-tuple arguments.  For example,
the following two produce exactly the same call to __getitem__ :

mgrid[1,2,3]
mgrid[(1,2,3)]

(__getitem__ receives a single tuple (1,2,3)), but different from:

mgrid[[1,2,3]]

(__getitem__ receives a single list = [1,2,3]).  This seems like a bug
to me, but is probably considered a feature by somebody.  In any case,
this is workable, but a bit annoying in that tuple arguments just aren't
going to work well.

The second problem is that the current implementation is fairly
efficient because it forces all arguments to the same type so as to
avoid some unnecessary copies (I think).  Once you allow non-slice
arguments, this is hard to maintain.

That being said, attached is a replacement for index_tricks.py that
should implement a reasonable set of features, while only very slightly
altering performance.  I have only touched nd_grid.  I haven't fixed the
documentation string yet, nor have I added tests to
test_index_tricks.py, but will do that if the changes will be accepted
into numpy.  With the new version, old stuff should work as usual,
except that mgrid now returns a list of arrays instead of an array of
arrays (note that this change will cause current test_index_tricks.py to
fail).  With the new changes, you can now do:

mgrid[-2:5:10j,[4.5,6,7.1],12,['abc','def']]

The following will work as expected:

mgrid[:5,(1,2,3)]

But this will not:

mgrid[(1,2,3)]  # same as mgrid[1,2,3], but different from mgrid[[1,2,3]]

Given these limitations, this seems like a fairly useful addition.  If
this looks usable, I will clean up and add tests if desired.  If not, I
recommend adding a ndgrid function to numpy that does the equivalent of
matlab [X,Y,Z,...] = ndgrid(x,y,z,...) and then making the current
meshgrid just call that changing the order of the first two arguments.

Cheers,
David



--
**********************************
David M. Kaplan
Charge de Recherche 1
Institut de Recherche pour le Developpement
Centre de Recherche Halieutique Mediterraneenne et Tropicale
av. Jean Monnet
B.P. 171
34203 Sete cedex
France

Phone: +33 (0)4 99 57 32 27
Fax: +33 (0)4 99 57 32 95
http://www.ur097.ird.fr/team/dkaplan/index.html
**********************************


_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user

index_tricks.py (15K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: unique, sort, sortrows

Alan G Isaac
David M. Kaplan wrote:
> python appears to be fairly sloppy about how it passes
> indexing arguments to the __getitem__ method.

I do not generally find the word 'sloppy' to be descriptive
of Python.

> It passes a tuple containing the arguments in all cases
> except when it has one argument, in which case it just
> passes that argument.

Well, not quite.  The bracket syntax is for passing a key
(a single object) to __getitem__.

> For example, the following two produce exactly the same
> call to __getitem__ :
> mgrid[1,2,3]
> mgrid[(1,2,3)]

Well, yes.  Note::

     >>> x = 1,2,3
     >>> type(x)
     <type 'tuple'>

In Python it is the commas, not the paretheses,
that is determining the tuple type.

So perhaps the question you raise could be rephrased to
"why does an ndarray (not Python) handle treat
a list 'index' differently than a tuple 'index'?"
I do not know the history of that decision,
but it has been used to provide some additional functionality.

Cheers,
Alan Isaac

_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

unique, sort, sortrows

David Kaplan-6
In reply to this post by David Kaplan-6
Hi,

>From Alan Isaac:

Well, yes.  Note::

     >>> x = 1,2,3
     >>> type(x)
     <type 'tuple'>

In Python it is the commas, not the paretheses,
that is determining the tuple type.

----

Good point.  In any case, this is a workable problem.  Adding a comma
after the last argument to mgrid[] assures that it behaves "as
expected" (e.g. mgrid[(1,2),]).

Attached is a newer much cleaner version of my replacement for
index_tricks.py.  I did some optimisation and it looks like this new
version beats the old on similar operations for large meshed matrices.
For single return arrays or small matrices it looses to the old version,
but only slightly.  As large matrices are probably the bottleneck, this
seems like a reasonable tradeoff.

Cheers,
David


--
**********************************
David M. Kaplan
Charge de Recherche 1
Institut de Recherche pour le Developpement
Centre de Recherche Halieutique Mediterraneenne et Tropicale
av. Jean Monnet
B.P. 171
34203 Sete cedex
France

Phone: +33 (0)4 99 57 32 27
Fax: +33 (0)4 99 57 32 95
http://www.ur097.ird.fr/team/dkaplan/index.html
**********************************


_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user

index_tricks.py (14K) Download Attachment