[SciPy-User] How to handle a scipy.io.loadmat - related bug: parts of the data inaccessible after loadmat


Propadovic Nenad
Hello Matthew, hello Gregor,

thank you for your answers.

Yes, the settings struct_as_record=False and squeeze_me=True are what the workaround I mentioned in the initial post is based on:
http://stackoverflow.com/questions/7008608/scipy-io-loadmat-nested-structures-i-e-dictionaries/

I will use that workaround; it gives nice access to everything I need.

The reason I hesitated initially, after finding it, was that setting struct_as_record=False boils down to using the class scipy.io.matlab.mio5_params.mat_struct.

When you look at the docstring of that class it says: "We deprecate this method of holding struct information, and will soon remove it, in favor of the recarray method".

So the way I understand this, I'm building my code upon something that won't be around for long. An uncomfortable situation; I don't know if you agree. I'm a consultant, and the code will be around at the client's place longer than I will. This was the reason I started this thread in the first place. I really tried to explain that in the initial post, with all the necessary detail, but it seems that the post was too confusing. Sorry for bothering everybody for so long. I am, however, still not sure whether I should file this as a bug or as some kind of feature request, as there seem to be very few people who are bothered by it (I counted maybe 2-3 more related posts, plus the post containing the workaround, on Stack Overflow).

Regards,

Nenad


2017-02-24 12:15 GMT+01:00 <[hidden email]>:
Send SciPy-User mailing list submissions to
        [hidden email]

To subscribe or unsubscribe via the World Wide Web, visit
        https://mail.python.org/mailman/listinfo/scipy-user
or, via email, send a message with subject or body 'help' to
        [hidden email]

You can reach the person managing the list at
        [hidden email]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of SciPy-User digest..."


Today's Topics:

   1. Re: How to handle a scipy.io.loadmat - related bug: parts of
      the data inaccessible after loadmat (Matthew Brett)
   2. Use Distance Matrix in scipy.cluster.hierarchy.linkage()?
      (Sema Atasever)
   3. Re: How to handle a scipy.io.loadmat - related bug: parts of
      the data inaccessible after loadmat (Gregor Thalhammer)


----------------------------------------------------------------------

Message: 1
Date: Thu, 23 Feb 2017 17:41:34 -0800
From: Matthew Brett <[hidden email]>
To: SciPy Users List <[hidden email]>
Subject: Re: [SciPy-User] How to handle a scipy.io.loadmat - related
        bug: parts of the data inaccessible after loadmat
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset=UTF-8

Hi,

On Thu, Feb 23, 2017 at 5:12 AM, Propadovic Nenad <[hidden email]> wrote:
> Hello Jason, thank you a lot for the answer to my post. I was aware of the
> squeeze_me=True option, I think I mentioned it in my initial question post.
>
> However, as I stated in the answer to Gregor's kind answer, I actually need a
> way to inspect the parts of the access path in the data structure I import
> from the x.mat-file, and if I use squeeze_me=True, parts like 'RPDO2'
> disappear completely:
>
>
> import scipy.io
>
> y = scipy.io.loadmat("x.mat", squeeze_me=True)
> cd = y['CanData']
> msg = cd['msg']
> print msg
>
> Output:
> ((array(((array([ 61.96,  61.96,  61.96]), u'PosAct'), (array([-0.05, -0.1 ,
> 0.3 ]), u'VelAct')),
>       dtype=[('PosAct', 'O'), ('VelAct', 'O')]), array([ 0.      ,
> 0.003968,  0.007978])),)
>
> And I really need to be able to find it by some kind of inspection, so that
> I don't return parts of the structure that don't correspond to the intention
> of the person searching.

Sorry if I'm not following, but, does this help?

In [36]: y = scipy.io.loadmat("x.mat")

In [37]: y['CanData'][0, 0]['msg'][0, 0]['RPDO2']
Out[37]:
array([[ (array([[ 0.      ,  0.003968,  0.007978]]), array([[
(array([[(array([[ 61.96,  61.96,  61.96]]), array(['PosAct'],
      dtype='<U6'))]],
      dtype=[('Values', 'O'), ('Name', 'O')]), array([[(array([[-0.05,
-0.1 ,  0.3 ]]), array(['VelAct'],
      dtype='<U6'))]],
      dtype=[('Values', 'O'), ('Name', 'O')]))]],
      dtype=[('PosAct', 'O'), ('VelAct', 'O')]))]],
      dtype=[('timest', 'O'), ('sig', 'O')])

In [54]: y2 = scipy.io.loadmat('x.mat', squeeze_me=True, struct_as_record=False)

In [55]: y2['CanData'].msg.RPDO2
Out[55]: <scipy.io.matlab.mio5_params.mat_struct at 0x10fda7b00>

Best,

Matthew
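
The two access styles from the session above can be tried end to end with a small round trip. This is only a sketch: it assumes the nested dictionary from the original post (values rounded as printed in the thread), and the temporary file location and variable names are illustrative.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Nested dict modelled on the original post; savemat stores each dict as a MATLAB struct.
data = {'CanData': {'msg': {'RPDO2': {
    'timest': [0.0, 0.003968, 0.007978],
    'sig': {
        'VelAct': {'Values': [-0.05, -0.1, 0.3], 'Name': 'VelAct'},
        'PosAct': {'Values': [61.96, 61.96, 61.96], 'Name': 'PosAct'},
    },
}}}}

# Illustrative temporary location, not the path used in the thread.
path = os.path.join(tempfile.mkdtemp(), 'x.mat')
scipy.io.savemat(path, data)

# Default settings: every struct is a (1, 1) structured array, so each level
# needs a [0, 0] before the next field name can be used.
y = scipy.io.loadmat(path)
rpdo2 = y['CanData'][0, 0]['msg'][0, 0]['RPDO2'][0, 0]
timest_default = rpdo2['timest']                    # 2-D array

# Squeezed, attribute-style access through mat_struct objects.
y2 = scipy.io.loadmat(path, squeeze_me=True, struct_as_record=False)
timest_squeezed = y2['CanData'].msg.RPDO2.timest    # 1-D array

print(timest_default.shape, timest_squeezed.shape)
```

Both variants reach the same numbers; they differ only in how many singleton dimensions have to be indexed away on the path down.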


------------------------------

Message: 2
Date: Fri, 24 Feb 2017 09:59:04 +0300
From: Sema Atasever <[hidden email]>
To: [hidden email]
Subject: [SciPy-User] Use Distance Matrix in
        scipy.cluster.hierarchy.linkage()?
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset="utf-8"

Dear SciPy list member,

I want to ask you about clustering using scipy.cluster.hierarchy.

I have an n*n *distance matrix* M where M_ij is the distance between
object_i and object_j. You can see the file format in the attachment -->
(dm.csv)

I want to cluster these n objects with hierarchical clustering.

For this purpose I am using the Python code that you can see in the
attachment (scipy_code.py).

I would like to ask how I can get the cluster values in PDF or text format,
how many clusters I got, and which clusters include which members.

Thanks in Advance, Best regards.
-------------- next part --------------
import scipy
import pylab
import scipy.cluster.hierarchy as sch
import pandas as pd
import numpy as np

D=np.loadtxt(open(r"C:\dm.csv", "rb"), delimiter="\t", usecols=range(1,11))

print (D)
print (D.shape)

# Compute and plot dendrogram.
fig = pylab.figure()
axdendro = fig.add_axes([0.09,0.1,0.2,0.8])
Y = sch.linkage(D, method='single')
Z = sch.dendrogram(Y, orientation='right')
axdendro.set_xticks([])
axdendro.set_yticks([])

# Plot distance matrix.
axmatrix = fig.add_axes([0.3,0.1,0.6,0.8])
index = Z['leaves']
D = D[index,:]
D = D[:,index]
im = axmatrix.matshow(D, aspect='auto', origin='lower')
axmatrix.set_xticks([])
axmatrix.set_yticks([])

# Plot colorbar.
axcolor = fig.add_axes([0.91,0.1,0.02,0.8])
pylab.colorbar(im, cax=axcolor)

# Display and save figure.
fig.show()
fig.savefig('dendrogram.png')
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dm.csv
Type: text/csv
Size: 680 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/scipy-user/attachments/20170224/dbac91b0/attachment-0001.csv>
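
A note on the question above: when linkage() receives a square 2-D array it treats each row as an observation vector, so a precomputed distance matrix should first be converted to condensed form with scipy.spatial.distance.squareform. Cluster membership can then be read off with fcluster. The sketch below uses a made-up 4x4 matrix in place of dm.csv, and the choice of two clusters is an assumption for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical symmetric distance matrix with a zero diagonal (stand-in for dm.csv).
D = np.array([
    [0.0, 1.0, 9.0, 10.0],
    [1.0, 0.0, 8.0, 9.0],
    [9.0, 8.0, 0.0, 2.0],
    [10.0, 9.0, 2.0, 0.0],
])

# Convert the square matrix to the condensed vector that linkage() expects
# for precomputed distances.
condensed = squareform(D)
Z = linkage(condensed, method='single')

# Cut the tree into at most two flat clusters (the number 2 is an assumption).
labels = fcluster(Z, t=2, criterion='maxclust')

# Group object indices by cluster label.
clusters = {}
for index, label in enumerate(labels):
    clusters.setdefault(int(label), []).append(index)

for label, members in sorted(clusters.items()):
    print("cluster", label, "members", members)
```

The printed lines can be redirected to a text file; len(clusters) gives the number of clusters for the chosen cut.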

------------------------------

Message: 3
Date: Fri, 24 Feb 2017 12:15:18 +0100
From: Gregor Thalhammer <[hidden email]>
To: SciPy Users List <[hidden email]>
Subject: Re: [SciPy-User] How to handle a scipy.io.loadmat - related
        bug: parts of the data inaccessible after loadmat
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="us-ascii"


> On 23.02.2017 at 14:12, Propadovic Nenad <[hidden email]> wrote:
>
> Hello Jason, thank you a lot for the answer to my post. I was aware of the squeeze_me=True option, I think I mentioned it in my initial question post.
>
> However, as I stated in the answer to Gregor's kind answer, I actually need a way to inspect the parts of the access path in the data structure I import from the x.mat-file, and if I use squeeze_me=True, parts like 'RPDO2' disappear completely:
>

The substruct names are somewhat hidden, try this:

y = scipy.io.loadmat('x.mat', squeeze_me=True, struct_as_record=False)
y['CanData'].msg._fieldnames

['RPDO2']

Gregor

PS: for introspection of objects, just take a look at the __dict__ attribute.
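
Building on the _fieldnames hint, a small recursive helper can turn the loaded objects into plain nested dictionaries, so that calling code never touches mat_struct directly. This is only a sketch in the spirit of the Stack Overflow workaround: the helper name to_plain and the temporary file are made up for illustration, and _fieldnames is a private attribute that may change between SciPy versions.

```python
import os
import tempfile

import numpy as np
import scipy.io


def to_plain(obj):
    """Recursively replace mat_struct-like objects with dicts (hypothetical helper)."""
    # Duck-typing on _fieldnames avoids importing mat_struct, whose module
    # path differs between SciPy versions.
    if hasattr(obj, '_fieldnames'):
        return {name: to_plain(getattr(obj, name)) for name in obj._fieldnames}
    # Struct arrays with more than one element arrive as object arrays;
    # flattening them into a list discards their original shape.
    if isinstance(obj, np.ndarray) and obj.dtype == object:
        return [to_plain(item) for item in obj.ravel()]
    return obj


# Illustrative file with the same nesting as x.mat in the thread.
path = os.path.join(tempfile.mkdtemp(), 'x.mat')
scipy.io.savemat(path, {'CanData': {'msg': {'RPDO2': {
    'timest': [0.0, 0.003968, 0.007978],
    'sig': {'VelAct': {'Values': [-0.05, -0.1, 0.3], 'Name': 'VelAct'}},
}}}})

loaded = scipy.io.loadmat(path, squeeze_me=True, struct_as_record=False)
tree = to_plain(loaded['CanData'])
print(list(tree['msg']))  # field names one level below 'msg'
```

Once converted, the field names at any level are ordinary dict keys and can be inspected or searched without knowing the structure in advance.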

>
> import scipy.io
>
> y = scipy.io.loadmat("x.mat", squeeze_me=True)
> cd = y['CanData']
> msg = cd['msg']
> print msg
>
> Output:
> ((array(((array([ 61.96,  61.96,  61.96]), u'PosAct'), (array([-0.05, -0.1 ,  0.3 ]), u'VelAct')),
>       dtype=[('PosAct', 'O'), ('VelAct', 'O')]), array([ 0.      ,  0.003968,  0.007978])),)
>
> And I really need to be able to find it by some kind of inspection, so that I don't return parts of the structure that don't correspond to the intention of the person searching.
>
> Thanks once again!
>
> Nenad
>
>
>
> 2017-02-22 18:00 GMT+01:00 <[hidden email]>:
>
> Today's Topics:
>
>    1. Re: How to handle a scipy.io.loadmat - related bug: parts of
>       the data inaccessible after loadmat (Jason Sachs)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 22 Feb 2017 09:12:26 -0700
> From: Jason Sachs <[hidden email]>
> To: SciPy Users List <[hidden email]>
> Subject: Re: [SciPy-User] How to handle a scipy.io.loadmat - related
>         bug: parts of the data inaccessible after loadmat
> Message-ID:
>         <[hidden email]>
> Content-Type: text/plain; charset="utf-8"
>
> ah, yes, here it is:
>
> https://docs.scipy.org/doc/scipy-0.18.1/reference/tutorial/io.html
>
> ----
>
> So, in MATLAB, the struct array must be at least 2D, and we replicate that
> when we read into Scipy. If you want all length 1 dimensions squeezed out,
> try this:
>
> >>> mat_contents = sio.loadmat('octave_struct.mat', squeeze_me=True)
> >>> oct_struct = mat_contents['my_struct']
> >>> oct_struct.shape
> ()
>
>
> On Wed, Feb 22, 2017 at 9:11 AM, Jason Sachs <[hidden email]> wrote:
>
> > This looks familiar, I ran into this a few years ago, and if I recall
> > correctly, there is an option to loadmat to reduce array dimensions
> > appropriately. There is a "squeeze_me" option (unfortunately named...
> > should probably be deprecated in favor of  "squeeze") which I think does
> > this.
> >
> > https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html
> >
> > On Wed, Feb 22, 2017 at 9:02 AM, Gregor Thalhammer <[hidden email]> wrote:
> >
> >>
> >> On 22.02.2017 at 12:02, Propadovic Nenad <[hidden email]> wrote:
> >>
> >> Hello,
> >>
> >> bear with me for the long post that follows: it took me more than a week
> >> to get this far, and I tried to compress all the relevant information into
> >> the post.
> >>
> >> There seems to be a bug in scipy.io.loadmat; I'll present it by a short
> >> piece of code and its output.
> >>
> >> I create file x.mat with the following:
> >>
> >> import scipy.io
> >>
> >> d = {'CanData':
> >>     {
> >>     'msg': {
> >>             'RPDO2': {
> >>                 'timest': [0.0, 0.0039679999899817631,
> >> 0.0079779999941820279],
> >>                 'sig': {
> >>                     'VelAct': {
> >>                         'Values': [-0.050000000000000003,
> >> -0.10000000000000001, 0.29999999999999999, ],
> >>                         'Name': 'VelAct'
> >>                     },
> >>                     'PosAct': {
> >>                         'Values': [61.960000000000001,
> >> 61.960000000000001, 61.960000000000001, ],
> >>                         'Name': 'PosAct'
> >>                     }
> >>                 }
> >>             }
> >>         }
> >>     }
> >> }
> >> scipy.io.savemat("x.mat", d)
> >>
> >> Matlab is happy with the file and handles it the way I expect.
> >>
> >> When I read in the data stored in the file and print it out:
> >>
> >> import scipy.io
> >> y = scipy.io.loadmat("x.mat")
> >> # print y
> >> cd = y['CanData']
> >> msg = cd['msg']
> >> print msg
> >> print msg.dtype
> >> print msg.dtype.names
> >>
> >> The output is:
> >> >C:\Anaconda2\pythonw -u "test1.py"
> >> [[ array([[ ([[(array([[ ([[(array([[ 61.96,  61.96,  61.96]]),
> >> array([u'PosAct'],
> >>       dtype='<U6'))]], [[(array([[-0.05, -0.1 ,  0.3 ]]),
> >> array([u'VelAct'],
> >>       dtype='<U6'))]])]],
> >>       dtype=[('PosAct', 'O'), ('VelAct', 'O')]), array([[ 0.      ,
> >> 0.003968,  0.007978]]))]],)]],
> >>       dtype=[('RPDO2', 'O')])]]
> >> object
> >> None
> >>
> >> Now I've read the manual, and as I see it there is no way for me to access
> >> the deeper layers of data I just put in the file x.mat, although they are
> >> obviously right there in the data read in. Access via msg['RPDO2'] gives:
> >> IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis
> >> (`None`) and integer or boolean arrays are valid indices.
> >>
> >>
> >> For historic reasons, in Matlab everything is at least a 2D array, even
> >> scalars. By sprinkling some [0,0] in your code you should get what you
> >> want, e.g.
> >>
> >> msg[0,0]['RPDO2'][0,0]['timest'][0,0]
> >>
> >> array([[ 0.      ,  0.003968,  0.007978]])
> >>
> >>
> >> Gregor
> >>
> >>
> >>
> >> If I use parameter squeeze_me=True:
> >>
> >> scipy.io.savemat("x.mat", d)
> >> y = scipy.io.loadmat("x.mat", squeeze_me=True)
> >> # print y
> >> cd = y['CanData']
> >> msg = cd['msg']
> >> print msg
> >> print msg.dtype
> >> print msg.dtype.names
> >>
> >> I get output:
> >> >C:\Anaconda2\pythonw -u "test1.py"
> >> ((array(((array([ 61.96,  61.96,  61.96]), u'PosAct'), (array([-0.05,
> >> -0.1 ,  0.3 ]), u'VelAct')),
> >>       dtype=[('PosAct', 'O'), ('VelAct', 'O')]), array([ 0.      ,
> >> 0.003968,  0.007978])),)
> >> object
> >> None
> >> >Exit code: 0
> >>
> >> All well, but the name 'RPDO2' disappeared from the data!
> >>
> >> Now I need this information; in future I won't handle what's put into
> >> x.mat, so I need a way to access through the data all the way down (and
> >> handle the variations that will come).
> >>
> >> I have found a workaround at:
> >> http://stackoverflow.com/questions/7008608/scipy-io-loadmat-nested-structures-i-e-dictionaries/
> >>
> >> The problem is, the workaround uses struct_as_record=False in loadmat,
> >> which boils down to using scipy.io.matlab.mio5_params.mat_struct,
> >> and when you read the docstring of class mat_struct, it says:
> >>
> >> '''
> >> ...
> >> We deprecate this method of holding struct information, and will
> >> soon remove it, in favor of the recarray method (see loadmat
> >> docstring)
> >> '''
> >> So my questions:
> >> 1) Did I miss something? Is there a way to access the data in 'RPDO2' by
> >> using this name, without using parameter struct_as_record=False in loadmat?
> >> 2) If not, where do I file a bug? The workaround is five years old, so
> >> the issue seems to be in scipy for ages...
> >>
> >> (For the record, I use scipy within Anaconda2 1.4.1, under Windows, but
> >> this does not seem to matter).
> >>
> >> Thanks a lot for the answers, in advance.
> >>
> >> Nenad
> >>
> >>
> >> _______________________________________________
> >> SciPy-User mailing list
> >> [hidden email]
> >> https://mail.python.org/mailman/listinfo/scipy-user
>
> End of SciPy-User Digest, Vol 162, Issue 6
> ******************************************
>

End of SciPy-User Digest, Vol 162, Issue 10
*******************************************


Re: How to handle a scipy.io.loadmat - related bug: parts of the data inaccessible after loadmat

Gregor Thalhammer-2

On 24.02.2017 at 13:36, Propadovic Nenad <[hidden email]> wrote:

> Hello Matthew, hello Gregor,
>
> thank you for your answers.
>
> Yes, the settings struct_as_record=False and squeeze_me=True are what the workaround I mentioned in the initial post is based on:
> http://stackoverflow.com/questions/7008608/scipy-io-loadmat-nested-structures-i-e-dictionaries/
>
> I will use that workaround; it gives nice access to everything I need.
>
> The reason I hesitated initially, after finding it, was that setting struct_as_record=False boils down to using the class scipy.io.matlab.mio5_params.mat_struct.
>
> When you look at the docstring of that class it says: "We deprecate this method of holding struct information, and will soon remove it, in favor of the recarray method".

This comment was added 7 years ago, but mat_struct is still there (although it is no longer used with the default settings). So I think it cannot be taken seriously. Perhaps you might file an issue for removing or altering this remark.

Anyhow, the field names are accessible by

y = scipy.io.loadmat("x.mat")
y['CanData'][0, 0]['msg'].dtype.names

('RPDO2',)

or

y = scipy.io.loadmat("x.mat", squeeze_me=True)
y['CanData']['msg'].item().dtype.names
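
The dtype.names lookups above can be applied recursively, so that the default record-array result is walked all the way down without struct_as_record=False. The following is a sketch under that assumption: the helper name struct_to_dict and the temporary file are illustrative, and cell arrays are not handled.

```python
import os
import tempfile

import numpy as np
import scipy.io


def struct_to_dict(value):
    """Turn structured arrays from loadmat into nested dicts (hypothetical helper)."""
    # MATLAB structs arrive as ndarrays whose dtype carries the field names.
    if isinstance(value, np.ndarray) and value.dtype.names:
        records = [
            {name: struct_to_dict(record[name]) for name in value.dtype.names}
            for record in value.ravel()
        ]
        # A scalar struct, shape (1, 1), becomes one dict; larger struct
        # arrays become a flat list of dicts.
        return records[0] if len(records) == 1 else records
    # Numeric and string arrays are returned unchanged, still at least 2-D
    # for numbers because squeeze_me is not used here.
    return value


# Illustrative file with the same nesting as x.mat in the thread.
path = os.path.join(tempfile.mkdtemp(), 'x.mat')
scipy.io.savemat(path, {'CanData': {'msg': {'RPDO2': {
    'timest': [0.0, 0.003968, 0.007978],
    'sig': {'VelAct': {'Values': [-0.05, -0.1, 0.3], 'Name': 'VelAct'}},
}}}})

y = scipy.io.loadmat(path)
tree = struct_to_dict(y['CanData'])
print(list(tree['msg']))  # field names found below 'msg'
```

Because this relies only on NumPy structured arrays, it does not depend on the mat_struct class at all.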



> So the way I understand this, I'm building my code upon something that won't be around for long. An uncomfortable situation; I don't know if you agree. I'm a consultant, and the code will be around at the client's place longer than I will. This was the reason I started this thread in the first place. I really tried to explain that in the initial post, with all the necessary detail, but it seems that the post was too confusing. Sorry for bothering everybody for so long. I am, however, still not sure whether I should file this as a bug or as some kind of feature request, as there seem to be very few people who are bothered by it (I counted maybe 2-3 more related posts, plus the post containing the workaround, on Stack Overflow).

I would not consider this as a bug, more a documentation issue, which is resolved by this discussion. You might add a comment on stack overflow with your findings.

Gregor




> Regards,
>
> Nenad


2017-02-24 12:15 GMT+01:00 <[hidden email]>:
Send SciPy-User mailing list submissions to
        [hidden email]

To subscribe or unsubscribe via the World Wide Web, visit
        https://mail.python.org/mailman/listinfo/scipy-user
or, via email, send a message with subject or body 'help' to
        [hidden email]

You can reach the person managing the list at
        [hidden email]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of SciPy-User digest..."


Today's Topics:

   1. Re: How to handle a scipy.io.loadmat - related bug: parts of
      the data inaccessible after loadmat (Matthew Brett)
   2. Use Distance Matrix in scipy.cluster.hierarchy.linkage()?
      (Sema Atasever)
   3. Re: How to handle a scipy.io.loadmat - related bug: parts of
      the data inaccessible after loadmat (Gregor Thalhammer)


----------------------------------------------------------------------

Message: 1
Date: Thu, 23 Feb 2017 17:41:34 -0800
From: Matthew Brett <[hidden email]>
To: SciPy Users List <[hidden email]>
Subject: Re: [SciPy-User] How to handle a scipy.io.loadmat - related
        bug: parts of the data inaccessible after loadmat
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset=UTF-8

Hi,

On Thu, Feb 23, 2017 at 5:12 AM, Propadovic Nenad <[hidden email]> wrote:
> Hello Jason, than you a lot for the answer to my post. I was aware of the
> squeeze_me=True option, I think I mentioned it in my initial question post.
>
> However, as I stated in the answer to Gregors kind answer, I actually need a
> way to inspect the parts of the access path in the data structure I import
> form the x.mat-file, and if I use squeeze_me=True, parts like 'RPDO2'
> disappear completely:
>
>
> import scipy.io
>
> y = scipy.io.loadmat("x.mat", squeeze_me=True)
> cd = y['CanData']
> msg = cd['msg']
> print msg
>
> Output:
> ((array(((array([ 61.96,  61.96,  61.96]), u'PosAct'), (array([-0.05, -0.1 ,
> 0.3 ]), u'VelAct')),
>       dtype=[('PosAct', 'O'), ('VelAct', 'O')]), array([ 0.      ,
> 0.003968,  0.007978])),)
>
> And I really need to be able to find it by some kind of inspection, so that
> I don't return parts of the structure that don't correspond to the intention
> of the person searching.

Sorry if I'm not following, but, does this help?

In [36]: y = scipy.io.loadmat("x.mat")

In [37]: y['CanData'][0, 0]['msg'][0, 0]['RPDO2']
Out[37]:
array([[ (array([[ 0.      ,  0.003968,  0.007978]]), array([[
(array([[(array([[ 61.96,  61.96,  61.96]]), array(['PosAct'],
      dtype='<U6'))]],
      dtype=[('Values', 'O'), ('Name', 'O')]), array([[(array([[-0.05,
-0.1 ,  0.3 ]]), array(['VelAct'],
      dtype='<U6'))]],
      dtype=[('Values', 'O'), ('Name', 'O')]))]],
      dtype=[('PosAct', 'O'), ('VelAct', 'O')]))]],
      dtype=[('timest', 'O'), ('sig', 'O')])

In [54]: y2 = scipy.io.loadmat('x.mat', squeeze_me=True, struct_as_record=False)

In [55]: y2['CanData'].msg.RPDO2
Out[55]: <scipy.io.matlab.mio5_params.mat_struct at 0x10fda7b00>

Best,

Matthew


------------------------------

Message: 2
Date: Fri, 24 Feb 2017 09:59:04 +0300
From: Sema Atasever <[hidden email]>
To: [hidden email]
Subject: [SciPy-User] Use Distance Matrix in
        scipy.cluster.hierarchy.linkage()?
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset="utf-8"

Dear SciPy list member,

I want to ask you about clustering usign scipy.cluster.hierarchy.

I have a *distance matrix* n*n M where M_ij is the distance between
object_i and object_j. You can see file format in the attachment -->
(dm.csv)

I want to cluster these n objects with hierarchical clustering.

For this purpose i am usign this python code that you can see in the
attachment (scipy_code.py)

I want to ask that how can get clusters values in pdf format or text format
and how many clusters did i get and what clusters includes what members?

Thanks in Advance, Best regards.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-user/attachments/20170224/dbac91b0/attachment-0001.html>
-------------- next part --------------
import scipy
import pylab
import scipy.cluster.hierarchy as sch
import pandas as pd
import numpy as np

D=np.loadtxt(open("C:\dm.csv", "rb"), delimiter="\t", usecols=range(1,11))

print (D)
print (D.shape)

# Compute and plot dendrogram.
fig = pylab.figure()
axdendro = fig.add_axes([0.09,0.1,0.2,0.8])
Y = sch.linkage(D, method='single')
Z = sch.dendrogram(Y, orientation='right')
axdendro.set_xticks([])
axdendro.set_yticks([])

# Plot distance matrix.
axmatrix = fig.add_axes([0.3,0.1,0.6,0.8])
index = Z['leaves']
D = D[index,:]
D = D[:,index]
im = axmatrix.matshow(D, aspect='auto', origin='lower')
axmatrix.set_xticks([])
axmatrix.set_yticks([])

# Plot colorbar.
axcolor = fig.add_axes([0.91,0.1,0.02,0.8])
pylab.colorbar(im, cax=axcolor)

# Display and save figure.
fig.show()
fig.savefig('dendrogram.png')
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dm.csv
Type: text/csv
Size: 680 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/scipy-user/attachments/20170224/dbac91b0/attachment-0001.csv>

------------------------------

Message: 3
Date: Fri, 24 Feb 2017 12:15:18 +0100
From: Gregor Thalhammer <[hidden email]>
To: SciPy Users List <[hidden email]>
Subject: Re: [SciPy-User] How to handle a scipy.io.loadmat - related
        bug: parts of the data inaccessible after loadmat
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="us-ascii"


> Am 23.02.2017 um 14:12 schrieb Propadovic Nenad <[hidden email]>:
>
> Hello Jason, than you a lot for the answer to my post. I was aware of the squeeze_me=True option, I think I mentioned it in my initial question post.
>
> However, as I stated in the answer to Gregors kind answer, I actually need a way to inspect the parts of the access path in the data structure I import form the x.mat-file, and if I use squeeze_me=True, parts like 'RPDO2' disappear completely:
>

The substruct names are somewhat hidden, try this:

y = scipy.io.loadmat('x.mat', squeeze_me=True, struct_as_record=False)
y['CanData'].msg._fieldnames

['RPDO2']

Gregor

PS: for introspection of objects, just take a look at the __dict__ attribute.

>
> import scipy.io <http://scipy.io/>
>
> y = scipy.io.loadmat("x.mat", squeeze_me=True)
> cd = y['CanData']
> msg = cd['msg']
> print msg
>
> Output:
> ((array(((array([ 61.96,  61.96,  61.96]), u'PosAct'), (array([-0.05, -0.1 ,  0.3 ]), u'VelAct')),
>       dtype=[('PosAct', 'O'), ('VelAct', 'O')]), array([ 0.      ,  0.003968,  0.007978])),)
>
> And I really need to be able to find it by some kind of inspection, so that I don't return parts of the structure that don't correspond to the intention of the person searching.
>
> Thanks once again!
>
> Nenad
>
>
>
> 2017-02-22 18:00 GMT+01:00 <[hidden email] <mailto:[hidden email]>>:
> Send SciPy-User mailing list submissions to
>         [hidden email] <mailto:[hidden email]>
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://mail.python.org/mailman/listinfo/scipy-user <https://mail.python.org/mailman/listinfo/scipy-user>
> or, via email, send a message with subject or body 'help' to
>         [hidden email] <mailto:[hidden email]>
>
> You can reach the person managing the list at
>         [hidden email] <mailto:[hidden email]>
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of SciPy-User digest..."
>
>
> Today's Topics:
>
>    1. Re: How to handle a scipy.io.loadmat - related bug: parts of
>       the data inaccessible after loadmat (Jason Sachs)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 22 Feb 2017 09:12:26 -0700
> From: Jason Sachs <[hidden email] <mailto:[hidden email]>>
> To: SciPy Users List <[hidden email] <mailto:[hidden email]>>
> Subject: Re: [SciPy-User] How to handle a scipy.io.loadmat - related
>         bug: parts of the data inaccessible after loadmat
> Message-ID:
>         <[hidden email] <mailto:[hidden email]>>
> Content-Type: text/plain; charset="utf-8"
>
> ah, yes, here it is:
>
> https://docs.scipy.org/doc/scipy-0.18.1/reference/tutorial/io.html <https://docs.scipy.org/doc/scipy-0.18.1/reference/tutorial/io.html>
>
> ----
>
> So, in MATLAB, the struct array must be at least 2D, and we replicate that
> when we read into Scipy. If you want all length 1 dimensions squeezed out,
> try this:
> >>>
>
> >>> mat_contents = sio.loadmat('octave_struct.mat', squeeze_me=True)>>> oct_struct = mat_contents['my_struct']>>> oct_struct.shape()
>
>
> On Wed, Feb 22, 2017 at 9:11 AM, Jason Sachs <[hidden email] <mailto:[hidden email]>> wrote:
>
> > This looks familiar, I ran into this a few years ago, and if I recall
> > correctly, there is an option to loadmat to reduce array dimensions
> > appropriately. There is a "squeeze_me" option (unfortunately named...
> > should probably be deprecated in favor of  "squeeze") which I think does
> > this.
> >
> > https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html <https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html>
> >
> > On Wed, Feb 22, 2017 at 9:02 AM, Gregor Thalhammer <
> > [hidden email] <mailto:[hidden email]>> wrote:
> >
> >>
> >> Am 22.02.2017 um 12:02 schrieb Propadovic Nenad <[hidden email] <mailto:[hidden email]>>:
> >>
> >> Hello,
> >>
> >> bear with me for the long post that follows: it took me more than a week
> >> to get this far, and I tried to compress all the relevant information into
> >> the post.
> >>
> >> There seems to be a bug in scipy.io.loadmat; I'll present it by a short
> >> piece of code and it's output.
> >>
> >> I create file x.mat with the following:
> >>
> >> import scipy.io <http://scipy.io/>
> >>
> >> d = {'CanData':
> >>     {
> >>     'msg': {
> >>             'RPDO2': {
> >>                 'timest': [0.0, 0.0039679999899817631,
> >> 0.0079779999941820279],
> >>                 'sig': {
> >>                     'VelAct': {
> >>                         'Values': [-0.050000000000000003,
> >> -0.10000000000000001, 0.29999999999999999, ],
> >>                         'Name': 'VelAct'
> >>                     },
> >>                     'PosAct': {
> >>                         'Values': [61.960000000000001,
> >> 61.960000000000001, 61.960000000000001, ],
> >>                         'Name': 'PosAct'
> >>                     }
> >>                 }
> >>             }
> >>         }
> >>     }
> >> }
> >> scipy.io.savemat("x.mat", d)
> >>
> >> Matlab is happy with the file and handles it the way I expect.
> >>
> >> When I read in the data stored in the file and print it out:
> >>
> >> import scipy.io <http://scipy.io/>
> >> y = scipy.io.loadmat("x.mat")
> >> # print y
> >> cd = y['CanData']
> >> msg = cd['msg']
> >> print msg
> >> print msg.dtype
> >> print msg.dtype.names
> >>
> >> The output is:
> >> >C:\Anaconda2\pythonw -u "test1.py"
> >> [[ array([[ ([[(array([[ ([[(array([[ 61.96,  61.96,  61.96]]),
> >> array([u'PosAct'],
> >>       dtype='<U6'))]], [[(array([[-0.05, -0.1 ,  0.3 ]]),
> >> array([u'VelAct'],
> >>       dtype='<U6'))]])]],
> >>       dtype=[('PosAct', 'O'), ('VelAct', 'O')]), array([[ 0.      ,
> >> 0.003968,  0.007978]]))]],)]],
> >>       dtype=[('RPDO2', 'O')])]]
> >> object
> >> None
> >>
> >> Now  I've read the manual, and as I see it I have no way for me to access
> >> the deeper layers of data I just put in the file x.mat, although they are
> >> obviously right there in the data read in. Access via msg['RPDO2'] gives:
> >> IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis
> >> (`None`) and integer or boolean arrays are valid indices.
> >>
> >>
> >> For historic reasons, in Matlab everything is at least a 2D array, even
> >> scalars. By sprinkling some [0,0] in your code you should get what you
> >> want, e.g.
> >>
> >> msg[0,0]['RPDO2'][0,0]['timest'][0,0]
> >>
> >> array([[ 0.      ,  0.003968,  0.007978]])
> >>
> >>
> >> Gregor
> >>
> >>
> >>
> >> If I use parameter squeeze_me=True:
> >>
> >> scipy.io.savemat("x.mat", d)
> >> y = scipy.io.loadmat("x.mat", squeeze_me=True)
> >> # print y
> >> cd = y['CanData']
> >> msg = cd['msg']
> >> print msg
> >> print msg.dtype
> >> print msg.dtype.names
> >>
> >> I get output:
> >> >C:\Anaconda2\pythonw -u "test1.py"
> >> ((array(((array([ 61.96,  61.96,  61.96]), u'PosAct'), (array([-0.05,
> >> -0.1 ,  0.3 ]), u'VelAct')),
> >>       dtype=[('PosAct', 'O'), ('VelAct', 'O')]), array([ 0.      ,
> >> 0.003968,  0.007978])),)
> >> object
> >> None
> >> >Exit code: 0
> >>
> >> All well, but the name 'RPDO2' disappeared from the data!
> >>
> >> Now I need this information; in future I won't handle what's put into
> >> x.mat, so I need a way to access through the data all the way down (and
> >> handle the variations that will come).
> >>
> >> I have found a workaround at:
> >> http://stackoverflow.com/questions/7008608/scipy-io-loadmat-nested-structures-i-e-dictionaries/
> >>
> >> The problem is, the workaround uses struct_as_record=False in loadmat,
> >> which boils down to using scipy.io.matlab.mio5_params.mat_struct,
> >> and when you read the docstring of class mat_struct, it says:
> >>
> >> '''
> >> ...
> >> We deprecate this method of holding struct information, and will
> >> soon remove it, in favor of the recarray method (see loadmat
> >> docstring)
> >> '''
> >> So my questions:
> >> 1) Did I miss something? Is there a way to access the data in 'RPDO2' by
> >> using this name, without using parameter struct_as_record=False in loadmat?
> >> 2) If not, where do I file a bug? The workaround is five years old, so
> >> the issue seems to be in scipy for ages...
> >>
> >> (For the record, I use scipy within Anaconda2 1.4.1, under Windows, but
> >> this does not seem to matter).
> >>
> >> Thanks a lot for the answers, in advance.
> >>
> >> Nenad
> >>
> >>
> >> _______________________________________________
> >> SciPy-User mailing list
> >> [hidden email]
> >> https://mail.python.org/mailman/listinfo/scipy-user

Re: How to handle a scipy.io.loadmat - related bug: parts of the data inaccessible after loadmat

Evgeni Burovski


>> Why I hesitated initially, after finding it was that setting struct_as_record=False boils down to using the class scipy.io.matlab.mio5_params.mat_struct.
>>
>> When you look at the docstring of that class it says: "We deprecate this method of holding struct information, and will soon remove it, in favor of the recarray method".
>
>
> This comment was added 7 years ago, but mat_struct is still there (although it is no longer used with the default settings), so I don't think it should be taken seriously. Perhaps you could file an issue to remove or alter this remark.

Yes, please send a PR with the documentation fix!



Re: How to handle a scipy.io.loadmat - related bug: parts of the data inaccessible after loadmat

Matthew Brett
In reply to this post by Propadovic Nenad
Hi,

On Fri, Feb 24, 2017 at 4:36 AM, Propadovic Nenad <[hidden email]> wrote:

> Hello Matthew, hello Gregor,
>
> thank you for your answers.
>
> Yes, values struct_as_record=False and squeeze_me=True are what the
> workaround I mentioned in the initial post at:
>
> http://stackoverflow.com/questions/7008608/scipy-io-loadmat-nested-structures-i-e-dictionaries/
>
> is based on.
>
> I will use that workaround, it gives nice access to everything I need.

Sure - the other option you have is to inspect the field names of the
dtype to find the structure variable names.
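
A minimal sketch of that dtype-based approach, assuming every struct in the file is a scalar (1x1) MATLAB struct, as in the x.mat example from this thread. The helper name struct_to_dict and the temporary file path are illustrative only, not part of scipy:

```python
import os
import tempfile

import numpy as np
import scipy.io

# Nested data modelled on the example from this thread (values shortened).
d = {'CanData': {'msg': {'RPDO2': {
    'timest': [0.0, 0.003968, 0.007978],
    'sig': {
        'VelAct': {'Values': [-0.05, -0.1, 0.3], 'Name': 'VelAct'},
        'PosAct': {'Values': [61.96, 61.96, 61.96], 'Name': 'PosAct'},
    },
}}}}

path = os.path.join(tempfile.mkdtemp(), "x.mat")
scipy.io.savemat(path, d)
y = scipy.io.loadmat(path)  # default settings: structs become record arrays


def struct_to_dict(value):
    """Hypothetical helper: convert loadmat's struct arrays to nested dicts.

    Assumes each struct is a single-element array; struct arrays with
    several elements are returned unchanged.
    """
    if isinstance(value, np.ndarray) and value.dtype.names and value.size == 1:
        record = value.ravel()[0]  # the single record of the 1x1 struct
        return {name: struct_to_dict(record[name]) for name in value.dtype.names}
    return value


can_data = y['CanData']
print(can_data.dtype.names)               # field names of the top-level struct
print(can_data[0, 0]['msg'].dtype.names)  # field names one level down

nested = struct_to_dict(can_data)
print(nested['msg']['RPDO2']['timest'])
```

The point to note is that the element is indexed first (can_data[0, 0]) and the field second. Indexing the 2D array directly with can_data['msg'] returns an object array whose dtype.names is None, which matches the output shown in the original post.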

> Why I hesitated initially, after finding it was that setting
> struct_as_record=False boils down to using the class
> scipy.io.matlab.mio5_params.mat_struct.
>
> When you look at the docstring of that class it says: "We deprecate this
> method of holding struct information, and will soon remove it, in favor of
> the recarray method".

Yes, sorry, I should have replied to that.  I will remove that
warning; it's become clear over time that the mini class
representation that you are using does have a place, so I don't think
we should plan to remove it:

https://github.com/scipy/scipy/pull/7090
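
For reference, a minimal sketch of the mini class route, assuming the same nested layout as the x.mat example in this thread. It relies on the _fieldnames attribute that mat_struct objects carry; the helper name mat_struct_to_dict and the duck-typed hasattr check are illustrative choices, not a scipy API:

```python
import os
import tempfile

import scipy.io

# Nested data modelled on the example from this thread (values shortened).
d = {'CanData': {'msg': {'RPDO2': {
    'timest': [0.0, 0.003968, 0.007978],
    'sig': {
        'VelAct': {'Values': [-0.05, -0.1, 0.3], 'Name': 'VelAct'},
        'PosAct': {'Values': [61.96, 61.96, 61.96], 'Name': 'PosAct'},
    },
}}}}

path = os.path.join(tempfile.mkdtemp(), "x.mat")
scipy.io.savemat(path, d)

# struct_as_record=False yields mat_struct objects instead of record arrays;
# squeeze_me=True drops the 1x1 wrappers around scalars and structs.
y = scipy.io.loadmat(path, struct_as_record=False, squeeze_me=True)


def mat_struct_to_dict(obj):
    """Hypothetical helper: turn nested mat_struct objects into dicts."""
    if hasattr(obj, '_fieldnames'):  # duck-typed check for mat_struct
        return {name: mat_struct_to_dict(getattr(obj, name))
                for name in obj._fieldnames}
    return obj


msg = y['CanData'].msg
print(msg._fieldnames)   # the field name 'RPDO2' is kept
print(msg.RPDO2.timest)

nested = mat_struct_to_dict(y['CanData'])
```

Checking for _fieldnames rather than importing the class avoids depending on the module path of mat_struct, which is an assumption made here for portability across scipy versions.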

Best,

Matthew