[SciPy-User] scipy.signal.resample muffs my timestamps?

[SciPy-User] scipy.signal.resample muffs my timestamps?

Skip Montanaro
I want to resample a large (400k+) dataset where t are timestamps
and x are floats. The t data are epoch seconds from the past
week. For the purposes of this example, I've crudely downsampled them,
taking every 10th element (Python prompt changed to "... " to fool
Gmane).

... len(t)
43051
... len(x)
43051
... pprint([datetime.datetime.fromtimestamp(_) for _ in t[:10]])
[datetime.datetime(2015, 1, 12, 0, 0),
 datetime.datetime(2015, 1, 12, 0, 0, 46, 742044),
 datetime.datetime(2015, 1, 12, 0, 1, 3, 320089),
 datetime.datetime(2015, 1, 12, 0, 1, 23, 700560),
 datetime.datetime(2015, 1, 12, 0, 1, 44, 583401),
 datetime.datetime(2015, 1, 12, 0, 1, 57, 733937),
 datetime.datetime(2015, 1, 12, 0, 2, 38, 30245),
 datetime.datetime(2015, 1, 12, 0, 3, 35, 336342),
 datetime.datetime(2015, 1, 12, 0, 4, 23, 833251),
 datetime.datetime(2015, 1, 12, 0, 4, 48, 272131)]
... pprint([datetime.datetime.fromtimestamp(_) for _ in t[-10:]])
[datetime.datetime(2015, 1, 19, 23, 56, 9, 996926),
 datetime.datetime(2015, 1, 19, 23, 56, 12, 104080),
 datetime.datetime(2015, 1, 19, 23, 56, 12, 158963),
 datetime.datetime(2015, 1, 19, 23, 56, 12, 280701),
 datetime.datetime(2015, 1, 19, 23, 56, 12, 337853),
 datetime.datetime(2015, 1, 19, 23, 56, 22, 169709),
 datetime.datetime(2015, 1, 19, 23, 56, 29, 676865),
 datetime.datetime(2015, 1, 19, 23, 57, 14, 570601),
 datetime.datetime(2015, 1, 19, 23, 58, 56, 394975),
 datetime.datetime(2015, 1, 19, 23, 59, 37, 707367)]

So, let's get started, downsampling our 43k points to 250:

... res_x, res_t = signal.resample(x, 250, t)
(Final Jeopardy tune plays...)
...

If I understand correctly, signal.resample should generate 250 evenly
spaced points from each of the inputs.

... len(res_x)
250
... len(res_t)
250

So far, so good. Now, look at the range of res_t:

... pprint([datetime.datetime.fromtimestamp(_) for _ in res_t[:10]])
[datetime.datetime(2015, 1, 12, 0, 0),
 datetime.datetime(2015, 1, 12, 2, 14, 9, 166940),
 datetime.datetime(2015, 1, 12, 4, 28, 18, 333880),
 datetime.datetime(2015, 1, 12, 6, 42, 27, 500820),
 datetime.datetime(2015, 1, 12, 8, 56, 36, 667761),
 datetime.datetime(2015, 1, 12, 11, 10, 45, 834701),
 datetime.datetime(2015, 1, 12, 13, 24, 55, 1641),
 datetime.datetime(2015, 1, 12, 15, 39, 4, 168581),
 datetime.datetime(2015, 1, 12, 17, 53, 13, 335521),
 datetime.datetime(2015, 1, 12, 20, 7, 22, 502461)]
... pprint([datetime.datetime.fromtimestamp(_) for _ in res_t[-10:]])
[datetime.datetime(2015, 2, 3, 8, 36, 40, 65638),
 datetime.datetime(2015, 2, 3, 10, 50, 49, 232578),
 datetime.datetime(2015, 2, 3, 13, 4, 58, 399518),
 datetime.datetime(2015, 2, 3, 15, 19, 7, 566458),
 datetime.datetime(2015, 2, 3, 17, 33, 16, 733398),
 datetime.datetime(2015, 2, 3, 19, 47, 25, 900338),
 datetime.datetime(2015, 2, 3, 22, 1, 35, 67279),
 datetime.datetime(2015, 2, 4, 0, 15, 44, 234219),
 datetime.datetime(2015, 2, 4, 2, 29, 53, 401159),
 datetime.datetime(2015, 2, 4, 4, 44, 2, 568099)]

That doesn't look right at all.

I'm sure I'm using an outdated version of scipy:

... scipy.version.version
'0.9.0'

but it's what I have available (it's a long story).

If this is a bug requiring upgrade, I'll beat on the powers that be to
get a newer version of scipy. I'm happy to provide my data to anyone
who would be willing to try this exercise out using a more recent
version.

Thanks,

Skip Montanaro


_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user

Re: scipy.signal.resample muffs my timestamps?

ralfgommers


On Tue, Jan 20, 2015 at 6:14 PM, Skip Montanaro <[hidden email]> wrote:
If this is a bug requiring upgrade, I'll beat on the powers that be to
get a newer version of scipy. I'm happy to provide my data to anyone
who would be willing to try this exercise out using a more recent
version.

I doubt that an upgrade will fix your issue; I don't see any bug fixes to signal.resample since 0.9.0 that look relevant. I don't understand how this works for you at all; a quick test with ``t = list_of_datetimes`` gives me:

    TypeError: unsupported operand type(s) for /: 'datetime.timedelta' and 'float'

If you can provide a reproducible example on a generated set of data, that would be the easiest (we can use that as a regression test). Otherwise providing your code with your actual dataset is also OK - if you send me a link or email it to me I'll have a look.

Cheers,
Ralf




Re: scipy.signal.resample muffs my timestamps?

Skip Montanaro
In reply to this post by Skip Montanaro
I managed to download, build and install scipy 0.15.1.  I get a
similar (though quantitatively different) result.

>>> len(t)
430509
>>> len(x)
430509
>>> res_x, res_t = signal.resample(x[::100], 250, t[::100])
>>> len(res_x)
250
>>> len(res_t)
250
>>> t[-1]
1421733595.509921
>>> res_t[-1]
1422456460.5224724
>>> pprint([datetime.datetime.fromtimestamp(t[0]),
            datetime.datetime.fromtimestamp(t[-1])])
[datetime.datetime(2015, 1, 12, 0, 0),
 datetime.datetime(2015, 1, 19, 23, 59, 55, 509921)]
>>> pprint([datetime.datetime.fromtimestamp(res_t[0]),
            datetime.datetime.fromtimestamp(res_t[-1])])
[datetime.datetime(2015, 1, 12, 0, 0),
 datetime.datetime(2015, 1, 28, 8, 47, 40, 522472)]

I assume I'm doing something wrong to cause it to expand the range
like that. I didn't see any arguments in the help() output which
obviously suggested I could change this particular behavior though.


Re: scipy.signal.resample muffs my timestamps?

josef.pktd


On Tue, Jan 20, 2015 at 2:36 PM, Skip Montanaro <[hidden email]> wrote:
I assume I'm doing something wrong to cause it to expand the range
like that. I didn't see any arguments in the help() output which
obviously suggested I could change this particular behavior though.


In case it's a rounding issue (my guess), you could try subtracting t[0] from t and adding it back after the resample.
There is a small chance it helps.

Josef
 


Re: scipy.signal.resample muffs my timestamps?

Warren Weckesser-2
In reply to this post by Skip Montanaro


On Tue, Jan 20, 2015 at 2:36 PM, Skip Montanaro <[hidden email]> wrote:
I assume I'm doing something wrong to cause it to expand the range
like that. I didn't see any arguments in the help() output which
obviously suggested I could change this particular behavior though.



`resample` assumes the samples are uniformly spaced, but your timestamps are not.

Here are your first timestamps (from your first email):

In [40]: t
Out[40]:
[datetime.datetime(2015, 1, 12, 0, 0),
 datetime.datetime(2015, 1, 12, 0, 0, 46, 742044),
 datetime.datetime(2015, 1, 12, 0, 1, 3, 320089),
 datetime.datetime(2015, 1, 12, 0, 1, 23, 700560),
 datetime.datetime(2015, 1, 12, 0, 1, 44, 583401),
 datetime.datetime(2015, 1, 12, 0, 1, 57, 733937),
 datetime.datetime(2015, 1, 12, 0, 2, 38, 30245),
 datetime.datetime(2015, 1, 12, 0, 3, 35, 336342),
 datetime.datetime(2015, 1, 12, 0, 4, 23, 833251),
 datetime.datetime(2015, 1, 12, 0, 4, 48, 272131)]


`dt` holds the intervals between consecutive timestamps.  For `resample` to work as expected, these should all be the same:


In [41]: dt = np.array([delta.total_seconds() for delta in np.diff(t)])

In [42]: dt
Out[42]:
array([ 46.742044,  16.578045,  20.380471,  20.882841,  13.150536,
        40.296308,  57.306097,  48.496909,  24.43888 ])
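
The same check can be sketched with only the standard library, which may be handy before calling `resample` on a new dataset. The helper names `intervals_seconds` and `is_uniform`, and the tolerance `rel_tol`, are made up for this illustration; the timestamps are the ten values from the first email.

```python
from datetime import datetime

# First ten timestamps from the original post.
stamps = [
    datetime(2015, 1, 12, 0, 0),
    datetime(2015, 1, 12, 0, 0, 46, 742044),
    datetime(2015, 1, 12, 0, 1, 3, 320089),
    datetime(2015, 1, 12, 0, 1, 23, 700560),
    datetime(2015, 1, 12, 0, 1, 44, 583401),
    datetime(2015, 1, 12, 0, 1, 57, 733937),
    datetime(2015, 1, 12, 0, 2, 38, 30245),
    datetime(2015, 1, 12, 0, 3, 35, 336342),
    datetime(2015, 1, 12, 0, 4, 23, 833251),
    datetime(2015, 1, 12, 0, 4, 48, 272131),
]


def intervals_seconds(times):
    """Return the gaps between consecutive datetimes, in seconds."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def is_uniform(times, rel_tol=1e-6):
    """Return True if every gap is within rel_tol of the first gap.

    rel_tol is an arbitrary illustrative tolerance, not a scipy setting.
    """
    gaps = intervals_seconds(times)
    if not gaps:
        return True
    first = gaps[0]
    return all(abs(g - first) <= rel_tol * abs(first) for g in gaps)


gaps = intervals_seconds(stamps)
print(gaps)                # nine gaps, all different
print(is_uniform(stamps))  # False for this data
```

If `is_uniform` returns False, the time axis returned by `resample` should not be expected to line up with the original timestamps.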


By the way, it might be just luck that `resample` didn't crash when given a sequence of `datetime.datetime` objects for `t`.  I don't think any of the functions in scipy.signal were explicitly designed to handle `datetime` objects.  (There are no tests of such input in the test suite.)

In this case, it "works" because of the formula used to create the new time values.  Because `resample` assumes the input is uniformly sampled, it needs only the first time difference to figure out the new timestamps.  Here's how the new time values are computed in `resample` (`Nx` and `num` are the old and new number of samples, respectively):

        new_t = arange(0, num) * (t[1] - t[0]) * Nx / float(num) + t[0]

I.e.
        new_t = arange(0, num) * new_dt + t[0]
where
        new_dt = (t[1] - t[0]) * Nx / float(num)

`t[1] - t[0]` is a `datetime.timedelta` object, and `new_t` ends up as an array (with object dtype) of `datetime.datetime` instances.
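
To see why the range expands, here is a small pure-Python restatement of that formula applied to generated data. It is only a sketch of the time-axis computation, not the scipy implementation itself; the function name `resample_times` and the test data are invented for the illustration. Plain floats (epoch seconds) are used for the times.

```python
def resample_times(t, num):
    """Pure-Python restatement of the quoted time-axis formula.

    Only t[0], t[1] and len(t) are used; all other timestamps are ignored.
    """
    nx = len(t)
    new_dt = (t[1] - t[0]) * nx / float(num)
    return [i * new_dt + t[0] for i in range(num)]


# Made-up data: 1000 samples spanning 0..1000 s, with a first gap of 5 s
# that is about five times the average gap.
t_irregular = [0.0, 5.0] + [5.0 + 995.0 * k / 998 for k in range(1, 999)]
new_t = resample_times(t_irregular, 250)
print(t_irregular[-1], new_t[-1])  # the new axis runs far past the last input time

# Uniformly spaced data for comparison: the new axis stays inside the input range.
t_uniform = [float(k) for k in range(1000)]
uniform_new_t = resample_times(t_uniform, 250)
print(t_uniform[-1], uniform_new_t[-1])

# Step size implied by the first post: first gap 46.742044 s, 43051 -> 250 samples.
step_first_post = 46.742044 * 43051 / 250.0
print(step_first_post)  # roughly 8049 s, i.e. about 2 h 14 min
```

The last value agrees with the spacing of about 2 h 14 min 9 s between consecutive entries of `res_t` in the first post: because the first gap (about 46.7 s) is much larger than the average gap, the computed step is too large and the 250 new timestamps run well past the end of the data.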


Warren

 