Status of TimeSeries SciKit


Paul Bilokon
Hi,

I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased?

Best wishes,
Paul

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Status of TimeSeries SciKit

Pierre GM-2

On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote:

> Hi,
>
> I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased?

Years is an overstatement...
The scikit hasn't been updated in a while, yes. The two developers got really busy on other projects (like, jobs to pay bills) and unfortunately don't currently have the time to keep it up to date.
*If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited.
That doesn't mean that you'd be on your own, questions will still be answered...

Re: Status of TimeSeries SciKit

Wes McKinney
On Tue, Jul 26, 2011 at 9:30 AM, Pierre GM <[hidden email]> wrote:

>
> On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote:
>
>> Hi,
>>
>> I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased?
>
> Years is an overstatement...
> The scikit hasn't been updated in a while, yes. The two developers got really busy on other projects (like, jobs to pay bills) and unfortunately don't currently have the time to keep it up to date.
> *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited.
> That doesn't mean that you'd be on your own, questions will still be answered...

hi Paul,

Skipper and I (statsmodels) relatively recently discussed moving
scikits.timeseries to GitHub and maintaining it there since we work on
models for time series analysis. I work very actively on time
series-related functionality in pandas so it might not even be
unthinkable to merge together the projects (scikits.timeseries and
pandas) and integrate all the numpy.datetime64 stuff once the dust
settles there. Just thinking out loud.

- Wes

Re: Status of TimeSeries SciKit

Pierre GM-2

On Jul 26, 2011, at 4:25 PM, Wes McKinney wrote:

> On Tue, Jul 26, 2011 at 9:30 AM, Pierre GM <[hidden email]> wrote:
>>
>> On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote:
>>
>>> Hi,
>>>
>>> I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased?
>>
>> Years is an overstatement...
>> The scikit hasn't been updated in a while, yes. The two developers got really busy on other projects (like, jobs to pay bills) and unfortunately don't currently have the time to keep it up to date.
>> *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited.
>> That doesn't mean that you'd be on your own, questions will still be answered...
>
> hi Paul,
>
> Skipper and I (statsmodels) relatively recently discussed moving
> scikits.timeseries to GitHub and maintaining it there since we work on
> models for time series analysis.

Er…
https://github.com/pierregm/scikits.timeseries/
https://github.com/pierregm/scikits.timeseries-sandbox/

the second one is actually a branch of the first one (I know, it's silly with git, but I was only learning at the time), that provides some new functionalities like a 'time step' in addition to the 'time unit' (so that you can define regular series w/ one entry every 5min, say), but is not completely baked on the C side (I had some issues subclassing the C ndarray).
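
As a rough illustration of the 'time step' idea, here is a minimal sketch that uses NumPy's datetime64 rather than the sandbox code itself (the sandbox API is not shown in this thread, so none of the names below come from it). The unit is minutes and the step is 5 units:

```python
import numpy as np

# Time unit: minutes. Time step: 5 units between consecutive entries.
start = np.datetime64("2011-07-26T00:00", "m")
stop = np.datetime64("2011-07-26T01:00", "m")
step = np.timedelta64(5, "m")

# Regular date index with one entry every 5 minutes (stop is excluded).
dates = np.arange(start, stop, step)
values = np.zeros(dates.size)

print(dates.size)           # number of entries in one hour
print(dates[1] - dates[0])  # spacing between consecutive entries
```

Keeping the step separate from the unit means the index stays regular, so positions can be computed from dates by integer arithmetic instead of a search.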



> I work very actively on time
> series-related functionality in pandas so it might not even be
> unthinkable to merge together the projects (scikits.timeseries and
> pandas) and integrate all the numpy.datetime64 stuff once the dust
> settles there. Just thinking out loud.

That's an idea.



Re: Status of TimeSeries SciKit

jseabold
On Tue, Jul 26, 2011 at 10:35 AM, Pierre GM <[hidden email]> wrote:

>
> On Jul 26, 2011, at 4:25 PM, Wes McKinney wrote:
>
> > On Tue, Jul 26, 2011 at 9:30 AM, Pierre GM <[hidden email]> wrote:
> >>
> >> On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote:
> >>
> >>> Hi,
> >>>
> >>> I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased?
> >>
> >> Years is an overstatement...
> >> The scikit hasn't been updated in a while, yes. The two developers got really busy on other projects (like, jobs to pay bills) and unfortunately don't currently have the time to keep it up to date.
> >> *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited.
> >> That doesn't mean that you'd be on your own, questions will still be answered...
> >
> > hi Paul,
> >
> > Skipper and I (statsmodels) relatively recently discussed moving
> > scikits.timeseries to GitHub and maintaining it there since we work on
> > models for time series analysis.
>
> Er…
> https://github.com/pierregm/scikits.timeseries/
> https://github.com/pierregm/scikits.timeseries-sandbox/
>

Great. Is this the "official" advertised repo? I remember there was
some chatter about this a few months back but lost track of the
thread.

> the second one is actually a branch of the first one (I know, it's silly with git, but I was only learning at the time), that provides some new functionalities like a 'time step' in addition to the 'time unit' (so that you can define regular series w/ one entry every 5min, say), but is not completely baked on the C side (I had some issues subclassing the C ndarray).
>
>
>
> > I work very actively on time
> > series-related functionality in pandas so it might not even be
> > unthinkable to merge together the projects (scikits.timeseries and
> > pandas) and integrate all the numpy.datetime64 stuff once the dust
> > settles there. Just thinking out loud.
>
> That's an idea.
>

Any thoughts on the idea? Do you think it's reasonable and/or
beneficial? There is also some talk with the scikits.learn and
scikits.statsmodels to drop the scikits namespace, which would be
better as a collective decision, so the merging could be a part of
this? I use both packages now, and I, for one, would love to see them
come together and share to the extent this is feasible. Others? I
especially like the plotting stuff since it's great but I've had to
make a few local patches here and there for mpl changes.

Skipper

Re: Status of TimeSeries SciKit

Dharhas Pothina
In reply to this post by Wes McKinney
 
> models for time series analysis. I work very actively on time
> series-related functionality in pandas so it might not even be
> unthinkable to merge together the projects (scikits.timeseries and
> pandas) and integrate all the numpy.datetime64 stuff once the dust
> settles there. Just thinking out loud.
There is functionality I like and use in both pandas and scikits.timeseries; moving towards an eventual goal of merging the two is a great idea.
 
- dharhas


Re: Status of TimeSeries SciKit

Pierre GM-2
In reply to this post by jseabold

On Jul 26, 2011, at 4:42 PM, Skipper Seabold wrote:

> On Tue, Jul 26, 2011 at 10:35 AM, Pierre GM <[hidden email]> wrote:
>>
>> On Jul 26, 2011, at 4:25 PM, Wes McKinney wrote:
>>
>>> On Tue, Jul 26, 2011 at 9:30 AM, Pierre GM <[hidden email]> wrote:
>>>>
>>>> On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased?
>>>>
>>>> Years is an overstatement...
>>>> The scikit hasn't been updated in a while, yes. The two developers got really busy on other projects (like, jobs to pay bills) and unfortunately don't currently have the time to keep it up to date.
>>>> *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited.
>>>> That doesn't mean that you'd be on your own, questions will still be answered...
>>>
>>> hi Paul,
>>>
>>> Skipper and I (statsmodels) relatively recently discussed moving
>>> scikits.timeseries to GitHub and maintaining it there since we work on
>>> models for time series analysis.
>>
>> Er…
>> https://github.com/pierregm/scikits.timeseries/
>> https://github.com/pierregm/scikits.timeseries-sandbox/
>>
>
> Great. Is this the "official" advertised repo? I remember there was
> some chatter about this a few months back but lost track of the
> thread.


Yep. The scikits.timeseries is just the SVN site ported to git. The sandbox one was dubbed 'experimental' on this very list.


>
>> the second one is actually a branch of the first one (I know, it's silly with git, but I was only learning at the time), that provides some new functionalities like a 'time step' in addition to the 'time unit' (so that you can define regular series w/ one entry every 5min, say), but is not completely baked on the C side (I had some issues subclassing the C ndarray).
>>
>>
>>
>>> I work very actively on time
>>> series-related functionality in pandas so it might not even be
>>> unthinkable to merge together the projects (scikits.timeseries and
>>> pandas) and integrate all the numpy.datetime64 stuff once the dust
>>> settles there. Just thinking out loud.
>>
>> That's an idea.
>>
>
> Any thoughts on the idea? Do you think it's reasonable and/or
> beneficial? There is also some talk with the scikits.learn and
> scikits.statsmodels to drop the scikits namespace, which would be
> better as a collective decision, so the merging could be a part of
> this? I use both packages now, and I, for one, would love to see them
> come together and share to the extent this is feasible. Others? I
> especially like the plotting stuff since it's great but I've had to
> make a few local patches here and there for mpl changes.


No surprise for matplotlib. I kinda dropped the ball here (when I need to plot stuff these days, I don't use mpl).
I haven't used pandas yet, for the same reasons why I wasn't able to keep up with updating scikits.timeseries. But if y'all use the two in parallel and have a need for porting scikits.timeseries to pandas, then go for it, you have my blessing. And you know where to contact me if you have some issues or questions.



Re: Status of TimeSeries SciKit

Matt Knox-4
In reply to this post by jseabold

> >>> I work very actively on time
> >>> series-related functionality in pandas so it might not even be
> >>> unthinkable to merge together the projects (scikits.timeseries and
> >>> pandas) and integrate all the numpy.datetime64 stuff once the dust
> >>> settles there. Just thinking out loud.
> >>
> >> That's an idea.
> >>
> >
> > Any thoughts on the idea? Do you think it's reasonable and/or
> > beneficial? There is also some talk with the scikits.learn and
> > scikits.statsmodels to drop the scikits namespace, which would be
> > better as a collective decision, so the merging could be a part of
> > this? I use both packages now, and I, for one, would love to see them
> > come together and share to the extent this is feasible. Others? I
> > especially like the plotting stuff since it's great but I've had to
> > make a few local patches here and there for mpl changes.
>
> No surprise for matplotlib. I kinda dropped the ball here (when I need to
> plot stuff these days, I don't use mpl). I haven't used pandas yet, for the
> same reasons why I wasn't able to keep up with updating scikits.timeseries.
> But if y'all use the two in parallel and have a need for porting
> scikits.timeseries to pandas, then go for it, you have my blessing. And you
> know where to contact me if you have some issues or questions.

I would basically echo Pierre's comments here. I don't have the time (or to
be perfectly honest, the energy and motivation) to maintain the timeseries
module anymore and would definitely be in favor of any efforts to merge its
functionality into a better supported module.

It's clear at this point that the timeseries module in its current form is a
dead end given the lack of maintainers as well as the fundamental building
blocks which are coming into place that would allow a better timeseries module.
Those building blocks being:
   
    1. datetime data type support in numpy
    2. improved missing value support in numpy
    3. data array / labelled array / pandas type of stuff which should (in
       theory) simplify indexing a timeseries with dates relative to the large
       hacks used in the current timeseries module

In many ways, the timeseries module is a giant hack which tries to work around
the fact that it is missing these key foundational pieces in numpy.

If pandas is the module that unifies all these concepts into a cohesive
package, then I think that is fantastic!

And from lurking on the numpy and scipy mailing lists and monitoring all the
threads on the related topics recently, I feel confident that I have little to
contribute and that the problem rests in much more capable hands than my own :)

- Matt Knox



Re: Status of TimeSeries SciKit

Gael Varoquaux
On Tue, Jul 26, 2011 at 05:58:27PM +0000, Matt Knox wrote:
> In many ways, the timeseries module is a giant hack which tries to work
> around the fact that it is missing these key foundational pieces in
> numpy.

I don't believe this statement is true. If you are doing statistics, you
think that what is really missing in numpy is missing data support. If
you are doing timeseries analysis, you are missing timeseries support. If
you are doing spatial models, you are missing unstructured spatial data
support with builtin interpolation, if you are doing general relativity,
you are missing contra/co-variant tensor support.

In my opinion, the important thing to keep in mind is that while each
domain-specific application calls for different specific data structures,
they are not universal, and you cannot stick them all in one library. The
good news is that with numpy arrays, you can build data structures and
libraries that talk more or less together, sharing the data across
domains. However, the more you embed your specificities in your data
structure, the more it becomes alien to people who don't have the same
use cases. For instance the various VTK data structures are amongst the
most beautiful structures for encoding spatial information. Yet most
people not coming from 3D data processing hate them, because they don't
understand them, and others are very busy reinventing the same set of
abstractions. Similarly, R is great for statistics, but people who don't
do statistics find the syntax incomprehensible and the data structures
too restrictive. Matlab is great for linear algebra, but if you move in
an N-dimensional world it gets clumsy.

My point is: let us stop dreaming that a change to core numpy will solve
our problems. I am not saying that it cannot be improved, but in my
opinion, the reason numpy is so successful is that it is actually the
intersection of many different domain-specific requirements, and not the
union.

2 cents from the peanut gallery,

Gael

Re: Status of TimeSeries SciKit

Wes McKinney
On Tue, Jul 26, 2011 at 6:28 PM, Gael Varoquaux
<[hidden email]> wrote:

> On Tue, Jul 26, 2011 at 05:58:27PM +0000, Matt Knox wrote:
>> In many ways, the timeseries module is a giant hack which tries to work
>> around the fact that it is missing these key foundational pieces in
>> numpy.
>
> I don't believe this statement is true. If you are doing statistics, you
> think that what is really missing in numpy is missing data support. If
> you are doing timeseries analysis, you are missing timeseries support. If
> you are doing spatial models, you are missing unstructured spatial data
> support with builtin interpolation, if you are doing general relativity,
> you are missing contra/co-variant tensor support.
>
> In my opinion, the important thing to keep in mind is that while each
> domain-specific application calls for different specific data structures,
> they are not universal, and you cannot stick them all in one library. The
> good news is that with numpy arrays, you can build data structures and
> libraries that talk more or less together, sharing the data across
> domains. However, the more you embed your specificities in your data
> structure, the more it becomes alien to people who don't have the same
> use cases. For instance the various VTK data structures are amongst the
> most beautiful structures for encoding spatial information. Yet most
> people not coming from 3D data processing hate them, because they don't
> understand them, and others are very busy reinventing the same set of
> abstractions. Similarly, R is great for statistics, but people who don't
> do statistics find the syntax incomprehensible and the data structures
> too restrictive. Matlab is great for linear algebra, but if you move in
> an N-dimensional world it gets clumsy.
>
> My point is: let us stop dreaming that a change to core numpy will solve
> our problems. I am not saying that it cannot be improved, but in my
> opinion, the reason numpy is so successful is that it is actually the
> intersection of many different domain-specific requirements, and not the
> union.

+1, I agree completely: NumPy will provide the fundamental building
blocks we can use to build domain-specific data structures-- there
will be no deus ex machina =)

> 2 cents from the peanut gallery,
>
> Gael

Re: Status of TimeSeries SciKit

Matt Knox-4
In reply to this post by Gael Varoquaux
Gael Varoquaux <gael.varoquaux <at> normalesup.org> writes:

>
> On Tue, Jul 26, 2011 at 05:58:27PM +0000, Matt Knox wrote:
> > In many ways, the timeseries module is a giant hack which tries to work
> > around the fact that it is missing these key foundational pieces in
> > numpy.
>
> I don't believe this statement is true. If you are doing statistics, you
> think that what is really missing in numpy is missing data support. If
> you are doing timeseries analysis, you are missing timeseries support. If
> you are doing spatial models, you are missing unstructured spatial data
> support with builtin interpolation, if you are doing general relativity,
> you are missing contra/co-variant tensor support.

Ok, perhaps my statement was a bit harsh :) . But the point I was trying to
make is that the timeseries module could be dramatically simplified and cleaned
up internally with some of those forthcoming foundational pieces in numpy,
even if the API and functionality of the timeseries module is kept identical
to what it is right now.

> My point is: let us stop dreaming that a change to core numpy will solve
> our problems. I am not saying that it cannot be improved, but in my
> opinion, the reason numpy is so successful is that it is actually the
> intersection of many different domain-specific requirements, and not the
> union.

You are right. There is no such thing as a one size fits all data structure. It
just so happens that Wes' use cases (from my understanding) are basically the
same as mine (finance, etc). So from my own selfish point of view, the idea of
pandas swallowing up the timeseries module and incorporating its functionality
sounds kind of nice since that would give ME (and probably most of the people
that work in the finance domain) an awesome swiss army knife data structure
that solves all the problems that I care about :)

- Matt Knox



Re: Status of TimeSeries SciKit

Gael Varoquaux
On Wed, Jul 27, 2011 at 01:06:21PM +0000, Matt Knox wrote:
> Ok, perhaps my statement was a bit harsh :) . But the point I was
> trying to make is that the timeseries module could be dramatically
> simplified and cleaned up internally with some of those forthcoming
> foundational pieces in numpy,

Even though I do not know the timeseries module, I wouldn't be surprised
if that were indeed the case. It is probably very valuable if you are able
to identify localized enhancements to numpy that make your life easier,
as they might make many other people's lives easier.

> It just so happens that Wes' use cases (from my understanding) are
> basically the same as mine (finance, etc). So from my own selfish point
> of view, the idea of pandas swallowing up the timeseries module and
> incorporating its functionality sounds kind of nice since that would
> give ME (and probably most of the people that work in the finance
> domain)

I think that it is really great if the different packages doing time
series analysis unite. It will probably give better packages technically,
and there is a lot of value to the community in such work.

Gael

Re: Status of TimeSeries SciKit

Wes McKinney
On Wed, Jul 27, 2011 at 10:12 AM, Gael Varoquaux
<[hidden email]> wrote:

> On Wed, Jul 27, 2011 at 01:06:21PM +0000, Matt Knox wrote:
>> Ok, perhaps my statement was a bit harsh :) . But the point I was
>> trying to make is that the timeseries module could be dramatically
>> simplified and cleaned up internally with some of those forthcoming
>> foundational pieces in numpy,
>
> Even though I do not know the timeseries module, I wouldn't be surprised
> if that were indeed the case. It is probably very valuable if you are able
> to identify localized enhancements to numpy that make your life easier,
> as they might make many other people's lives easier.
>
>> It just so happens that Wes' use cases (from my understanding) are
>> basically the same as mine (finance, etc). So from my own selfish point
>> of view, the idea of pandas swallowing up the timeseries module and
>> incorporating its functionality sounds kind of nice since that would
>> give ME (and probably most of the people that work in the finance
>> domain)
>
> I think that it is really great if the different packages doing time
> series analysis unite. It will probably give better packages technically,
> and there is a lot of value to the community in such work.

I agree. I already have 50% or more of the features in
scikits.timeseries, so this gets back to my fragmentation argument
(users being stuck with a confusing choice between multiple
libraries). Let's make it happen!

> Gael

Re: Status of TimeSeries SciKit

Matt Knox-4
Wes McKinney <wesmckinn <at> gmail.com> writes:
>
> I agree. I already have 50% or more of the features in
> scikits.timeseries, so this gets back to my fragmentation argument
> (users being stuck with a confusing choice between multiple
> libraries). Let's make it happen!

Ok. In the interest of moving this forward, here is a quick list of things I
see missing in pandas that scikits.timeseries does. For brevity I will skip the
reasons that these features exist, but if the usefulness is not obvious please
ask me to clarify.

Frequency conversion flexibility:
    - when going from a higher frequency to lower frequency (eg. daily to
      monthly), the timeseries module adds an extra dimension and groups the
      points so you still have all the original data rather than discarding
      data
    - allow you to specify where to place the value - the start or end of the
      period - when converting from lower frequency to higher frequency (eg.
      monthly to daily)
    - support of a larger number of frequencies

Indexing:
    - slicing with dates (looks like "truncate" method does this, but would
      be nice to be able to just use slicing directly)

- simple arithmetic on dates ("date + 1" means "add one unit at the current
  frequency")
- various date/series attributes such as year, qyear, quarter, month, week,
  day, day_of_year, etc...
  (ref: http://pytseries.sourceforge.net/core.datearrays.html#date-information)
- full missing value support (TimeSeries class is a subclass of MaskedArray)    
- moving (rolling) median/min/max
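
The first frequency-conversion item above (daily to monthly while keeping every point) can be sketched with plain NumPy. This is only an illustration of the idea, not the scikits.timeseries implementation; the helper name `group_by_month` and the fixed 31 day slots per row are assumptions made for the example.

```python
import numpy as np

def group_by_month(dates, values):
    """Group daily values into a 2-D masked array, one row per month.

    Hypothetical helper: rows are months, columns are day-of-month slots
    (31 of them); slots without an observation stay masked.
    """
    months = dates.astype("datetime64[M]")
    unique_months, row = np.unique(months, return_inverse=True)
    # Zero-based day of the month for each observation.
    col = (dates - months).astype(int)
    grouped = np.ma.masked_all((unique_months.size, 31), dtype=float)
    grouped[row, col] = values
    return unique_months, grouped

dates = np.arange(np.datetime64("2011-01-01"), np.datetime64("2011-03-01"))
values = np.arange(dates.size, dtype=float)
months, grouped = group_by_month(dates, values)

print(grouped.shape)          # (number of months, day slots)
print(grouped.count(axis=1))  # valid points per month
monthly_mean = grouped.mean(axis=1)
```

Because no data is discarded, any reduction (mean, last value, max) can be applied along the second axis afterwards.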

- Matt Knox


Re: Status of TimeSeries SciKit

Andreas Hilboll
While we're at it:

> Frequency conversion flexibility:
>     - when going from a higher frequency to lower frequency (eg. daily to
>       monthly), the timeseries module adds an extra dimension and groups the
>       points so you still have all the original data rather than discarding
>       data

I'm using scikits.timeseries for analysis of atmospheric measurements.
I've always wanted several things, and now that discussion is under way,
maybe it's a good time to point them out:

* When plotting a series, have the flexibility to have the value marked
down at the center of the frequency. What I mean is, when I have monthly
data and make a plot of one year, have each value be printed at the
middle of the corresponding month, e.g. Jan 16, etc. Otherwise, It's not
obvious to the reader whether the value printed on July 1 is actually
that for June or that for July.

* Have full support for n-dimensional series. When I have a n-d array of
data values for each point in time (n>0), many things don't work. The
biggest problem here seems to be that pickling actually *seems* to work
(a file is created), but when I load the file again, the entries in the
array are somehow screwed up (like transposed).

* Enable rolling means for sparse data. For example, if I have irregular
(in time) measurements, say, every one to six days, I would still like
to be able to calculate a rolling n-day-average. Missing values should
be ignored (speaking numpy: timeslice.compressed().mean())
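
The rolling mean over irregular data described in the last item can be sketched as follows. The helper `rolling_day_mean` is hypothetical (it is not part of scikits.timeseries or pandas), and it assumes the dates are sorted and that missing values are stored as NaN.

```python
import numpy as np

def rolling_day_mean(dates, values, n_days):
    """Trailing n-day mean over irregularly spaced observations.

    Hypothetical helper: each output is the mean of all non-missing
    values whose date falls in the window (date - n_days, date].
    """
    values = np.ma.masked_invalid(values)
    window = np.timedelta64(n_days, "D")
    # dates must be sorted; find the position where each window starts.
    starts = np.searchsorted(dates, dates - window, side="right")
    out = np.ma.masked_all(len(dates))
    for i, lo in enumerate(starts):
        timeslice = values[lo:i + 1]
        if timeslice.count():
            out[i] = timeslice.compressed().mean()
    return out

dates = np.array(["2011-07-01", "2011-07-02", "2011-07-06", "2011-07-09"],
                 dtype="datetime64[D]")
values = np.array([1.0, 3.0, np.nan, 5.0])
result = rolling_day_mean(dates, values, 3)
print(result)
```

Windows that contain no valid observation stay masked in the result instead of producing NaN.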

I don't know if any of this is already implemented in pandas, as I've
never used it up till now. But perhaps someone would be interested in
implementing these issues ...

Cheers,
Andreas.

Re: Status of TimeSeries SciKit

Wes McKinney
In reply to this post by Matt Knox-4
On Wed, Jul 27, 2011 at 12:12 PM, Matt Knox <[hidden email]> wrote:

> Wes McKinney <wesmckinn <at> gmail.com> writes:
>>
>> I agree. I already have 50% or more of the features in
>> scikits.timeseries, so this gets back to my fragmentation argument
>> (users being stuck with a confusing choice between multiple
>> libraries). Let's make it happen!
>
> Ok. In the interest of moving this forward, here is a quick list of things I
> see missing in pandas that scikits.timeseries does. For brevity I will skip the
> reasons that these features exist, but if the usefulness is not obvious please
> ask me to clarify.
>
> Frequency conversion flexibility:
>    - when going from a higher frequency to lower frequency (eg. daily to
>      monthly), the timeseries module adds an extra dimension and groups the
>      points so you still have all the original data rather than discarding
>      data

This is basically just a group by (reduceat) operation. I've been
working a lot on groupby lately and resampling (frequency conversion
has always existed, and lo-to-high is simple, but not easy
downsampling/aggregation) will fall out as an afterthought. Should not
require any C code either.
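
For readers unfamiliar with `reduceat`, here is a minimal sketch of a daily-to-monthly aggregation done that way. It uses only NumPy and is not the pandas groupby code; the variable names are made up for the example.

```python
import numpy as np

dates = np.arange(np.datetime64("2011-01-01"), np.datetime64("2011-04-01"))
values = np.ones(dates.size)

# Label each day with its month, then find where each month starts.
months = dates.astype("datetime64[M]")
starts = np.flatnonzero(np.r_[True, months[1:] != months[:-1]])

# reduceat applies the reduction between consecutive start offsets.
monthly_sum = np.add.reduceat(values, starts)
monthly_max = np.maximum.reduceat(values, starts)
print(months[starts], monthly_sum)
```

The same `starts` array works with any ufunc that has a `reduceat` method, which is why this needs no C code.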

>    - allow you to specify where to place the value - the start or end of the
>      period - when converting from lower frequency to higher frequency (eg.
>      monthly to daily)

I'll make sure to make this available as an option. When going
low-to-high you have two interpolation options: forward fill (aka
"pad") and back fill, which I think is what you're saying?

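The two fill options can be sketched with NumPy's `searchsorted`. This is an illustration under the assumption that the monthly values are stamped at month end; it is not how pandas implements the conversion.

```python
import numpy as np

monthly_dates = np.array(["2011-01-31", "2011-02-28", "2011-03-31"],
                         dtype="datetime64[D]")
monthly_values = np.array([10.0, 20.0, 30.0])
daily_dates = np.arange(np.datetime64("2011-01-31"),
                        np.datetime64("2011-04-01"))

# Forward fill ("pad"): each day takes the most recent known value.
pad_idx = np.searchsorted(monthly_dates, daily_dates, side="right") - 1
padded = monthly_values[pad_idx]

# Back fill: each day takes the next known value.
bfill_idx = np.searchsorted(monthly_dates, daily_dates, side="left")
backfilled = monthly_values[bfill_idx]

# Position 15 is 2011-02-15, between the January and February stamps.
print(daily_dates[15], padded[15], backfilled[15])
```
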
>    - support of a larger number of frequencies

Which ones are you thinking of? Currently I have:

- hourly, minutely, secondly (and things like 5-minutely can be done,
e.g. Minute(5))
- daily / business daily
- weekly (anchored on a particular weekday)
- monthly / business month-end
- (business) quarterly, anchored on jan/feb/march
- annual / business annual (start and end)

there is also a generic delta wrapping dateutil.relativedelta, so it's
possible to go beyond these. the scikits.timeseries code is far more
comprehensive and complete, completely agree, so if numpy.datetime64
isn't good enough it will hopefully be straightforward to augment.
hopefully numpy.datetime64 will reduce the need for a lot of
pandas.core.datetools-- although there are still merits (in my view)
to having tools for working with Python datetime.datetime objects.

> Indexing:
>    - slicing with dates (looks like "truncate" method does this, but would
>      be nice to be able to just use slicing directly)

you can use fancy indexing to do this now, e.g:

ts.ix[d1:d2]

I could push this down into __getitem__ and __setitem__ too without much work

> - simple arithmetic on dates ("date + 1" means "add one unit at the current
>  frequency")

numpy.datetime64 will do this, which is very nice. the pandas date
offsets work on Python datetimes. so I can do stuff like:

In [35]: datetime.today() + 5 * datetools.bday
Out[35]: datetime.datetime(2011, 8, 3, 0, 0)

and if you have a whole DateRange (semi-equiv of DateArray) you can
easily shift by the current frequency:

In [38]: dr
Out[38]:
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessDay>, tzinfo: None
[2000-01-03 00:00:00, ..., 2004-12-31 00:00:00]
length: 1305

In [39]: dr.shift(10)
Out[39]:
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessDay>, tzinfo: None
[2000-01-17 00:00:00, ..., 2005-01-14 00:00:00]
length: 1305
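
For readers without pandas at hand, the business-day arithmetic in the sessions above can be approximated with the standard library. This is a rough sketch, not how datetools.bday is implemented; add_business_days is a hypothetical helper and it ignores holidays:

```python
from datetime import date, timedelta


def add_business_days(start, n):
    """Step forward n weekdays (Mon-Fri) from start; holidays are ignored."""
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0..4 are Monday..Friday
            remaining -= 1
    return current


# Endpoints of the DateRange shown above, shifted by 10 business days.
shifted_start = add_business_days(date(2000, 1, 3), 10)
shifted_end = add_business_days(date(2004, 12, 31), 10)
```

Shifting the two endpoints of the range by 10 weekdays lands on 2000-01-17 and 2005-01-14, which matches the dr.shift(10) output.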

> - various date/series attributes such as year, qyear, quarter, month, week,
>  day, day_of_year, etc...
>  (ref: http://pytseries.sourceforge.net/core.datearrays.html#date-information)

I agree this would be nice and very straightforward to add

> - full missing value support (TimeSeries class is a subclass of MaskedArray)

I challenge you to find a (realistic) use case where the missing value
support in pandas is inadequate. I'm being completely serious =) But
I've been very vocal about my dislike of MaskedArrays in the missing
data discussions. They're hard for (normal) people to use, degrade
performance, use extra memory, etc. They add a layer of complication
for working with time series that strikes me as completely
unnecessary.

> - moving (rolling) median/min/max

In [41]: pandas.rolling_
pandas.rolling_apply     pandas.rolling_median
pandas.rolling_corr      pandas.rolling_min
pandas.rolling_count     pandas.rolling_quantile
pandas.rolling_cov       pandas.rolling_skew
pandas.rolling_kurt      pandas.rolling_std
pandas.rolling_max       pandas.rolling_sum
pandas.rolling_mean      pandas.rolling_var

there's also bottleneck, although it doesn't provide the min_periods
argument that I need (though I should look at the perf hit of using
bottleneck.move_nan* functions and nulling out results not having
enough observations after the fact...)

> - Matt Knox
>

this is good feedback =) i think we're on the right track

- Wes

Re: Status of TimeSeries SciKit

Wes McKinney
In reply to this post by Andreas Hilboll
On Wed, Jul 27, 2011 at 12:28 PM, Andreas <[hidden email]> wrote:

> While we're at it:
>
>> Frequency conversion flexibility:
>>     - when going from a higher frequency to lower frequency (eg. daily to
>>       monthly), the timeseries module adds an extra dimension and groups the
>>       points so you still have all the original data rather than discarding
>>       data
>
> I'm using scikits.timeseries for analysis of atmospheric measurements.
> I've always wanted several things, and now that discussion is under way,
> maybe it's a good time to point them out:
>
> * When plotting a series, have the flexibility to have the value marked
> down at the center of the frequency. What I mean is, when I have monthly
> data and make a plot of one year, have each value be printed at the
> middle of the corresponding month, e.g. Jan 16, etc. Otherwise, it's not
> obvious to the reader whether the value printed on July 1 is actually
> that for June or that for July.

Seems like this could be pretty easy to do, need only add a
"tick_offset" option to the plotting function, I think.

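As a rough illustration of where such a tick would land, the midpoint of a monthly period can be computed with the standard library alone. The helper period_midpoint is hypothetical, and no "tick_offset" option is assumed to exist in any plotting function yet:

```python
from datetime import datetime


def period_midpoint(year, month):
    """Midpoint between the start of a month and the start of the next."""
    start = datetime(year, month, 1)
    if month == 12:
        end = datetime(year + 1, 1, 1)
    else:
        end = datetime(year, month + 1, 1)
    return start + (end - start) / 2


mid_january = period_midpoint(2011, 1)
mid_february = period_midpoint(2011, 2)
```

For a 31-day month this gives noon on the 16th, in line with the "Jan 16" example in the request.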
> * Have full support for n-dimensional series. When I have an n-d array of
> data values for each point in time (n>0), many things don't work. The
> biggest problem here seems to be that pickling actually *seems* to work
> (a file is created), but when I load the file again, the entries in the
> array are somehow screwed up (like transposed).

support in pandas is very good for working with multiple univariate
time series using DataFrame, not quite as good for panel data (3d),
but I'm planning to build out an n-dimensional NDFrame which could
potentially address your needs. If you can show me your data and tell
me what you need to be able to do with it, it would be helpful to me.
The majority of my work in pandas has been motivated by use cases I've
experienced in applications.

> * Enable rolling means for sparse data. For example, if I have irregular
> (in time) measurements, say, every one to six days, I would still like
> to be able to calculate a rolling n-day-average. Missing values should
> be ignored (speaking numpy: timeslice.compressed().mean())

Either pandas or bottleneck will do this for you, so you can say something like:

rolling_mean(ts, window=50, min_periods=5)

and any sample with at least 5 data points in the window will compute
a value; missing (NaN) data will be excluded. Bottleneck has move_mean
and move_nanmean which will outperform pandas.rolling_mean a little
bit since the Cython code is more specialized.
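
The min_periods behaviour described above can be sketched in plain Python. This is a slow reference version for illustration, not the pandas or Bottleneck code; the name rolling_nanmean and the sample data are made up:

```python
import math


def rolling_nanmean(values, window, min_periods=None):
    """Trailing rolling mean over a fixed number of points, skipping NaN.

    A result is produced only when the window holds at least
    min_periods non-NaN observations; otherwise the result is NaN.
    """
    if min_periods is None:
        min_periods = window
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        if len(valid) >= min_periods:
            out.append(sum(valid) / len(valid))
        else:
            out.append(math.nan)
    return out


nan = math.nan
result = rolling_nanmean([1.0, nan, 3.0, nan, nan, 6.0], window=3, min_periods=2)
```

Only the third position has two valid observations in its window, so it is the only one that gets a value (2.0); every other position is NaN.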

> I don't know if any of this is already implemented in pandas, as I've
> never used it up till now. But perhaps someone would be interested in
> implementing these issues ...
>
> Cheers,
> Andreas.

Re: Status of TimeSeries SciKit

Keith Goodman
On Wed, Jul 27, 2011 at 10:16 AM, Wes McKinney <[hidden email]> wrote:
> On Wed, Jul 27, 2011 at 12:28 PM, Andreas <[hidden email]> wrote:

>> * Enable rolling means for sparse data. For example, if I have irregular
>> (in time) measurements, say, every one to six days, I would still like
>> to be able to calculate a rolling n-day-average. Missing values should
>> be ignored (speaking numpy: timeslice.compressed().mean())
>
> Either pandas or bottleneck will do this for you, so you can say something like:
>
> rolling_mean(ts, window=50, min_periods=5)
>
> and any sample with at least 5 data points in the window will compute
> a value, missing (NaN) data will be excluded. Bottleneck has move_mean
> and move_nanmean which will outperform pandas.rolling_mean a little
> bit since the Cython code is more specialized.

Another use case is when your data is irregularly spaced in time but
you still want a moving min/mean/median/whatever over a fixed time
window instead of a fixed number of data points. That might be
Andreas's use case.
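
A minimal sketch of that use case in plain Python, assuming observations arrive as time-sorted (timestamp, value) pairs. The function name time_window_mean is hypothetical and this is not an existing pandas or Bottleneck function:

```python
from collections import deque
from datetime import date, timedelta


def time_window_mean(points, span):
    """Mean of all observations within (t - span, t] for each time t.

    points must be (timestamp, value) pairs sorted by timestamp.
    """
    window = deque()
    total = 0.0
    out = []
    for t, v in points:
        window.append((t, v))
        total += v
        # Drop observations that have fallen out of the time window.
        while window[0][0] <= t - span:
            _, old_v = window.popleft()
            total -= old_v
        out.append((t, total / len(window)))
    return out


# Hypothetical irregular measurements, one to six days apart.
obs = [
    (date(2011, 7, 1), 1.0),
    (date(2011, 7, 2), 3.0),
    (date(2011, 7, 8), 5.0),
    (date(2011, 7, 10), 7.0),
]
means = time_window_mean(obs, timedelta(days=3))
```

The number of points per window varies (here one or two), which is the difference from a fixed-count rolling mean.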

Re: Status of TimeSeries SciKit

Wes McKinney
On Wed, Jul 27, 2011 at 1:27 PM, Keith Goodman <[hidden email]> wrote:

> On Wed, Jul 27, 2011 at 10:16 AM, Wes McKinney <[hidden email]> wrote:
>> On Wed, Jul 27, 2011 at 12:28 PM, Andreas <[hidden email]> wrote:
>
>>> * Enable rolling means for sparse data. For example, if I have irregular
>>> (in time) measurements, say, every one to six days, I would still like
>>> to be able to calculate a rolling n-day-average. Missing values should
>>> be ignored (speaking numpy: timeslice.compressed().mean())
>>
>> Either pandas or bottleneck will do this for you, so you can say something like:
>>
>> rolling_mean(ts, window=50, min_periods=5)
>>
>> and any sample with at least 5 data points in the window will compute
>> a value, missing (NaN) data will be excluded. Bottleneck has move_mean
>> and move_nanmean which will outperform pandas.rolling_mean a little
>> bit since the Cython code is more specialized.
>
> Another use case is when your data is irregularly spaced in time but
> you still want a moving min/mean/median/whatever over a fixed time
> window instead of a fixed number of data points. That might be
> Andreas's use case.

True. In pandas parlance I think what you would do is:

rolling_mean(ts.valid(), window).reindex(ts.index, method='ffill')

Re: Status of TimeSeries SciKit

Andreas-119
In reply to this post by Keith Goodman


On 2011-07-27 19:27, Keith Goodman wrote:

> On Wed, Jul 27, 2011 at 10:16 AM, Wes McKinney <[hidden email]> wrote:
>> On Wed, Jul 27, 2011 at 12:28 PM, Andreas <[hidden email]> wrote:
>
>>> * Enable rolling means for sparse data. For example, if I have irregular
>>> (in time) measurements, say, every one to six days, I would still like
>>> to be able to calculate a rolling n-day-average. Missing values should
>>> be ignored (speaking numpy: timeslice.compressed().mean())
>>
>> Either pandas or bottleneck will do this for you, so you can say something like:
>>
>> rolling_mean(ts, window=50, min_periods=5)
>>
>> and any sample with at least 5 data points in the window will compute
>> a value, missing (NaN) data will be excluded. Bottleneck has move_mean
>> and move_nanmean which will outperform pandas.rolling_mean a little
>> bit since the Cython code is more specialized.
>
> Another use case is when your data is irregularly spaced in time but
> you still want a moving min/mean/median/whatever over a fixed time
> window instead of a fixed number of data points. That might be
> Andreas's use case.

Yes, this is exactly what I'm looking for.