Hi,
I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased?

Best wishes,
Paul
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
On Jul 26, 2011, at 2:30 PM, Paul Bilokon wrote:
> Hi,
>
> I would like to find out about the status of the TimeSeries SciKit. It looks like it hasn't been updated for some years. Has the development ceased?

Years is an overstatement...
The scikit hasn't been updated in a while, yes. The two developers got really busy on other projects (like, jobs to pay bills) and unfortunately don't currently have the time to keep it up to date.
*If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further development is in limbo and support is limited.
That doesn't mean that you'd be on your own; questions will still be answered...
On Tue, Jul 26, 2011 at 9:30 AM, Pierre GM <[hidden email]> wrote:
> *If* I could find a job that would leave me a bit of time to work on it, I'd try to support the new date time type. But until then, further developments are in limbo and support limited.
> That doesn't mean that you'd be on your own, questions will still be answered...

hi Paul,

Skipper and I (statsmodels) relatively recently discussed moving scikits.timeseries to GitHub and maintaining it there since we work on models for time series analysis. I work very actively on time series-related functionality in pandas so it might not even be unthinkable to merge together the projects (scikits.timeseries and pandas) and integrate all the numpy.datetime64 stuff once the dust settles there. Just thinking out loud.

- Wes
On Jul 26, 2011, at 4:25 PM, Wes McKinney wrote:
> hi Paul,
>
> Skipper and I (statsmodels) relatively recently discussed moving
> scikits.timeseries to GitHub and maintaining it there since we work on
> models for time series analysis.

Er…
https://github.com/pierregm/scikits.timeseries/
https://github.com/pierregm/scikits.timeseries-sandbox/

The second one is actually a branch of the first (I know, that's silly with git, but I was only learning at the time). It provides some new functionality, such as a 'time step' in addition to the 'time unit' (so that you can define regular series with one entry every 5 min, say), but it is not completely baked on the C side (I had some issues subclassing the C ndarray).

> I work very actively on time
> series-related functionality in pandas so it might not even be
> unthinkable to merge together the projects (scikits.timeseries and
> pandas) and integrate all the numpy.datetime64 stuff once the dust
> settles there. Just thinking out loud.

That's an idea.
On Tue, Jul 26, 2011 at 10:35 AM, Pierre GM <[hidden email]> wrote:
> Er…
> https://github.com/pierregm/scikits.timeseries/
> https://github.com/pierregm/scikits.timeseries-sandbox/

Great. Is this the "official" advertised repo? I remember there was some chatter about this a few months back but lost track of the thread.

> the second one is actually a branch of the first one (I know, it's silly with git, but I was only learning at the time), that provides some new functionalities like a 'time step' in addition to the 'time unit' (so that you can define regular series w/ one entry every 5min, say), but is not completely baked on the C side (I had some issues subclassing the C ndarray).
>
> > I work very actively on time
> > series-related functionality in pandas so it might not even be
> > unthinkable to merge together the projects (scikits.timeseries and
> > pandas) and integrate all the numpy.datetime64 stuff once the dust
> > settles there. Just thinking out loud.
>
> That's an idea.

Any thoughts on the idea? Do you think it's reasonable and/or beneficial? There is also some talk among the scikits.learn and scikits.statsmodels developers about dropping the scikits namespace, which would be better as a collective decision, so the merging could be a part of this? I use both packages now, and I, for one, would love to see them come together and share code to the extent this is feasible. Others? I especially like the plotting stuff since it's great, but I've had to make a few local patches here and there for mpl changes.

Skipper
In reply to this post by Wes McKinney
> models for time series analysis. I work very actively on time
> series-related functionality in pandas so it might not even be
> unthinkable to merge together the projects (scikits.timeseries and
> pandas) and integrate all the numpy.datetime64 stuff once the dust
> settles there. Just thinking out loud.

There is functionality I like and use in both pandas and scikits.timeseries, so moving towards an eventual goal of merging the two is a great idea.
- dharhas
In reply to this post by jseabold
On Jul 26, 2011, at 4:42 PM, Skipper Seabold wrote:
> Great. Is this the "official" advertised repo? I remember there was
> some chatter about this a few months back but lost track of the
> thread.

Yep. The scikits.timeseries one is just the SVN site ported to git. The sandbox one was dubbed 'experimental' on this very list.

> Any thoughts on the idea? Do you think it's reasonable and/or
> beneficial? There is also some talk with the scikits.learn and
> scikits.statsmodels to drop the scikits namespace, which would be
> better as a collective decision, so the merging could be a part of
> this? I use both packages now, and I, for one, would love to see them
> come together and share to the extent this is feasible. Others? I
> especially like the plotting stuff since it's great but I've had to
> make a few local patches here and there for mpl changes.

No surprise for matplotlib. I kinda dropped the ball here (when I need to plot stuff these days, I don't use mpl).
I haven't used pandas yet, for the same reasons I wasn't able to keep up with updating scikits.timeseries. But if y'all use the two in parallel and have a need for porting scikits.timeseries to pandas, then go for it, you have my blessing. And you know where to contact me if you have any issues or questions.
In reply to this post by jseabold
> But if y'all use the two in parallel and have a need for porting
> scikits.timeseries to pandas, then go for it, you have my blessing. And you
> know where to contact me if you have some issues or questions.

I would basically echo Pierre's comments here. I don't have the time (or to be perfectly honest, the energy and motivation) to maintain the timeseries module anymore and would definitely be in favor of any efforts to merge its functionality into a better supported module.

It's clear at this point that the timeseries module in its current form is a dead end given the lack of maintainers as well as the fundamental building blocks which are coming into place that would allow a better timeseries module. Those building blocks being:

1. datetime data type support in numpy
2. improved missing value support in numpy
3. data array / labelled array / pandas type of stuff which should (in theory) simplify indexing a timeseries with dates relative to the large hacks used in the current timeseries module

In many ways, the timeseries module is a giant hack which tries to work around the fact that it is missing these key foundational pieces in numpy. If pandas is the module that unifies all these concepts into a cohesive package, then I think that is fantastic! And from lurking on the numpy and scipy mailing lists and monitoring all the threads on the related topics recently, I feel confident that I have little to contribute and that the problem rests in much more capable hands than my own :)

- Matt Knox
On Tue, Jul 26, 2011 at 05:58:27PM +0000, Matt Knox wrote:
> In many ways, the timeseries module is a giant hack which tries to work
> around the fact that it is missing these key foundational pieces in
> numpy.

I don't believe this statement is true. If you are doing statistics, you think that what is really missing in numpy is missing data support. If you are doing timeseries analysis, you are missing timeseries support. If you are doing spatial models, you are missing unstructured spatial data support with builtin interpolation. If you are doing general relativity, you are missing contra/co-variant tensor support.

In my opinion, the important thing to keep in mind is that while each domain-specific application calls for different specific data structures, they are not universal, and you cannot stick them all in one library. The good news is that with numpy arrays, you can build data structures and libraries that talk more or less together, sharing the data across domains. However, the more you embed your specificities in your data structure, the more it becomes alien to people who don't have the same use cases. For instance, the various VTK data structures are amongst the most beautiful structures for encoding spatial information. Yet most people not coming from 3D data processing hate them, because they don't understand them, and others are very busy reinventing the same set of abstractions. Similarly, R is great for statistics, but people who don't do statistics find the syntax incomprehensible and the data structures too restrictive. Matlab is great for linear algebra, but if you move into an N-dimensional world it gets clumsy.

My point is: let us stop dreaming that a change to core numpy will solve our problems. I am not saying that it cannot be improved, but in my opinion, the reason numpy is so successful is that it is actually the intersection of many different domain-specific requirements, and not the union.

2 cents from the peanut gallery,

Gael
On Tue, Jul 26, 2011 at 6:28 PM, Gael Varoquaux <[hidden email]> wrote:
> My point is: let us stop dreaming that a change to core numpy will solve
> our problems. I am not saying that it cannot be improved, but in my
> opinion, the reason numpy is so successful is that it is actually the
> intersection of many different domain-specific requirements, and not the
> union.

+1, I agree completely: NumPy will provide the fundamental building blocks we can use to build domain-specific data structures-- there will be no deus ex machina =)
In reply to this post by Gael Varoquaux
Gael Varoquaux <gael.varoquaux <at> normalesup.org> writes:
> I don't believe this statement is true. If you are doing statistics, you
> think that what is really missing in numpy is missing data support. If
> you are doing timeseries analysis, you are missing timeseries support. If
> you are doing spatial models, you are missing unstructured spatial data
> support with builtin interpolation, if you are doing general relativity,
> you are missing contra/co-variant tensor support.

Ok, perhaps my statement was a bit harsh :) . But the point I was trying to make is that the timeseries module could be dramatically simplified and cleaned up internally with some of those forthcoming foundational pieces in numpy, even if the API and functionality of the timeseries module are kept identical to what they are right now.

> My point is: let us stop dreaming that a change to core numpy will solve
> our problems. I am not saying that it cannot be improved, but in my
> opinion, the reason numpy is so successful is that it is actually the
> intersection of many different domain-specific requirements, and not the
> union.

You are right. There is no such thing as a one-size-fits-all data structure.

It just so happens that Wes' use cases (from my understanding) are basically the same as mine (finance, etc). So from my own selfish point of view, the idea of pandas swallowing up the timeseries module and incorporating its functionality sounds kind of nice, since that would give ME (and probably most of the people that work in the finance domain) an awesome Swiss Army knife data structure that solves all the problems that I care about :)

- Matt Knox
On Wed, Jul 27, 2011 at 01:06:21PM +0000, Matt Knox wrote:
> Ok, perhaps my statement was a bit harsh :) . But the point I was
> trying to make is that the timeseries module could be dramatically
> simplified and cleaned up internally with some of those forthcoming
> foundational pieces in numpy,

Even though I do not know the timeseries module, I wouldn't be surprised if that is indeed the case. It is probably very valuable if you are able to identify localized enhancements to numpy that make your life easier, as they might make many other people's lives easier.

> It just so happens that Wes' use cases (from my understanding) are
> basically the same as mine (finance, etc). So from my own selfish point
> of view, the idea of pandas swallowing up the timeseries module and
> incorporating its functionality sounds kind of nice since that would
> give ME (and probably most of the people that work in the finance
> domain)

I think that it is really great if the different packages doing time series analysis unite. It will probably give better packages technically, and there is a lot of value to the community in such work.

Gael
On Wed, Jul 27, 2011 at 10:12 AM, Gael Varoquaux <[hidden email]> wrote:
> I think that it is really great if the different packages doing time
> series analysis unite. It will probably give better packages technically,
> and there is a lot of value to the community in such work.

I agree. I already have 50% or more of the features in scikits.timeseries, so this gets back to my fragmentation argument (users being stuck with a confusing choice between multiple libraries). Let's make it happen!
Wes McKinney <wesmckinn <at> gmail.com> writes:
> I agree. I already have 50% or more of the features in
> scikits.timeseries, so this gets back to my fragmentation argument
> (users being stuck with a confusing choice between multiple
> libraries). Let's make it happen!

Ok. In the interest of moving this forward, here is a quick list of things I see missing in pandas that scikits.timeseries does. For brevity I will skip the reasons that these features exist, but if the usefulness is not obvious please ask me to clarify.

Frequency conversion flexibility:
    - when going from a higher frequency to lower frequency (eg. daily to
      monthly), the timeseries module adds an extra dimension and groups the
      points so you still have all the original data rather than discarding
      data
    - allow you to specify where to place the value - the start or end of the
      period - when converting from lower frequency to higher frequency (eg.
      monthly to daily)
    - support of a larger number of frequencies

Indexing:
    - slicing with dates (looks like "truncate" method does this, but would
      be nice to be able to just use slicing directly)

- simple arithmetic on dates ("date + 1" means "add one unit at the current
  frequency")
- various date/series attributes such as year, qyear, quarter, month, week,
  day, day_of_year, etc...
  (ref: http://pytseries.sourceforge.net/core.datearrays.html#date-information)
- full missing value support (TimeSeries class is a subclass of MaskedArray)
- moving (rolling) median/min/max

- Matt Knox
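To make the first item on that list concrete, here is a minimal sketch in plain Python of a high-to-low frequency conversion that keeps every original point. It is not the scikits.timeseries or pandas API; the function name `group_by_month` and the sample data are made up for illustration, and only the standard library is used.

```python
from collections import defaultdict
from datetime import date, timedelta


def group_by_month(series):
    """Group (date, value) pairs by calendar month, keeping every value.

    This mimics the idea of converting daily data to monthly without
    discarding points: each month maps to the list of its daily values.
    """
    groups = defaultdict(list)
    for day, value in series:
        groups[(day.year, day.month)].append(value)
    return dict(groups)


# Hypothetical daily series: 60 consecutive days starting 2011-01-01,
# where each value is simply the day's position in the series.
start = date(2011, 1, 1)
daily = [(start + timedelta(days=i), float(i)) for i in range(60)]

monthly = group_by_month(daily)

# Because nothing was discarded, any reduction can be applied afterwards.
monthly_mean = {month: sum(vals) / len(vals) for month, vals in monthly.items()}
```

Keeping the grouped values around means the choice of reduction (mean, last, max, ...) can be deferred, which is the flexibility the list item asks for.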
While we're at it:
> Frequency conversion flexibility:
> - when going from a higher frequency to lower frequency (eg. daily to
> monthly), the timeseries module adds an extra dimension and groups the
> points so you still have all the original data rather than discarding
> data

I'm using scikits.timeseries for analysis of atmospheric measurements. I've always wanted several things, and now that discussion is under way, maybe it's a good time to point them out:

* When plotting a series, have the flexibility to have the value marked down at the center of the frequency. What I mean is, when I have monthly data and make a plot of one year, have each value be printed at the middle of the corresponding month, e.g. Jan 16, etc. Otherwise, it's not obvious to the reader whether the value printed on July 1 is actually that for June or that for July.

* Have full support for n-dimensional series. When I have an n-d array of data values for each point in time (n>0), many things don't work. The biggest problem here seems to be that pickling actually *seems* to work (a file is created), but when I load the file again, the entries in the array are somehow screwed up (like transposed).

* Enable rolling means for sparse data. For example, if I have irregular (in time) measurements, say, every one to six days, I would still like to be able to calculate a rolling n-day-average. Missing values should be ignored (speaking numpy: timeslice.compressed().mean())

I don't know if any of this is already implemented in pandas, as I've never used it up till now. But perhaps someone would be interested in implementing these features ...

Cheers,
Andreas.
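The last wish, a rolling n-day average over irregularly spaced measurements that skips missing values, can be sketched in plain Python as follows. This is only an illustration of the idea, not code from scikits.timeseries or pandas; the name `rolling_day_mean`, the use of `None` for a missing value, and the sample data are all assumptions.

```python
from datetime import date, timedelta


def rolling_day_mean(samples, window_days):
    """Trailing mean over a window of calendar days for irregular samples.

    samples: list of (date, value-or-None) pairs; None marks a missing
    value and is skipped. For each sample date, the mean is taken over all
    non-missing values whose date lies in (date - window_days, date].
    Returns a list of (date, mean-or-None) pairs.
    """
    window = timedelta(days=window_days)
    result = []
    for current, _ in samples:
        values = [
            v for d, v in samples
            if v is not None and current - window < d <= current
        ]
        mean = sum(values) / len(values) if values else None
        result.append((current, mean))
    return result


# Hypothetical measurements taken every one to six days, one of them missing.
samples = [
    (date(2011, 7, 1), 1.0),
    (date(2011, 7, 3), 3.0),
    (date(2011, 7, 8), None),
    (date(2011, 7, 9), 5.0),
    (date(2011, 7, 15), 7.0),
]
means = rolling_day_mean(samples, 7)
```

The window here is defined in calendar days rather than in number of observations, which is the difference from a fixed-length moving average on a regular series. The quadratic scan keeps the sketch short; a real implementation would slide the window instead.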
In reply to this post by Matt Knox-4
On Wed, Jul 27, 2011 at 12:12 PM, Matt Knox <[hidden email]> wrote:
> Frequency conversion flexibility:
> - when going from a higher frequency to lower frequency (eg. daily to
> monthly), the timeseries module adds an extra dimension and groups the
> points so you still have all the original data rather than discarding
> data

This is basically just a group by (reduceat) operation. I've been working a lot on groupby lately, and resampling will fall out as an afterthought (frequency conversion has always existed and low-to-high is simple, but downsampling/aggregation has not been easy). Should not require any C code either.

> - allow you to specify where to place the value - the start or end of the
> period - when converting from lower frequency to higher frequency (eg.
> monthly to daily)

I'll make sure to make this available as an option. Going low-to-high you have two interpolation options: forward fill (aka "pad") and back fill, which I think is what you're saying?

> - support of a larger number of frequencies

Which ones are you thinking of? Currently I have:

- hourly, minutely, secondly (and things like 5-minutely can be done, e.g. Minute(5))
- daily / business daily
- weekly (anchored on a particular weekday)
- monthly / business month-end
- (business) quarterly, anchored on jan/feb/march
- annual / business annual (start and end)

There is also a generic delta wrapping dateutil.relativedelta, so it's possible to go beyond these. The scikits.timeseries code is far more comprehensive and complete, completely agree, so if numpy.datetime64 isn't good enough it will hopefully be straightforward to augment. Hopefully numpy.datetime64 will reduce the need for a lot of pandas.core.datetools-- although there are still merits (in my view) to having tools for working with Python datetime.datetime objects.

> Indexing:
> - slicing with dates (looks like "truncate" method does this, but would
> be nice to be able to just use slicing directly)

You can use fancy indexing to do this now, e.g.:

    ts.ix[d1:d2]

I could push this down into __getitem__ and __setitem__ too without much work.

> - simple arithmetic on dates ("date + 1" means "add one unit at the current
> frequency")

numpy.datetime64 will do this, which is very nice. The pandas date offsets work on Python datetimes, so I can do stuff like:

    In [35]: datetime.today() + 5 * datetools.bday
    Out[35]: datetime.datetime(2011, 8, 3, 0, 0)

and if you have a whole DateRange (semi-equiv of DateArray) you can easily shift by the current frequency:

    In [38]: dr
    Out[38]:
    <class 'pandas.core.daterange.DateRange'>
    offset: <1 BusinessDay>, tzinfo: None
    [2000-01-03 00:00:00, ..., 2004-12-31 00:00:00]
    length: 1305

    In [39]: dr.shift(10)
    Out[39]:
    <class 'pandas.core.daterange.DateRange'>
    offset: <1 BusinessDay>, tzinfo: None
    [2000-01-17 00:00:00, ..., 2005-01-14 00:00:00]
    length: 1305

> - various date/series attributes such as year, qyear, quarter, month, week,
> day, day_of_year, etc...
> (ref: http://pytseries.sourceforge.net/core.datearrays.html#date-information)

I agree this would be nice and very straightforward to add.

> - full missing value support (TimeSeries class is a subclass of MaskedArray)

I challenge you to find a (realistic) use case where the missing value support in pandas is inadequate. I'm being completely serious =) But I've been very vocal about my dislike of MaskedArrays in the missing data discussions. They're hard for (normal) people to use, degrade performance, use extra memory, etc. They add a layer of complication for working with time series that strikes me as completely unnecessary.

> - moving (rolling) median/min/max

    In [41]: pandas.rolling_
    pandas.rolling_apply     pandas.rolling_median
    pandas.rolling_corr      pandas.rolling_min
    pandas.rolling_count     pandas.rolling_quantile
    pandas.rolling_cov       pandas.rolling_skew
    pandas.rolling_kurt      pandas.rolling_std
    pandas.rolling_max       pandas.rolling_sum
    pandas.rolling_mean      pandas.rolling_var

There's also bottleneck, although it doesn't provide the min_periods argument that I need (though I should look at the perf hit of using bottleneck.move_nan* functions and nulling out results not having enough observations after the fact...)

This is good feedback =) I think we're on the right track.

- Wes
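The two low-to-high options mentioned in that reply, forward fill ("pad") and back fill, can be illustrated with a small plain-Python sketch. It is not the pandas implementation; the function name `upsample_daily`, the `method` strings, and the sample data are assumptions chosen for the example.

```python
from datetime import date, timedelta


def upsample_daily(points, method="pad"):
    """Expand sparse (date, value) points to one value per calendar day.

    method="pad" carries the last known value forward; method="backfill"
    uses the next known value instead. A day that has its own observation
    always keeps that observation.
    """
    if method not in ("pad", "backfill"):
        raise ValueError("method must be 'pad' or 'backfill'")
    points = sorted(points)
    if not points:
        return []
    one_day = timedelta(days=1)
    out = []
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        day = d0
        while day < d1:
            value = v0 if (method == "pad" or day == d0) else v1
            out.append((day, value))
            day += one_day
    out.append(points[-1])
    return out


# Hypothetical monthly series stamped on the first day of each month.
monthly = [
    (date(2011, 1, 1), 10.0),
    (date(2011, 2, 1), 20.0),
    (date(2011, 3, 1), 30.0),
]
padded = upsample_daily(monthly, "pad")
back = upsample_daily(monthly, "backfill")
```

With "pad", every day in January carries the January value; with "backfill", the days after January 1 take the February value. That corresponds to placing the monthly value at the start or at the end of the period, as asked in the quoted item.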
In reply to this post by Andreas Hilboll
On Wed, Jul 27, 2011 at 12:28 PM, Andreas <[hidden email]> wrote:
> While we're at it:
>
>> Frequency conversion flexibility:
>> - when going from a higher frequency to lower frequency (eg. daily to
>>   monthly), the timeseries module adds an extra dimension and groups the
>>   points so you still have all the original data rather than discarding
>>   data
>
> I'm using scikits.timeseries for analysis of atmospheric measurements.
> I've always wanted several things, and now that discussion is under way,
> maybe it's a good time to point them out:
>
> * When plotting a series, have the flexibility to have the value marked
>   down at the center of the frequency. What I mean is, when I have monthly
>   data and make a plot of one year, have each value be printed at the
>   middle of the corresponding month, e.g. Jan 16, etc. Otherwise, It's not
>   obvious to the reader whether the value printed on July 1 is actually
>   that for June or that for July.

Seems like this could be pretty easy to do, need only add a "tick_offset" option to the plotting function, I think.

> * Have full support for n-dimensional series. When I have a n-d array of
>   data values for each point in time (n>0), many things don't work. The
>   biggest problem here seems to be that pickling actually *seems* to work
>   (a file is created), but when I load the file again, the entries in the
>   array are somehow screwed up (like transposed).

support in pandas is very good for working with multiple univariate time series using DataFrame, not quite as good for panel data (3d), but I'm planning to build out an n-dimensional NDFrame which could potentially address your needs. If you can show me your data and tell me what you need to be able to do with it, it would be helpful to me. The majority of my work in pandas has been motivated by use cases I've experienced in applications.

> * Enable rolling means for sparse data. For example, if I have irregular
>   (in time) measurements, say, every one to six days, I would still like
>   to be able to calculate a rolling n-day-average. Missing values should
>   be ignored (speaking numpy: timeslice.compressed().mean())

Either pandas or bottleneck will do this for you, so you can say something like:

rolling_mean(ts, window=50, min_periods=5)

and any sample with at least 5 data points in the window will compute a value; missing (NaN) data will be excluded. Bottleneck has move_mean and move_nanmean which will outperform pandas.rolling_mean a little bit since the Cython code is more specialized.

> I don't know if any of this is already implemented in pandas, as I've
> never used it up till now. But perhaps someone would be interested in
> implementing these issues ...
>
> Cheers,
> Andreas.
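A small sketch of the min_periods behaviour described in this message, assuming a current pandas release, where the top-level rolling_mean function has been replaced by the Series.rolling(...).mean() method. The data and the window sizes (3 and 2 rather than 50 and 5) are made up to keep the result easy to check by hand.

```python
import numpy as np
import pandas as pd

# Daily series with gaps recorded as NaN (illustrative values only).
values = [1.0, np.nan, 3.0, np.nan, np.nan, 6.0, 7.0]
ts = pd.Series(values, index=pd.date_range("2011-07-01", periods=7, freq="D"))

# Window of 3 rows; a mean is produced only when at least 2 of them are
# non-missing. NaN entries are left out of both the sum and the count.
smoothed = ts.rolling(window=3, min_periods=2).mean()

print(smoothed)
```

Only the third and the last rows have two valid observations inside their window, so those are the only rows that receive a value; every other row stays NaN.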
On Wed, Jul 27, 2011 at 10:16 AM, Wes McKinney <[hidden email]> wrote:
> On Wed, Jul 27, 2011 at 12:28 PM, Andreas <[hidden email]> wrote:

>> * Enable rolling means for sparse data. For example, if I have irregular
>>   (in time) measurements, say, every one to six days, I would still like
>>   to be able to calculate a rolling n-day-average. Missing values should
>>   be ignored (speaking numpy: timeslice.compressed().mean())
>
> Either pandas or bottleneck will do this for you, so you can say something like:
>
> rolling_mean(ts, window=50, min_periods=5)
>
> and any sample with at least 5 data points in the window will compute
> a value, missing (NaN) data will be excluded. Bottleneck has move_mean
> and move_nanmean which will outperform pandas.rolling_mean a little
> bit since the Cython code is more specialized.

Another use case is when your data is irregularly spaced in time but you still want a moving min/mean/median/whatever over a fixed time window instead of a fixed number of data points. That might be Andreas's use case.
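One way to express a window of fixed duration rather than a fixed number of points, sketched under the assumption of a current pandas release: Series.rolling accepts an offset string such as "3D" when the index is a sorted DatetimeIndex. This was not available in the pandas discussed in this thread, and the timestamps and values below are invented.

```python
import pandas as pd

# Irregular sampling: gaps of one to six days between measurements.
stamps = pd.to_datetime(
    ["2011-07-01", "2011-07-02", "2011-07-05", "2011-07-11", "2011-07-12"]
)
ts = pd.Series([1.0, 3.0, 5.0, 7.0, 9.0], index=stamps)

# Each result averages the observations in the 3 days ending at that
# timestamp. By default the window is open on the left: (t - 3 days, t].
rolled = ts.rolling("3D").mean()

print(rolled)
```

The number of points per window varies with the sampling density, so a result backed by a single observation is as likely as one backed by several; passing min_periods to rolling is one way to suppress the thinly supported values.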
On Wed, Jul 27, 2011 at 1:27 PM, Keith Goodman <[hidden email]> wrote:
> On Wed, Jul 27, 2011 at 10:16 AM, Wes McKinney <[hidden email]> wrote:
>> On Wed, Jul 27, 2011 at 12:28 PM, Andreas <[hidden email]> wrote:
>
>>> * Enable rolling means for sparse data. For example, if I have irregular
>>>   (in time) measurements, say, every one to six days, I would still like
>>>   to be able to calculate a rolling n-day-average. Missing values should
>>>   be ignored (speaking numpy: timeslice.compressed().mean())
>>
>> Either pandas or bottleneck will do this for you, so you can say something like:
>>
>> rolling_mean(ts, window=50, min_periods=5)
>>
>> and any sample with at least 5 data points in the window will compute
>> a value, missing (NaN) data will be excluded. Bottleneck has move_mean
>> and move_nanmean which will outperform pandas.rolling_mean a little
>> bit since the Cython code is more specialized.
>
> Another use case is when your data is irregularly spaced in time but
> you still want a moving min/mean/median/whatever over a fixed time
> window instead of a fixed number of data points. That might be
> Andreas's use case.

True. In pandas parlance I think what you would do is:

rolling_mean(ts.valid(), window).reindex(ts.index, method='ffill')
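The one-liner in this message, restated as a sketch for a current pandas release, on the assumption that ts.valid() corresponds to ts.dropna() and rolling_mean(x, window) to x.rolling(window).mean(). The series and the window of 2 are made up for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2011-07-01", periods=6, freq="D")
ts = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0, 7.0], index=idx)

# Drop the missing observations and average over the last 2 valid points.
valid_mean = ts.dropna().rolling(window=2).mean()

# Put the result back on the original index; dates that were dropped take
# the most recent computed value (forward fill).
result = valid_mean.reindex(ts.index, method="ffill")

print(result)
```

The window here counts valid observations, not elapsed days, so two points that are far apart in time are still averaged together.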
In reply to this post by Keith Goodman
On 2011-07-27 19:27, Keith Goodman wrote:
> On Wed, Jul 27, 2011 at 10:16 AM, Wes McKinney <[hidden email]> wrote:
>> On Wed, Jul 27, 2011 at 12:28 PM, Andreas <[hidden email]> wrote:
>
>>> * Enable rolling means for sparse data. For example, if I have irregular
>>>   (in time) measurements, say, every one to six days, I would still like
>>>   to be able to calculate a rolling n-day-average. Missing values should
>>>   be ignored (speaking numpy: timeslice.compressed().mean())
>>
>> Either pandas or bottleneck will do this for you, so you can say something like:
>>
>> rolling_mean(ts, window=50, min_periods=5)
>>
>> and any sample with at least 5 data points in the window will compute
>> a value, missing (NaN) data will be excluded. Bottleneck has move_mean
>> and move_nanmean which will outperform pandas.rolling_mean a little
>> bit since the Cython code is more specialized.
>
> Another use case is when your data is irregularly spaced in time but
> you still want a moving min/mean/median/whatever over a fixed time
> window instead of a fixed number of data points. That might be
> Andreas's use case.

Yes, this is exactly what I'm looking for.