Hello,
I have a question on how to effectively log invalid time series. Such series may have one or more of the following properties:

* duplicate dates (ts.time_series.has_duplicated_dates() )
* missing dates (ts.time_series.has_missing_dates() )
* masked values (ts.time_series.mask)

The functions in brackets above return either "True" or "False", or the boolean mask array. But I would be interested in the dates that my series are missing, or in the data points that are duplicated or masked (from input). Could you give me an example of how to retrieve these? I put some demo code with comments below.

Example use cases:
Someone sends you a data file from a data logger or sensor recording device.

* Due to battery problems, the logger stopped recording for some time (=> missing dates). For inspecting the device setup, it is important to know when this happened and how long that period lasted.
* The data file may have been reformatted or processed before it was sent to you. Due to this processing, some timestamps have been saved twice or more (=> duplicated dates). For a correction, one would like to know where to search in the input files.
* The input file already contains NoData markers. They were used to mask data during loading in Python (=> masked data). For error analysis, the date and length of each masked period are important.

I would appreciate a pointer here.

Regards,
Timmie

#### demo code:

### using the examples from http://pytseries.sourceforge.net/core/TimeSeries.html
import numpy as np
import scikits.timeseries as ts

mlist_1 = ['2005%02i' % i for i in range(1,10)]
mlist_1 += ['2006%02i' % i for i in range(2,13)]
mdata_1 = np.arange(len(mlist_1))
mser_1 = ts.time_series(mdata_1, mlist_1, freq='M')

mser_1.has_missing_dates()
# -> True
### how do I retrieve a new series which contains only the dates that are missing?

## a series with masked values
mser_1_fill = mser_1.fill_missing_dates()
mser_1_fill.mask
# I tried "mser_1_fill.mask", but it returns only the boolean mask array.
# The date/time information is lost here.
### how do I retrieve a new series which contains only the dates that are masked?
### Basically it seems that I am looking for the opposite of mser_1_fill.compressed()

mser_1_annual = ts.time_series(mdata_1, mlist_1, freq='A')
mser_daily = mser_1.asfreq('D')
### how do I retrieve a new series which contains only the dates that are duplicated?
mser_daily.has_duplicated_dates()
# -> True

_______________________________________________
SciPyuser mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipyuser
Timmie,
Remember that the mask is an array of booleans and can be used for indexing. I will also assume that your data is 1D.

* To find the dates corresponding to the missing values in your series:
>>> series.dates[series.mask]

* To find the missing dates, use fill_missing_dates first (to make sure the dates are continuous) and get the missing dates by
>>> series.dates[series.mask]
With your example:
>>> mser_1_filled = ts.fill_missing_dates(mser_1)
>>> missing_dates = mser_1_filled.dates[mser_1_filled.mask]
Note that if your initial `series` already has some missing (masked) values, you'll pick those up as well. You should then check whether you have missing values in the first place, find the corresponding dates, fill the dates, recheck the missing ones, and take the difference between the two sets.

* To find duplicated dates:
Things get a tad more complicated:
1. make sure that your `series` is sorted chronologically first
2. construct the following array:
>>> d = series.dates
>>> dupcheck = np.r_[False, (d[1:]==d[:-1])]
dupcheck is an ndarray of booleans with True values where the corresponding date is the same as the previous one. Note that the first date of a run of duplicates is flagged as False.

Gimme a few days to whip up a more usable function that would reproduce that (I think I already have something along those lines somewhere on my HD).

>
> Such series may have one or more of the following
> properties:
>
> * duplicate dates (ts.time_series.has_duplicated_dates() )
> * missing dates (ts.time_series.has_missing_dates() )
> * masked values (ts.time_series.mask)

has_duplicated_dates and has_missing_dates were not really meant to be used directly, but more internally, to keep track of some info on the distribution of dates.
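For readers who want to try the three lookups without scikits.timeseries installed, here is a minimal sketch using plain NumPy only. It is not the library's API: the `dates` array stands in for `series.dates`, a NumPy masked array stands in for the series values, the NoData value 3.0 and the helper name `duplicated_flags` are made up for illustration, and the missing dates are found with a set difference instead of `fill_missing_dates`.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical stand-in for the thread's monthly series:
# Jan-Sep 2005 followed by Feb-Dec 2006, so four months are absent.
dates = np.concatenate([
    np.arange("2005-01", "2005-10", dtype="datetime64[M]"),
    np.arange("2006-02", "2007-01", dtype="datetime64[M]"),
])
# Assumption: the value 3.0 plays the role of a NoData marker in the input.
values = ma.masked_values(np.arange(len(dates), dtype=float), 3.0)

# 1. Dates of masked values: index the dates with the boolean mask.
masked_dates = dates[ma.getmaskarray(values)]

# 2. Missing dates: build the continuous range, then remove the dates present.
one_month = np.timedelta64(1, "M")
full_range = np.arange(dates.min(), dates.max() + one_month, dtype="datetime64[M]")
missing_dates = np.setdiff1d(full_range, dates)


def duplicated_flags(sorted_dates):
    """Flag each date equal to its predecessor (input must be sorted)."""
    d = np.asarray(sorted_dates)
    if d.size == 0:
        return np.zeros(0, dtype=bool)
    return np.r_[False, d[1:] == d[:-1]]


# 3. Duplicated dates: append a few repeats, sort, then flag.
dup_dates = np.sort(np.concatenate([dates, dates[[2, 2, 7]]]))
repeated = dup_dates[duplicated_flags(dup_dates)]

print(masked_dates)
print(missing_dates)
print(np.unique(repeated))
```

As in the reply above, the first occurrence of each duplicated date is not flagged, so `repeated` lists only the extra copies; `np.unique` collapses them to one entry per affected date.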