Administrator

Hello,
I need to perform calculations on a time series that use the datetime of each data point as input. An example:

    def myfunction(datetime_obj, scaling_factor):
        pass

I found out that I can get the datetime for each entry with

    for i in range(0, series.size):
        series[i] = myfunction(series.dates.tolist()[i], 10.)

Now, I noticed a strange thing. If I have a base series "base_series" and assign it to a new one with

    new_series = base_series

then base_series gets changed by every calculation I perform on new_series (please see method 1 below).

The only way I could find to make my code work is to create lots of template series, as in method 3 below. That lets me calculate my new values in new_series using the datetime information while base_series retains its original values.

Could you shed some light on why base_series changes when I change a derived series? Is there a more efficient way to accomplish my task that I haven't thought of so far?

Thanks in advance!
Kind regards,
Timmie

#### BELOW A SAMPLE SCRIPT THAT MAY ILLUSTRATE ####

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import datetime
    import scikits.timeseries as ts
    import numpy as np

    # create dummy series
    data = np.zeros(600) + 1
    now = datetime.datetime.now()
    start = datetime.datetime(now.year, now.month, now.day)
    #print start
    start_date = ts.Date('H', datetime=start)
    #print start_date
    series_dummy = ts.time_series(data, dtype=np.float_, freq='H',
                                  start_date=start_date)

    snew = series_dummy

    ### method 1
    for i in range(0, snew.size):
        snew[i] = snew[i] * 2  # snew.dates[i].datetime

    print "method 1:", snew.sum() - series_dummy.sum()

    ### method 2
    for i in range(0, snew.size):
        snew = snew * 2

    print "method 2:", snew.sum() - series_dummy.sum()

    ### method 3
    data = np.zeros(series_dummy.size) + 1
    dt_arr = series_dummy.dates
    cser = ts.time_series(data.astype(np.float_), dt_arr)
    for i in range(0, cser.size):
        # note: cser.dates[i].datetime.hour is just used as an example.
        # My function performs calculations based on the datetime of each
        # data point (the current datetime is the input parameter).
        cser[i] = cser.dates[i].datetime.hour

    print "method 3:", cser.sum() - series_dummy.sum()

_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipyuser
Timmie,
Let's go through method #1 first:

> snew = series_dummy
>
> ###method 1
>
> for i in range(0,snew.size):
>     snew[i] = snew[i]* 2 #snew.dates[i].datetime

Your `snew` object is only a reference to `series_dummy`. When you modify an element of snew, you're in fact modifying the corresponding element of `series_dummy`. That's a feature of Python; you would get the same result with lists:

    >>> a = [0, 0, 0]
    >>> b = a
    >>> b[0] = 1
    >>> a
    [1, 0, 0]

If you want to avoid that, you can make snew a copy of series_dummy:

    snew = series_dummy.copy()

Now, method #2:

> for i in range(0,snew.size):
>     snew = snew*2

Are you sure that's what you want to do? You could do

    snew = snew*(2**snew.size)

and get the same result. Anyway: here, you change what snew is at each iteration. Initially, it was a reference to series_dummy; now, it's a reference to another (temporary) object, snew*2. Nothing propagates back to series_dummy.

Finally, some comments on method #3:
You want to create a new timeseries based on the result of some calculation on the data part, but still using the dates of the initial series? If you don't have any missing values, perform the computation on series._data; that'll be faster. If you have missing values, use series._series instead to access the MaskedArray methods directly rather than the timeseries ones (you don't want to carry the dates around if you don't need them).

As a wrap-up:
Try to avoid looping if you can. You said a generic form of your function is:

> def myfunction(datetime_obj, scaling_factor):
>     pass

Do you really need datetime objects? In your example, you were using series.dates[i].datetime.hour in a loop, one element at a time. You should have used series.dates.hour, which is an array. Using functions on an array as a whole is far more efficient than using the same functions on each element of the array.

Let me know how it goes, and don't hesitate to contact me off-list if you need some help with your function.

Cheers
P.
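The three behaviours described in this reply (aliasing, copying, and rebinding) can be sketched with plain Python lists. This is only an illustration under the assumption that a list stands in for the time series; the variable names are made up for the example and scikits.timeseries is not used.

```python
import copy

base = [1.0, 1.0, 1.0]

# Method 1: plain assignment binds a second name to the SAME object.
alias = base
alias[0] = 2.0                      # the change is visible through `base` too

# The fix: an explicit copy is an independent object.
independent = copy.copy(base)
independent[1] = 5.0                # `base` is not affected

# Method 2: rebinding the name to a newly built object leaves `base` alone.
rebound = base
rebound = [x * 2 for x in rebound]  # `rebound` now names a different list

print(base)           # [2.0, 1.0, 1.0]
print(alias is base)  # True
print(independent)    # [2.0, 5.0, 1.0]
print(rebound)        # [4.0, 2.0, 2.0]
```

Only the in-place element assignment through the alias changes the original; the copy and the rebound name both leave it untouched.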

Hello Pierre,
first, thanks for the fast reply. I really appreciate it.

As a note on my last email, I should add that I simplified the functions (methods 1-3). The different methods were only created to illustrate how I handle/access the series.

> Your `snew` object is only a reference to `series_dummy`. When you
> modify an element of snew, you're in fact modifying the corresponding
> element of `series_dummy`. That's a feature of Python, you would get
> the same result with lists:
> >>> a = [0,0,0]
> >>> b = a
> >>> b[0] = 1
> >>> a
> [1,0,0]
> If you want to avoid that, you can make snew a copy of series_dummy
> snew = series_dummy.copy()

> Finally, some comments for method #3:
> You want to create a new timeseries based on the result of some
> calculation on the data part, but still using the dates of the initial
> series ?
> If you don't have any missing values, perform the computation on
> series._data, that'll be faster. If you have missing values, use the
> series._series instead to access directly the MaskedArray methods, and
> not the timeseries ones (you don't want to carry the dates around if
> you don't need them).

> As a wrapup:
> Try to avoid looping if you can.

Yes, I noticed that. But I couldn't find another way to pass the individual datetimes to my calculation function, which expects only one value at a time (i.e. it is not designed to work on full arrays).

> You said a generic form of your function is:
> > def myfunction(datetime_obj, scaling_factor):
> >     pass
>
> Do you really need datetime objects ?

Yes, in geoscience/earth science and engineering it's quite normal to have parameters that depend on the date or the hour of the year, such as the position of the planets, the state of the ocean, etc.

> In your example, you were using
> series.dates[i].datetime.hour, a list. You should have used
> series.dates.hour, which is an array. Using functions on an array as a
> whole is far more efficient than using the same functions on each
> element of the array.

I will try to adjust the function so that it calculates directly with arrays. But the basic problem I haven't solved yet is how to pass a single datetime_obj to myfunction along with further parameters.

Regards,
Timmie
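One possible way to hand a scalar-only function a single datetime together with fixed extra parameters is `functools.partial`. The sketch below is only an illustration: the body of `myfunction` is hypothetical (the thread gives just its signature), and a plain list of `datetime` objects stands in for what `series.dates.tolist()` would return.

```python
import datetime as dt
from functools import partial


def myfunction(datetime_obj, scaling_factor):
    # Hypothetical body: the thread only shows the signature.
    return datetime_obj.hour * scaling_factor


# Stand-in for series.dates.tolist(): six consecutive hourly timestamps.
start = dt.datetime(2008, 1, 1)
dates = [start + dt.timedelta(hours=h) for h in range(6)]

# Fix the extra parameter once, then call with one datetime at a time.
scaled = partial(myfunction, scaling_factor=10.0)
values = [scaled(d) for d in dates]

print(values)  # [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
```

Because the results are collected in a new list rather than written back element by element, the source data is left unchanged. The loop is still per element, so this addresses the parameter passing, not the speed.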
Timmie,
> As note on my last email I may add that I simplified the functions
> (methods 1-3).
> The different methods were only created to illustrate how I handle/
> access the series.

I got that. My comments were themselves intended for illustration ;)

>> As a wrapup:
>> Try to avoid looping if you can.
> Yes, I noticed that.
> But I couldn't find another way to pass the individual datetimes to my
> calculation function which expects only one value at once (i.e. it
> is not designed to calculate full arrays).

That might be a bottleneck. If you could modify your function so that it can process arrays, you should get better results. Of course, that depends on the actual function...

When I asked whether you really needed datetime objects, I was thinking about the actual datetime.datetime objects, not about objects having, say, a `day` or `hour` property. If you send an example of a function closer to your actual need, I may be able to help you more.

Hello Pierre,
This business of using the datetime information is really bothering me now.

>>> As a wrapup:
>>> Try to avoid looping if you can.
>> Yes, I noticed that.
>> But I couldn't find another way to pass the individual datetimes to my
>> calculation function which expects only one value at once (i.e. it
>> is not designed to calculate full arrays).
>
> That might be a bottleneck. If you could modify your function so that
> it can process arrays, you should get better results. Of course, that
> depends on the actual function...
> When I asked whether you really needed datetime objects, I was
> thinking about the actual datetime.datetime objects, not about objects
> having, say, a `day` or `hour` property. If you send an example of
> function closer to your actual need, I may be able to help you more.

Please find below my commented example.

    ### START ###
    #!/usr/bin/env python
    import datetime as dt
    import numpy as np
    import scikits.timeseries as ts

    def hoy(datetime_obj):
        """ calculate hour of year """
        mydt = datetime_obj
        year = mydt.year
        start = dt.datetime(year, 1, 1, 0)
        td = mydt - start
        seconds = td.days * 3600 * 24 + td.seconds
        hours = seconds // 3600
        return hours

    def create_ts(datetime_obj):
        """ create an hourly series """
        data = np.arange(0, 8760)
        startdate = ts.Date(freq='H', datetime=datetime_obj)
        series = ts.time_series(data, freq='H', start_date=startdate)
        return series

    ## get a datetime object
    my_datetime = dt.datetime.now()

    ## create time series
    myseries = create_ts(my_datetime)

    ## calculate hoy for datetime object
    my_hoy = hoy(my_datetime)
    print 'my_hoy:', my_hoy

    ## first vectorize
    hoy_vect = np.vectorize(hoy)

    ## calculate the hoy for each hour in the series

    # 1. method: working, but a workaround since the main calculation is
    # performed outside the time series object!
    array_hoy = hoy_vect(myseries.dates.tolist())
    series_hoy_01 = ts.time_series(array_hoy, myseries.dates)

    # 2. method: desired but not working
    #series_hoy_02 = hoy_vect(myseries.dates)
    ## this fails with the error message:
    #
    # AttributeError: 'numpy.int32' object has no attribute 'year'
    # or
    # AttributeError: 'int' object has no attribute 'year'

    def create_dt(series):
        dt_vect = np.vectorize(dt.datetime)
        dt_ser = dt_vect(series.year, series.month, series.hour)
        return dt_ser

    ser = create_dt(myseries)

    series_hoy_03 = hoy_vect(dt.datetime(myseries.year, myseries.month, myseries.hour))
    ### END CODE ###

Thanks in advance,
Timmie
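For comparison, an hour-of-year calculation can be written so that it operates on a whole array at once, without creating a `datetime.datetime` object per element. The sketch below assumes the timestamps are available as NumPy `datetime64` values, which is not how scikits.timeseries stores its dates; `hoy_array` is a hypothetical name and not part of either library.

```python
import numpy as np


def hoy_array(stamps):
    """Hour of year for datetime64 values (0 means Jan 1, 00:00)."""
    stamps = np.asarray(stamps, dtype="datetime64[h]")
    # Truncate each timestamp to its year, then express that instant in hours.
    year_start = stamps.astype("datetime64[Y]").astype("datetime64[h]")
    # The difference is a timedelta64[h] array; convert it to plain integers.
    return (stamps - year_start).astype(np.int64)


# Four hourly timestamps that cross the end of the leap year 2008.
start = np.datetime64("2008-12-31T22", "h")
stamps = start + np.arange(4).astype("timedelta64[h]")

print(hoy_array(stamps))  # [8782 8783    0    1]
```

The whole computation is two casts and one subtraction on arrays, so no Python-level loop or `np.vectorize` wrapper is involved.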