calculations using the datetime information of timeseries

Timmie
Administrator
Hello,
I need to perform calculations for a time series that use the datetime
of each data point as input. An example:

def myfunction(datetime_obj, scaling_factor):
    pass

I found out that I can get the datetime for each entry with

for i in range(0, series.size):
    series[i] = myfunction(series.dates.tolist()[i], 10.)

Now, I noticed a strange thing.

If I have a base series "base_series" and assign it to a new one with

new_series = base_series

The base_series gets updated/changed according to all calculations I
perform on new_series (Please see method 1 below).

The only way I could think of to make my code work is to create lots of
template series as in method 3 below. This lets me calculate my new
values in new_series using the datetime information and still retain
base_series with its original values.

Could you kindly shed some light on why base_series gets changed when
I change a derived series?

Is there a more efficient way to accomplish my task that I may not have
thought of so far?

Thanks in advance!
Kind regards,
Timmie



#### BELOW A SAMPLE SCRIPT THAT MAY ILLUSTRATE ####

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import datetime
import scikits.timeseries as ts

import numpy as np

#create dummy series
data = np.zeros(600)+1
now = datetime.datetime.now()
start = datetime.datetime(now.year, now.month, now.day)
#print start
start_date = ts.Date('H', datetime=start)
#print start_date
series_dummy = ts.time_series(data, dtype=np.float_, freq='H',
                              start_date=start_date)

snew = series_dummy

###method 1

for i in range(0,snew.size):
     snew[i] = snew[i]* 2 #snew.dates[i].datetime

print "method 1:", snew.sum()-series_dummy.sum()

###method 2

for i in range(0,snew.size):
     snew = snew*2

print "method 2:", snew.sum()-series_dummy.sum()

###method 3

data = np.zeros(series_dummy.size)+1
dt_arr = series_dummy.dates
cser = ts.time_series(data.astype(np.float_), dt_arr)
for i in range(0,cser.size):
     # note: cser.dates[i].datetime.hour is just used as an example.
     # My function performs calculations based on the datetime of each
     # data point (the current datetime is the input parameter).
     cser[i] = cser.dates[i].datetime.hour

print "method 3:", cser.sum()-series_dummy.sum()

_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user

Re: calculations using the datetime information of timeseries

Pierre GM-2
Timmie,

Let's go through method #1 first:

> snew = series_dummy
>
> ###method 1
>
> for i in range(0,snew.size):
>     snew[i] = snew[i]* 2 #snew.dates[i].datetime


Your `snew` object is only a reference to `series_dummy`. When you  
modify an element of snew, you're in fact modifying the corresponding  
element of `series_dummy`.  That's a feature of Python, you would get  
the same result with lists:
 >>> a = [0,0,0]
 >>> b = a
 >>> b[0] = 1
 >>> a
[1, 0, 0]

If you want to avoid that, you can make snew a copy of series_dummy:
snew = series_dummy.copy()
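
To make the difference between a second name and a copy concrete, here is a minimal sketch that uses plain Python lists as stand-ins for the series. The names `base`, `alias` and `snapshot` are illustrative only and do not appear in the original script; the same reasoning applies to `series_dummy.copy()`.

```python
import copy

base = [1.0, 1.0, 1.0]

alias = base                 # a second name for the same list object
snapshot = copy.copy(base)   # a new list holding the same elements

alias[0] = 2.0               # this change is visible through `base` too

print(alias is base)         # True
print(snapshot is base)      # False
print(base)                  # [2.0, 1.0, 1.0]
print(snapshot)              # [1.0, 1.0, 1.0]
```

The `is` test is a quick way to check whether two names refer to the same object before modifying one of them in place.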

Now, method #2:
>
> for i in range(0,snew.size):
>     snew = snew*2

Are you sure that's what you want to do? You could do
snew = snew*(2**snew.size)
and get the same result without the loop.
Anyway: here, you change what snew is at each iteration: initially, it  
was a reference to series_dummy, now, it's a reference to another  
(temporary) object, snew*2. No back propagation of results.
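
The rebinding described here can be sketched with plain lists standing in for the series (an assumption made only to keep the example self-contained; `base` and `n` are illustrative names). Each pass through the loop builds a new object and points the name `snew` at it, so the original is never modified.

```python
base = [1.0, 1.0, 1.0]
snew = base
n = len(snew)

for _ in range(n):
    # Builds a brand-new list and rebinds the name `snew` to it;
    # the object that `base` refers to is never touched.
    snew = [x * 2 for x in snew]

print(base)                                  # [1.0, 1.0, 1.0]
print(snew)                                  # [8.0, 8.0, 8.0]
print(snew == [x * 2 ** n for x in base])    # True
```

The last line checks the shortcut mentioned above: doubling n times is the same as multiplying once by 2**n.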

Finally, some comments for method #3:
You want to create a new timeseries based on the result of some  
calculation on the data part, but still using the dates of the initial  
series ?
If you don't have any missing values, perform the computation on  
series._data, that'll be faster. If you have missing values, use the  
series._series instead to access directly the MaskedArray methods, and  
not the timeseries ones (you don't want to carry the dates around if  
you don't need them).
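
As a rough illustration of why missing values matter here, the sketch below uses a plain `numpy.ma` masked array. This is an assumption for the sake of a runnable example: it is not a scikits.timeseries object, and the public `.data` attribute of `numpy.ma` is used in place of the `series._data` attribute mentioned above.

```python
import numpy as np

values = np.ma.masked_array([1.0, 2.0, 3.0, 4.0],
                            mask=[False, False, True, False])

# Mask-aware computation: the masked entry is skipped.
masked_total = values.sum()

# Raw computation on the underlying ndarray: the mask is ignored,
# so this is only safe when nothing is masked.
raw_total = values.data.sum()

# Arithmetic on the masked array keeps the mask in place.
scaled = values * 10.0

print(masked_total)           # 7.0
print(raw_total)              # 10.0
print(scaled.mask.tolist())   # [False, False, True, False]
```

Working on the raw data skips the mask bookkeeping, which is where the speed-up comes from, at the price of silently including values that were meant to be ignored.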

As a wrap-up:
Try to avoid looping if you can. You said a generic form of your  
function is:
>
> def myfunction(datetime_obj, scaling_factor):
>    pass

Do you really need datetime objects ? In your example, you were using  
series.dates[i].datetime.hour, a list. You should have used  
series.dates.hour, which is an array. Using functions on an array as a  
whole is far more efficient than using the same functions on each  
element of the array.
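
The whole-array idea can be sketched with NumPy's own `datetime64` type. This is a swapped-in technique, assumed here only because it runs without scikits.timeseries; it is not the `series.dates.hour` attribute itself. The hour of the day is obtained for all timestamps in one expression and compared with the element-by-element result.

```python
import numpy as np

# 48 hourly timestamps (two full days).
stamps = np.arange("2008-06-01T00", "2008-06-03T00", dtype="datetime64[h]")

# Whole-array: hours elapsed since midnight of each timestamp's day.
hours = (stamps - stamps.astype("datetime64[D]")).astype(int)

# Element-by-element equivalent, for comparison only.
hours_loop = [d.hour for d in stamps.tolist()]

print(hours[:5].tolist())            # [0, 1, 2, 3, 4]
print(hours.tolist() == hours_loop)  # True
```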


Let me know how it goes, and don't hesitate to contact me off-list if  
you need some help with your function.

Cheers
P.




Re: calculations using the datetime information of timeseries

Timmie
Administrator
Hello Pierre,

first, thanks for the fast reply. I really appreciate it.
As a note on my last email, I should add that I simplified the functions
(methods 1-3). The different methods were only created to illustrate how I
handle/access the series.

> Your `snew` object is only a reference to `series_dummy`. When you  
> modify an element of snew, you're in fact modifying the corresponding  
> element of `series_dummy`.  That's a feature of Python, you would get  
> the same result with lists:
>  >>> a = [0,0,0]
>  >>> b = a
>  >>> b[0] = 1
>  >>> a
> [1,0,0]
> If you want to avoid that, you can make snew a copy of series_dummy
> snew = series_dummy.copy()
OK, thanks for this gentle hint. I must re-read this in my basic Python books...


> Finally, some comments for method #3:
> You want to create a new timeseries based on the result of some  
> calculation on the data part, but still using the dates of the initial  
> series ?
> If you don't have any missing values, perform the computation on  
> series._data, that'll be faster. If you have missing values, use the  
> series._series instead to access directly the MaskedArray methods, and  
> not the timeseries ones (you don't want to carry the dates around if  
> you don't need them).

 
> As a wrap-up:
> Try to avoid looping if you can.
Yes, I noticed that.
But I couldn't find another way to pass the individual datetimes to my
calculation function which expects only one value at once (i.e. it is not
designed to calculate full arrays).

>You said a generic form of your function is:
> >
> > def myfunction(datetime_obj, scaling_factor):
> >    pass
>
> Do you really need datetime objects ?
Yes, in geoscience/earth science and engineering it's quite normal to have
parameters that depend on the date or the hour of the year, like the position
of planets, the state of the ocean, etc.

> In your example, you were using  
> series.dates[i].datetime.hour, a list. You should have used  
> series.dates.hour, which is an array. Using functions on an array as a  
> whole is far more efficient than using the same functions on each  
> element of the array.
I will try to adjust the function so that it calculates directly with
arrays.

But the basic problem I haven't solved yet is how to pass a single datetime_obj
to myfunction along with further parameters.

Regards,
Timmie
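
One common way to hand a single datetime plus extra parameters to a scalar function is to fix the extra parameters once and then call the function per element. The sketch below uses only the standard library. The body of `myfunction` is a hypothetical placeholder, and `dates` stands in for what `series.dates.tolist()` returned in the first post.

```python
import datetime
from functools import partial

def myfunction(datetime_obj, scaling_factor):
    # Hypothetical placeholder body: scale the hour of the day.
    return datetime_obj.hour * scaling_factor

start = datetime.datetime(2008, 6, 1)
dates = [start + datetime.timedelta(hours=i) for i in range(24)]

# Fix the extra parameter once, then call with one datetime at a time.
scaled_hour = partial(myfunction, scaling_factor=10.0)
values = [scaled_hour(d) for d in dates]

print(values[:3])    # [0.0, 10.0, 20.0]
print(values[-1])    # 230.0
```

This still loops in Python, so it does not remove the bottleneck; it only keeps the call site tidy until the function itself can accept arrays.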


Re: calculations using the datetime information of timeseries

Pierre GM-2
Timmie,
> As a note on my last email, I should add that I simplified the functions
> (methods 1-3). The different methods were only created to illustrate how
> I handle/access the series.

I got that. My comments were themselves intended for illustration ;)

>> As a wrap-up:
>> Try to avoid looping if you can.
> Yes, I noticed that.
> But I couldn't find another way to pass the individual datetimes to my
> calculation function which expects only one value at once (i.e. it is not
> designed to calculate full arrays).

That might be a bottleneck. If you could modify your function so that  
it can process arrays, you should get better results. Of course, that  
depends on the actual function...
When I asked whether you really needed datetime objects, I was  
thinking about the actual datetime.datetime objects, not about objects  
having, say, a `day` or `hour` property. If you send an example of a
function closer to your actual need, I may be able to help you more.

Re: calculations using the datetime information of timeseries

Timmie
Administrator
Hello Pierre,
this business of using the datetime information is really bothering me now.

>>> As a wrap-up:
>>> Try to avoid looping if you can.
>> Yes, I noticed that.
>> But I couldn't find another way to pass the individual datetimes to my
>> calculation function which expects only one value at once (i.e. it is not
>> designed to calculate full arrays).
>
> That might be a bottleneck. If you could modify your function so that  
> it can process arrays, you should get better results. Of course, that  
> depends on the actual function...
> When I asked whether you really needed datetime objects, I was  
> thinking about the actual datetime.datetime objects, not about objects  
> having, say, a `day` or `hour` property. If you send an example of  
> function closer to your actual need, I may be able to help you more.
I prepared an example. Maybe you have some ideas how to optimize the code.

Please find below my commented example.


### START ###

#!/usr/bin/env python

import datetime as dt
import numpy as np
import scikits.timeseries as ts


def hoy(datetime_obj):
     """
     Calculate the hour of the year for a datetime.datetime object.
     """
     # midnight on 1 January of the same year
     start = dt.datetime(datetime_obj.year, 1, 1, 0)
     td = datetime_obj - start

     seconds = td.days * 3600 * 24 + td.seconds
     hours = seconds / 3600  # integer division under Python 2

     return hours

def create_ts(datetime_obj):
     """
     create a hourly series
     """
     data = np.arange(0,8760)
     startdate = ts.Date(freq='H', datetime=datetime_obj)
     series = ts.time_series(data, freq='H', start_date=startdate)

     return series

## get a datetime object
my_datetime = dt.datetime.now()
## create time series
myseries = create_ts(my_datetime)
## calculate hoy for datetime object
my_hoy = hoy(my_datetime)
print 'my_hoy:', my_hoy


## first vectorize
hoy_vect = np.vectorize(hoy)

## calculate the hoy for each hour in the series


# 1. method: working, but a workaround, since the main calculation is
#            performed outside the time series object!!!
array_hoy = hoy_vect(myseries.dates.tolist())
series_hoy_01 = ts.time_series(array_hoy, myseries.dates)


# 2. method: desired but not working

#series_hoy_02 = hoy_vect(myseries.dates)
## this fails with the error message:
#
# AttributeError: 'numpy.int32' object has no attribute 'year'
# or
# AttributeError: 'int' object has no attribute 'year'

def create_dt(series):
     dt_vect = np.vectorize(dt.datetime)
     # datetime() expects year, month, day, hour in that order
     dt_ser = dt_vect(series.year, series.month, series.day, series.hour)

     return dt_ser

ser = create_dt(myseries)

series_hoy_03 = hoy_vect(dt.datetime(myseries.year, myseries.month,
                             myseries.hour))

### END CODE ###

Thanks in advance,
Timmie
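
The AttributeError quoted above suggests that the elements handed to `hoy` were plain integers rather than datetime objects. One way to avoid per-element datetime objects altogether is to compute the hour of the year on the whole array. The sketch below uses NumPy's `datetime64` type instead of scikits.timeseries, which is an assumption made so the example can run on its own. `hoy_scalar` and `hoy_array` are illustrative names, with `hoy_scalar` following the same definition as `hoy` (0 at midnight on 1 January).

```python
import numpy as np

def hoy_scalar(d):
    """Hour of year for one datetime.datetime (0 at Jan 1, 00:00)."""
    return (d.timetuple().tm_yday - 1) * 24 + d.hour

def hoy_array(stamps):
    """Hour of year for a whole datetime64 array."""
    stamps = stamps.astype("datetime64[h]")
    # Truncate to the year, then express that instant in hours again.
    year_start = stamps.astype("datetime64[Y]").astype("datetime64[h]")
    return (stamps - year_start).astype(int)

# 24 hourly timestamps on 1 March 2008 (a leap year).
stamps = np.arange("2008-03-01T00", "2008-03-02T00", dtype="datetime64[h]")
hours = hoy_array(stamps)

# Cross-check against the scalar version, one element at a time.
expected = [hoy_scalar(d) for d in stamps.tolist()]

print(hours[0])                    # 1440
print(hours.tolist() == expected)  # True
```

The first value is 60 days (31 + 29) times 24 hours. The resulting integer array could then be wrapped in a new series that reuses the original dates, as in method 1 of the script above.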
