sorting timeseries data.

sorting timeseries data.

Dharhas Pothina
Hi,

I have some field data with unequal spacing and some duplicate values that I am trying to plot a spline through. It seems that the splrep and splev functions require a unique monotonically increasing series of values.

Say I have an unordered dataset with some duplicate values like :

seconds = array([1,3,2,5,4,1,6,9,8])
data = array([100, 101, 102, 103, 104, 105, 106, 107, 108])

I want to sort the data to be monotonically increasing by the variable seconds and filter out duplicate values (say by deleting the second occurrence).

I've tried combining the arrays with a = array([seconds, data]) and then using sort and argsort with various options, but instead of sorting by the first column it sorts *every* column. I've also tried a = zip(x, y) followed by sort() / unique().

My final sorted array needs to look like

newseconds = array([1,1,2,3,4,5,6,8,9])
newdata = array([100,105,102,101,104,103,106,108,107])

and then removing duplicates should look like

finalseconds = array([1,2,3,4,5,6,8,9])
finaldata = array([100,102,101,104,103,106,108,107])

Any help is appreciated.

thanks

- dharhas



_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user

Re: sorting timeseries data.

Pierre GM-2
On Friday 02 May 2008 12:11:04 Dharhas Pothina wrote:
> I want to sort the data to be monotonically increasing by the variable
> seconds and filter out duplicate values (say by deleting the second
> occurrence).

Dharhas,
>>> idx = seconds.argsort(kind='mergesort')
>>> sorted_seconds = seconds[idx]
>>> sorted_data = data[idx]
will do the trick. 'mergesort' is a stable sort, so duplicate timestamps keep
their original order, which matters if you want to delete the second occurrence.
Look at the help for the argsort method if you need a different sorting algorithm.

Then, you can try to find the duplicates this way:
>>> diffs = numpy.ediff1d(sorted_seconds, to_begin=1)
>>> unq = (diffs != 0)
>>> final_seconds = sorted_seconds.compress(unq)
>>> final_data = sorted_data.compress(unq)
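
The sort-then-filter steps above can be put together as a self-contained sketch using the sample arrays from the original question. It assumes a reasonably recent NumPy, and it uses boolean indexing in place of compress(), which should be equivalent here. The name `keep` is only illustrative.

```python
import numpy as np

# Sample data from the original question.
seconds = np.array([1, 3, 2, 5, 4, 1, 6, 9, 8])
data = np.array([100, 101, 102, 103, 104, 105, 106, 107, 108])

# Stable sort: equal timestamps keep their original relative order,
# so the first occurrence of a duplicate stays first.
idx = np.argsort(seconds, kind="mergesort")
sorted_seconds = seconds[idx]
sorted_data = data[idx]

# Difference between consecutive sorted timestamps; to_begin=1 makes the
# first element count as "different from its predecessor" so it is kept.
diffs = np.ediff1d(sorted_seconds, to_begin=1)
keep = diffs != 0

# Drop every element whose timestamp repeats the previous one.
final_seconds = sorted_seconds[keep]
final_data = sorted_data[keep]

print(final_seconds)  # [1 2 3 4 5 6 8 9]
print(final_data)     # [100 102 101 104 103 106 108 107]
```

The result is strictly increasing in seconds, which is the form the question says splrep needs.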

On a side note, you may want to give scikits.timeseries a try: we developed this
package specifically to handle time series (i.e., series indexed in time). The
sorting part would be automatic, and finding the duplicates is also quite
easy.
HIH

Re: sorting timeseries data.

Dharhas Pothina

Thanks Pierre, I'll have a look at the scikits.timeseries package when I have some time. Is it part of scipy/numpy or do I have to download it separately?

Another question about the duplicates. I have a dataset with multiple data points on each day. Is there a simple way to take the maximum (or minimum, or mean) for each day and assign it to that day?

i.e., if my data looks like

1986 10 01 16.3
1986 10 01 22.9
1986 10 01 13.2
1986 10 02 24.3
1986 10 02 22.1
1986 10 03 19.8
1986 10 03 20.1
1986 10 03 23.4
...

take the max of each day to get :

1986 10 01 22.9
1986 10 02 24.3
1986 10 03 23.4
...

thanks

- dharhas
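
One possible way to do the per-day reduction with plain NumPy is sketched below. It is not taken from the thread. It assumes the four columns are year, month, day and value, and that the data is already loaded into a 2-D float array. The names `rows`, `keys` and `inverse` are hypothetical.

```python
import numpy as np

# Sample rows from the post: year, month, day, value.
rows = np.array([
    [1986, 10, 1, 16.3],
    [1986, 10, 1, 22.9],
    [1986, 10, 1, 13.2],
    [1986, 10, 2, 24.3],
    [1986, 10, 2, 22.1],
    [1986, 10, 3, 19.8],
    [1986, 10, 3, 20.1],
    [1986, 10, 3, 23.4],
])

# Encode each date as one integer key (YYYYMMDD) so days can be grouped.
keys = (rows[:, 0] * 10000 + rows[:, 1] * 100 + rows[:, 2]).astype(int)
values = rows[:, 3]

# unique_days is sorted; inverse[i] is the position of row i's day in it.
unique_days, inverse = np.unique(keys, return_inverse=True)

# Fold every value into the slot of its day with an unbuffered max / min.
daily_max = np.full(unique_days.shape, -np.inf)
np.maximum.at(daily_max, inverse, values)

daily_min = np.full(unique_days.shape, np.inf)
np.minimum.at(daily_min, inverse, values)

# Mean per day: sum of values per day divided by the row count per day.
daily_mean = np.bincount(inverse, weights=values) / np.bincount(inverse)

for day, vmax in zip(unique_days, daily_max):
    print(day, vmax)
```

Because the grouping goes through np.unique, the rows do not need to be sorted by date beforehand.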

