Statistics advise with scipy

didier rano
Hi all,

I have been using the TimeSeries module with scipy for several months. I am very impressed by this module; it is very easy to use.

I have a problem filtering out irrelevant data. Sometimes the data I retrieve are of poor quality (values far too high or far too low). I've tried using a median filter, but the data can be bad over a long period. I know that the pattern "strong change in a short time" is normally not possible. So how can I filter it correctly?


Thanks

Didier Rano

_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user

Re: Statistics advise with scipy

Pierre GM-2
On Sunday 20 July 2008 22:56:14 didier rano wrote:

> I have a problem to filter only relevant data.

Didier,
That's a tricky question, whose answer depends heavily on the kind of data
you're processing.
* Are you expecting some kind of trend? If yes, detrend your data first and
work on the residuals.
* Are you interested in finding the breaks in trend and/or regimes? If so, I
can send you some algos that do just that.
* Are you interested in finding the outliers? There are common approaches,
some based on an expected normality of the data, some based on more robust
methods... Googling 'outliers robust method' should get you started.
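
To illustrate the first point (detrend, then work on the residuals), here is a
minimal sketch in plain Python. It assumes a linear trend and evenly spaced
samples; the data and the helper names (linear_fit, detrend) are made up for
the example. scipy.signal.detrend does a similar job on arrays. Note that an
ordinary least-squares fit is itself pulled by bad points, so this is only a
first pass.

```python
from statistics import mean

def linear_fit(t, y):
    """Ordinary least-squares line y ~ slope * t + intercept."""
    t_bar, y_bar = mean(t), mean(y)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return slope, y_bar - slope * t_bar

def detrend(t, y):
    """Return the residuals left after removing the fitted line."""
    slope, intercept = linear_fit(t, y)
    return [yi - (slope * ti + intercept) for ti, yi in zip(t, y)]

# Hypothetical series: a clean linear trend with one bad reading at index 6.
t = list(range(10))
y = [2.0 * ti + 1.0 for ti in t]
y[6] += 15.0

residuals = detrend(t, y)
# The point with the largest absolute residual is the first suspect.
suspect = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
print(suspect)
```

Any outlier rule is then applied to the residuals rather than to the raw
values, so that the trend itself is not mistaken for an anomaly.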

Re: Statistics advise with scipy

didier rano
Thanks Pierre,
 
What do you mean by "finding the breaks in trend and/or regimes"?

In fact, in my case, I have several objectives:
* Draw a graph without the outlier data
* Compute a trend without the outlier data
* Determine when the behavior/trend of the data will change. Maybe this is "finding the breaks in trend and/or regimes"?
 
Bye
Didier Rano
 

Re: Statistics advise with scipy

Pierre GM-2
On Monday 21 July 2008 12:59:06 didier rano wrote:
> Thanks Pierre,
>
> What do you mean by "finding the breaks in trend and/or regimes" ?

There's a lot of literature on the detection of change-points. Roughly, a
change-point can be a step change-point (e.g., switching from one mean to
another, as you observe in your data), a trend change-point (the slope of
a linear model changes at some point), or both.

www.beringclimate.noaa.gov/regimes/Regime_shift_methods_list.htm
http://ams.allenpress.com/perlserv/?request=get-pdf&doi=10.1175%2F2008JCLI1956.1
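
As a rough illustration of a step change-point (not one of the methods from
the links above), the sketch below tries every split of a series into two
segments and keeps the split where the two segment means explain the data
best, i.e. where the total squared error is smallest. It assumes exactly one
step; the data and the function names are hypothetical.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the segment mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_step_change(y, min_size=2):
    """Index k such that splitting y into y[:k] and y[k:] minimises the total SSE.

    Requires len(y) >= 2 * min_size so that both segments are non-trivial.
    """
    candidates = range(min_size, len(y) - min_size + 1)
    return min(candidates, key=lambda k: sse(y[:k]) + sse(y[k:]))

# Hypothetical series: the mean switches from about 1 to about 5 at index 5.
y = [1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 5.1, 4.9, 5.2, 5.0]
k = best_step_change(y)
print(k)
```

A trend change-point could be searched for in the same way by fitting a line
to each segment instead of a mean.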

> In fact, in my case, I have several objectives:
> * Draw a graph without outliers data

Well, first you need to define what an outlier is in your problem: assuming
normal data, how far away from the mean should a point be to be an outlier?
1, 2, or 3 standard deviations? What if your data is not normal? In that case,
robust methods can give good results.
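
To make the difference concrete, here is a small sketch comparing the
"distance from the mean in standard deviations" rule with one common robust
alternative, based on the median and the median absolute deviation (MAD). The
threshold of 3 and the sample data are assumptions for the example; the
factor 1.4826 rescales the MAD so that it is comparable to a standard
deviation for normal data.

```python
from statistics import mean, median, stdev

def zscore_outliers(y, threshold=3.0):
    """Indices of points more than `threshold` standard deviations from the mean."""
    m, s = mean(y), stdev(y)
    return [i for i, v in enumerate(y) if abs(v - m) > threshold * s]

def mad_outliers(y, threshold=3.0):
    """Indices of points more than `threshold` scaled MADs from the median.

    If the MAD is zero (more than half the values are identical), every
    point that differs from the median is flagged.
    """
    med = median(y)
    mad = median(abs(v - med) for v in y)
    scale = 1.4826 * mad  # comparable to a standard deviation for normal data
    return [i for i, v in enumerate(y) if abs(v - med) > threshold * scale]

# Hypothetical readings around 10 with two wild values.
y = [10.1, 9.8, 10.0, 10.3, 9.9, 55.0, 10.2, 9.7, 60.0, 10.0]
print(zscore_outliers(y))
print(mad_outliers(y))
```

On this sample the two wild values inflate the mean and the standard deviation
so much that the first rule flags nothing, while the median-based rule flags
both of them.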

> * Compute a trend wihout outliers data
> * Determine when the behavior/trend data will change. May be "finding the
> breaks in trend and/or regimes " ?

Re: Statistics advise with scipy

didier rano

2008/7/21 Pierre GM <[hidden email]>:
> Well, first you need to define what an outlier is in your problem: assuming
> normal data, how far away from the mean should a point be to be an outlier ?
> 1,2, 3 standard deviation ? What if your data is not normal ? In that case,
> robust methods can give good result.

My data is not normal. Do you know of any robust methods in scipy? Or maybe in another Python module?


Re: Statistics advise with scipy

Pierre GM-2
On Monday 21 July 2008 22:49:51 didier rano wrote:

> My data is not normal. Do you know robusts method in scipy ? Or maybe in an
> other python module ?

Mmh, I'm sure you could implement some yourself. That way, we could start
another scikit. There are already some winsorization and trimming functions
in scipy.stats.
Alternatively, you can try to use R and numpy through rpy:
http://rpy.sourceforge.net/
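
For readers unfamiliar with the two terms: winsorizing clamps the most extreme
values to the nearest value that is kept, while trimming drops them before
computing a statistic. The plain-Python sketch below only illustrates the
idea; it is not SciPy's implementation (SciPy provides, for example,
scipy.stats.mstats.winsorize and scipy.stats.trim_mean). The 10% fraction and
the data are assumptions.

```python
def winsorize(values, fraction=0.1):
    """Clamp the lowest and highest `fraction` of values to the nearest kept value."""
    n = len(values)
    k = int(fraction * n)  # number of points clamped on each side
    if k == 0:
        return list(values)
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    return [min(max(v, low), high) for v in values]

def trimmed_mean(values, fraction=0.1):
    """Mean after dropping the lowest and highest `fraction` of values.

    Assumes fraction < 0.5 so that some values remain.
    """
    n = len(values)
    k = int(fraction * n)
    kept = sorted(values)[k:n - k]
    return sum(kept) / len(kept)

# Hypothetical readings around 10 with one very high and one very low value.
y = [10.0, 11.0, 9.0, 10.0, 250.0, 10.0, 12.0, 8.0, -90.0, 10.0]
print(winsorize(y))
print(trimmed_mean(y))
```

Winsorizing keeps the length and ordering of the series, which matters for a
time series; trimming is only meaningful for summary statistics.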

Re: Statistics advise with scipy

Timmie
>> My data is not normal. Do you know robusts method in scipy ? Or maybe in an
>> other python module ?
>
> Mmh, I'm sure you could implement some yourself. That way, we could start
> another scikits. There are already some winsorization and trimming functions
> in scipy.stats.
> Alternatively, you can try to use R and numpy through rpy:
> http://rpy.sourceforge.net/
Didier,
may I ask you to give some feedback on which method worked for you?
I am also working on the problem of removing outliers etc. from data.

Thanks in advance,
Timmie


Re: Statistics advise with scipy

didier rano
Hi,

I haven't yet found a solution to my problem, but I am reading a good article about removing outliers: http://www.lcgceurope.com/lcgceurope/data/articlestandard//lcgceurope/502001/4509/article.pdf
Now I need to experiment with the methods described in this article.

Thanks
Didier Rano


Re: Statistics advise with scipy

David Huard
I've had some success with the following:

1. Define a simple statistical model for your data.  That is, from the
previous data, define a distribution for the probability of the next
point.
2. Define a cutoff probability separating valid data from outliers.
3. For each datum, compute its probability based on previous data, and
tag it as valid or outlier.

The advantage is that you can start with a simple statistical model (for
example, a Gaussian centered on the last valid entry) and customize it as
you find cases that are not well handled.
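
The three steps above can be sketched as follows, using the simple model
mentioned: a Gaussian centered on the last valid entry. Several things here
are assumptions made for the example, not part of the description: the fixed
standard deviation sigma, the cutoff of 0.01, the use of a two-sided tail
probability as the "probability" of a point, and treating the first point as
valid.

```python
import math

def tag_outliers(y, sigma=1.0, cutoff=0.01):
    """Tag each point as valid (True) or outlier (False).

    Model: the next point is Gaussian, centred on the last valid entry,
    with a fixed standard deviation `sigma`.
    """
    flags = [True]  # assumption: the first point is valid
    last_valid = y[0]
    for value in y[1:]:
        z = abs(value - last_valid) / sigma
        # Two-sided probability of a deviation at least this large.
        p = math.erfc(z / math.sqrt(2.0))
        ok = p >= cutoff
        flags.append(ok)
        if ok:
            last_valid = value  # only valid points update the model
    return flags

# Hypothetical series with two bad readings in a row.
y = [10.0, 10.4, 9.9, 30.0, 31.0, 10.2, 10.1]
flags = tag_outliers(y, sigma=0.5, cutoff=0.01)
print(flags)
```

With this particular model a genuine, lasting shift in level would be tagged
as an outlier forever, which is the kind of case that calls for customizing
the model, for instance by letting sigma grow with the time since the last
valid entry.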

David


Re: Statistics advise with scipy

Stéfan van der Walt
2008/7/23 David Huard <[hidden email]>:

> I've had some success with the following:
>
> 1. Define a simple statistical model for your data.  That is, from the
> previous data, define a distribution for the probability of the next
> point.
> 2. Define a cutoff probability separating valid data from outliers.
> 3. For each datum, compute its probability based on previous data, and
> tag it as valid or outlier.
>
> The advantage is that you can start with a simple statistical model (
> for example a gaussian centered on the last valid entry ) and
> customize it as you find cases that are not well handled.

There's also

http://www.scipy.org/Cookbook/RANSAC
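
For context, RANSAC fits a model to small random subsets of the data and keeps
the fit that the largest number of points agree with; points that disagree are
treated as outliers. The following is a minimal RANSAC-style sketch for a
straight line in plain Python. It is not the Cookbook code, it omits the usual
final refit on the inliers, and the tolerance, iteration count, seed and data
are all assumptions.

```python
import random

def fit_line(p, q):
    """Line through two points, returned as (slope, intercept)."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def ransac_line(points, n_iter=100, tol=0.5, seed=0):
    """Return (slope, intercept, inlier_indices) of the best line found."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    best = (0.0, 0.0, [])
    for _ in range(n_iter):
        p, q = rng.sample(points, 2)
        if p[0] == q[0]:
            continue  # vertical pair: slope undefined, skip it
        slope, intercept = fit_line(p, q)
        inliers = [i for i, (x, y) in enumerate(points)
                   if abs(y - (slope * x + intercept)) <= tol]
        if len(inliers) > len(best[2]):
            best = (slope, intercept, inliers)
    return best

# Hypothetical data: points on y = 2x + 1 with two corrupted readings.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points[3] = (3.0, 40.0)
points[7] = (7.0, -25.0)

slope, intercept, inliers = ransac_line(points)
print(slope, intercept, inliers)
```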

Cheers
Stéfan