Hi all,
I have been using the TimeSeries module with scipy for several months. I am very impressed by this module; it is very easy to use. My problem is filtering out irrelevant data. Sometimes the data I retrieve is of poor quality (values far too high or far too low). I've tried a median filter, but the data can stay bad over a long period. I know that the pattern "strong change in a short time" is normally not possible. So, how can I filter it correctly?
You can see the data at http://spreadsheets.google.com/pub?key=pF2qPjwUpy_1m7FbqafDXQ
Thanks
Didier Rano
_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
On Sunday 20 July 2008 22:56:14 didier rano wrote:
> I have a problem to filter only relevant data.

Didier,
That's a tricky question, whose answer depends heavily on the kind of data you're processing.
* Are you expecting some kind of trend? If yes, detrend your data first and work on the residuals.
* Are you interested in finding the breaks in trend and/or regimes? If so, I can send you some algos that do just that.
* Are you interested in finding the outliers? There are common approaches, some based on an expected normality of the data, some based on more robust methods... Googling 'outliers robust method' should get you started.
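For the detrending suggestion, here is a minimal sketch in plain Python. It assumes equally spaced samples and a straight-line trend; neither is stated in the thread, and the name `detrend_linear` and the sample values are made up for illustration.

```python
def detrend_linear(values):
    """Subtract the least-squares straight line from equally spaced values."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    xs = range(n)
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    # Ordinary least-squares slope and intercept for y = intercept + slope * x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residuals: what is left once the fitted line is removed.
    return [y - (intercept + slope * x) for x, y in zip(xs, values)]


# Illustrative series: a rising trend with one suspicious spike at index 4.
series = [1.0, 2.1, 2.9, 4.0, 15.0, 6.1, 7.0]
residuals = detrend_linear(series)
```

Note that an ordinary least-squares line is itself pulled towards large outliers, so the residuals are only a first look; a robust fit may be needed before trusting them.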
Thanks Pierre,
What do you mean by "finding the breaks in trend and/or regimes" ?
In fact, in my case, I have several objectives:
* Draw a graph without the outlier data
* Compute a trend without the outlier data
* Determine when the behavior/trend of the data will change. Maybe this is "finding the breaks in trend and/or regimes"?
Bye
Didier Rano
2008/7/21, Pierre GM <[hidden email]>:
Didier Rano [hidden email] http://www.jaxtr.com/didierrano
On Monday 21 July 2008 12:59:06 didier rano wrote:
> Thanks Pierre,
>
> What do you mean by "finding the breaks in trend and/or regimes" ?

There's a lot of literature on the detection of changepoints. Roughly, a changepoint can be a step changepoint (e.g., switching from one mean to another, as you observe in your data), or a trend changepoint (the slope of a linear model changes at some point), or both.
www.beringclimate.noaa.gov/regimes/Regime_shift_methods_list.htm
http://ams.allenpress.com/perlserv/?request=getpdf&doi=10.1175%2F2008JCLI1956.1

> In fact, in my case, I have several objectives:
> * Draw a graph without outliers data

Well, first you need to define what an outlier is in your problem: assuming normal data, how far away from the mean should a point be to be an outlier? 1, 2, or 3 standard deviations? What if your data is not normal? In that case, robust methods can give good results.

> * Compute a trend without outliers data
> * Determine when the behavior/trend data will change. May be "finding the
> breaks in trend and/or regimes " ?
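One common robust way to answer "how far away is too far" without leaning on the mean and standard deviation is the median and the median absolute deviation (MAD). The sketch below is an illustration only: the thread does not name this method, the 0.6745 scaling and the 3.5 threshold are the conventional "modified z-score" choices, and `mad_outliers` is a made-up name.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return a list of booleans, True where a value is flagged as an outlier."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half of the points equal the median, so the score is
        # undefined; this sketch simply flags nothing in that case.
        return [False] * len(values)
    # Modified z-score: 0.6745 makes the MAD comparable to a standard
    # deviation when the bulk of the data is roughly normal.
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Illustrative readings with one value far too high and one far too low.
values = [10.1, 9.8, 10.3, 10.0, 55.0, 9.9, 10.2, -30.0]
flags = mad_outliers(values)
```

Because the median and the MAD barely move when a few points are wild, the two extreme readings are flagged without distorting the scale used to judge the others. On a series with a trend or regime shifts, this would have to be applied to residuals or to a moving window rather than to the whole series.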
2008/7/21 Pierre GM <[hidden email]>:
My data is not normal. Do you know of any robust methods in scipy? Or maybe in another Python module?
On Monday 21 July 2008 22:49:51 didier rano wrote:
> My data is not normal. Do you know robusts method in scipy ? Or maybe in an
> other python module ?

Mmh, I'm sure you could implement some yourself. That way, we could start another scikits. There are already some winsorization and trimming functions in scipy.stats.
Alternatively, you can try to use R and numpy through rpy: http://rpy.sourceforge.net/
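To show what winsorization and trimming actually do, here is a small plain-Python sketch. It is not SciPy's code: in SciPy the corresponding functions include `scipy.stats.mstats.winsorize` and `scipy.stats.trim_mean`, whose handling of limits may differ in detail. The names and the `fraction` parameter below are illustrative.

```python
def _cut_count(values, fraction):
    """Number of points to treat as extreme at each end."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    return int(fraction * len(values))


def winsorize(values, fraction=0.1):
    """Clamp the lowest and highest `fraction` of points to the nearest kept value."""
    k = _cut_count(values, fraction)
    ordered = sorted(values)
    low, high = ordered[k], ordered[len(values) - 1 - k]
    return [min(max(v, low), high) for v in values]


def trimmed_mean(values, fraction=0.1):
    """Mean after discarding the lowest and highest `fraction` of points."""
    k = _cut_count(values, fraction)
    kept = sorted(values)[k:len(values) - k]
    return sum(kept) / len(kept)


# Illustrative data: ten readings, two of them clearly wrong.
data = [10.0, 11.0, 9.0, 10.5, 500.0, 9.5, 10.2, -400.0, 10.1, 9.9]
clamped = winsorize(data, 0.1)
robust_mean = trimmed_mean(data, 0.1)
```

Winsorizing keeps the series the same length, which is convenient for plotting a time series, while trimming drops the extremes and is more natural for summary statistics.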
>> My data is not normal. Do you know robusts method in scipy ? Or maybe in an
>> other python module ?
>
> Mmh, I'm sure you could implement some yourself. That way, we could start
> another scikits. There are already some winsorization and trimming functions
> in scipy.stats.
> Alternatively, you can try to use R and numpy through rpy:
> http://rpy.sourceforge.net/

Didier,
may I ask you to give some feedback on which method worked for you?
I am also working on the problem of removing outliers etc. from data.

Thanks in advance,
Timmie
Hi,
I haven't found a solution to my problem yet, but I am reading a good article about removing outliers: http://www.lcgceurope.com/lcgceurope/data/articlestandard//lcgceurope/502001/4509/article.pdf
Now I need to experiment with the methods described in this article.
Thanks
Didier Rano
2008/7/22 Tim Michelsen <[hidden email]>:
I've had some success with the following:
1. Define a simple statistical model for your data. That is, from the previous data, define a distribution for the probability of the next point.
2. Define a cutoff probability separating valid data from outliers.
3. For each datum, compute its probability based on previous data, and tag it as valid or outlier.

The advantage is that you can start with a simple statistical model (for example a Gaussian centered on the last valid entry) and customize it as you find cases that are not well handled.

David
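The three steps above can be sketched roughly as follows, using the Gaussian-centered-on-the-last-valid-entry model mentioned in the message. Several things here are assumptions rather than David's actual code: the fixed `sigma`, the default `cutoff`, treating the first point as valid, and reading "probability" as a two-sided tail probability.

```python
import math


def tag_outliers(values, sigma, cutoff=0.01):
    """Return a list of booleans: True for valid points, False for outliers."""
    tags = []
    last_valid = None
    for v in values:
        if last_valid is None:
            # No history yet: accept the first point as valid (an assumption).
            tags.append(True)
            last_valid = v
            continue
        # Two-sided tail probability of a jump at least this large under a
        # Gaussian with standard deviation `sigma` centred on the last valid value.
        p = math.erfc(abs(v - last_valid) / (sigma * math.sqrt(2.0)))
        valid = p >= cutoff
        tags.append(valid)
        if valid:
            # Only valid points update the model, so a long run of bad
            # values cannot drag the reference along with it.
            last_valid = v
    return tags


# Illustrative readings: three bad values in a row, then back to normal.
readings = [20.0, 20.4, 20.1, 80.0, 82.0, 79.5, 20.6, 20.3]
tags = tag_outliers(readings, sigma=1.0)
```

This handles the case where a median filter struggles, namely bad data lasting several samples. The flip side is that a genuine jump to a new level would be rejected indefinitely, which is the kind of case where the model would need customizing, for example by widening `sigma` with the time elapsed since the last valid point.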
2008/7/23 David Huard <[hidden email]>:
> I've had some success with the following:
>
> 1. Define a simple statistical model for your data. That is, from the
> previous data, define a distribution for the probability of the next
> point.
> 2. Define a cutoff probability separating valid data from outliers.
> 3. For each datum, compute its probability based on previous data, and
> tag it as valid or outlier.
>
> The advantage is that you can start with a simple statistical model (
> for example a gaussian centered on the last valid entry ) and
> customize it as you find cases that are not well handled.

There's also http://www.scipy.org/Cookbook/RANSAC

Cheers
Stéfan
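For readers who have not met RANSAC, the idea is to fit a model to small random subsets and keep the fit that the most points agree with. The sketch below is not the Cookbook code; it is a stripped-down line fit in plain Python, and the names `ransac_line` and `fit_line` as well as the `n_iter`, `tol` and `seed` defaults are illustrative assumptions.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, n_iter=200, tol=1.0, seed=0):
    """Return (slope, intercept, inliers) for the best-supported line."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        # A line needs two points: draw a minimal random sample.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical pair cannot define y = a * x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Points within `tol` of the candidate line count as its support.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no usable model found")
    # Final estimate: least squares on the supporting points only.
    slope, intercept = fit_line(best_inliers)
    return slope, intercept, best_inliers


# Illustrative data: the line y = 2x + 1 with three corrupted samples.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points[5] = (5.0, 60.0)
points[12] = (12.0, -40.0)
points[17] = (17.0, 100.0)
slope, intercept, inliers = ransac_line(points)
```

Points left out of `inliers` are the candidate outliers, so the same run gives both a trend estimate and a list of suspect samples. The tolerance `tol` plays the role of the outlier definition discussed earlier in the thread and has to be chosen for the data at hand.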