[SciPy-User] Determining if statistics are converged

[SciPy-User] Determining if statistics are converged

Kathleen M Tacina
Hi,

This is slightly off-topic, but I'm not sure where a better place to ask this would be ...

I have a time series, and I'd like to check that we've taken enough points so that the mean and rms are converged. I'm hoping to get help with 2 things:

(1) Good references on how to do this. 

I've been using ad hoc methods. For example, comparing the mean of the 1st n samples to the overall mean. This works well if, for example, the mean of the 1st 50 samples is the same as the mean of all 5,000 samples. But when the mean of the 1st 300 samples hasn't yet converged to the mean of all 400 samples, it isn't as helpful.

(2) Tools to help with this in the scipy ecosystem.

The application is highly turbulent flow where we expect the rms (or, equivalently, the standard deviation) to be on the same order of magnitude as the mean.
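
As a rough yardstick for the question above: if the samples were statistically independent, the standard error of the sample mean would be std / sqrt(N), so with std comparable to the mean the relative uncertainty is roughly 1 / sqrt(N). The sketch below only turns that textbook relation around. The function name and the tolerance values are illustrative assumptions, and independence is unlikely to hold for closely spaced samples of a turbulent flow, so treat the result as a lower bound on the record length.

```python
import math


def samples_needed(coeff_of_variation, rel_tol):
    """Independent samples needed so that the standard error of the mean
    is about rel_tol times the mean.

    coeff_of_variation is std / mean; rel_tol is the target relative error.
    Assumes independent samples (hypothetical helper, not a SciPy function).
    """
    return math.ceil((coeff_of_variation / rel_tol) ** 2)


# std of the same order as the mean, target of about 1% on the mean
n_required = samples_needed(1.0, 0.01)
print(n_required)
```

Halving the tolerance quadruples the required number of samples, which is why a mean that still drifts after a few hundred points is not surprising when std is about equal to the mean.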

I'd also appreciate suggestions for better places to ask this question.

Thanks!

Best regards,
Kathleen
-- 

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user

Re: Determining if statistics are converged

Paul Hobson-2



On Wed, Feb 19, 2014 at 10:47 AM, Kathleen Tacina <[hidden email]> wrote:
[snip]

Hey Kathleen,

It seems to me that a reasonable approach would be to simply compute the expanding mean. You could then do some rolling inspection of how much the expanding mean has changed with time.

The pandas library has great built-in support for doing this to time series data. 

Maybe something like this:
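
The linked notebook is not reproduced in this archive. What follows is only a minimal sketch of the expanding-mean idea, not Paul's original code. It assumes a recent pandas in which `Series.expanding()` is available; the synthetic signal, the 500-sample comparison window, and all variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for a measured signal (assumption): mean 1.0, std 1.0,
# i.e. fluctuations of the same order as the mean.
signal = pd.Series(rng.normal(loc=1.0, scale=1.0, size=5000))

# Expanding (cumulative) statistics: entry i uses samples 0..i.
running_mean = signal.expanding(min_periods=2).mean()
running_std = signal.expanding(min_periods=2).std()

# How much did the running mean move over the last `window` samples,
# relative to its current value? Small values suggest the mean has settled.
window = 500
rel_drift = (running_mean - running_mean.shift(window)).abs() / running_mean.abs()

print(running_mean.iloc[-1], running_std.iloc[-1], rel_drift.iloc[-1])
```

Plotting `running_mean` and `running_std` against sample index (or time) shows whether the curves flatten out before the end of the record. A small `rel_drift` is necessary but not sufficient: a slowly varying signal can look flat over a short window.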

I wish pandas had been around when I was writing my hydraulic engineering master's thesis :)

-paul


Re: Determining if statistics are converged

Andy Fraser-6
In reply to this post by Kathleen M Tacina
I'm interested in the expert answers you get.  Although I am not an
expert, I believe that the dependent (not independent) nature of  
samples taken close together in time means that you need more samples
than you would need if they were independent.  Perhaps you need to
estimate the autocorrelations.  If so, that will get you into the big
literature on estimating the Fourier power spectrum.
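
One common way to make the point about dependent samples concrete is to estimate the autocorrelation function and convert it into an effective number of independent samples, N_eff = N / (1 + 2 * sum of autocorrelations). The sketch below is an assumption-laden illustration, not a vetted method: the helper name is hypothetical, the sum is truncated at the first non-positive autocorrelation (one simple rule among several), and the AR(1) test signal is synthetic.

```python
import numpy as np


def effective_sample_size(x):
    """Rough effective number of independent samples in a correlated series.

    Hypothetical helper: sums the sample autocorrelation up to its first
    non-positive value. Not defined for a constant series.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    # Biased autocovariance estimate at lags 0..n-1
    acov = np.correlate(xc, xc, mode="full")[n - 1:] / n
    rho = acov / acov[0]
    # Truncate the sum at the first lag where the autocorrelation is <= 0
    nonpos = np.flatnonzero(rho <= 0)
    cutoff = nonpos[0] if nonpos.size else n
    tau = 1.0 + 2.0 * rho[1:cutoff].sum()
    return n / tau


# Synthetic correlated signal (assumption): AR(1) process with coefficient 0.9
rng = np.random.default_rng(0)
phi = 0.9
noise = rng.normal(size=5000)
x = np.empty_like(noise)
x[0] = noise[0]
for i in range(1, noise.size):
    x[i] = phi * x[i - 1] + noise[i]

n_eff = effective_sample_size(x)
# Standard error of the mean, corrected for correlation
sem = x.std(ddof=1) / np.sqrt(n_eff)
print(n_eff, sem)
```

For strongly correlated data `n_eff` can be far smaller than the number of points recorded, so the uncertainty on the mean is correspondingly larger than std / sqrt(N) would suggest.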

Sorry.  As I write this it feels like hitting a tar baby.  Each blow
just makes things worse.

Andy



Re: Determining if statistics are converged

Kathleen M Tacina
In reply to this post by Paul Hobson-2

On 2/19/14 2:21 PM, Paul Hobson wrote:



[snip]


Paul,

Thanks for the link to the pandas page.  The expanding (and rolling) statistics will be very helpful -- much better than having to rewrite them myself.

I've looked at the expanding mean.  Unfortunately, for some cases, it looks like I'm not converged.  Referring to your (very helpful) ipython gist, it would be like we stopped collecting data after 5-15 sec, before statistics converge. 

Best regards,
Kathleen
