"small data" statistics


"small data" statistics

josef.pktd
Most statistical tests and statistical inference in scipy.stats and
statsmodels rely on large-sample assumptions.

Everyone is talking about "Big data", but is anyone still interested
in doing small-sample statistics in Python?

I'd like to know whether it's worth spending any time on general
purpose small sample statistics.

for example:

http://facultyweb.berry.edu/vbissonnette/statshw/doc/perm_2bs.html

```
Example homework problem:
Twenty participants were given a list of 20 words to process. The 20
participants were randomly assigned to one of two treatment
conditions. Half were instructed to count the number of vowels in each
word (shallow processing). Half were instructed to judge whether the
object described by each word would be useful if one were stranded on
a desert island (deep processing). After a brief distractor task, all
subjects were given a surprise free recall task. The number of words
correctly recalled was recorded for each subject. Here are the data:

Shallow Processing: 13 12 11 9 11 13 14 14 14 15
Deep Processing: 12 15 14 14 13 12 15 14 16 17
```
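A minimal sketch of the exact permutation test for these data (difference of group means as the statistic, all C(20, 10) reassignments enumerated); the details are only an illustration, and the linked page may define its criterion slightly differently:

```
# Exact permutation test for the example data: enumerate every way of
# assigning 10 of the 20 pooled scores to the "deep" group and compare
# the resulting mean differences with the observed one.
from itertools import combinations
import numpy as np

shallow = np.array([13, 12, 11, 9, 11, 13, 14, 14, 14, 15])
deep = np.array([12, 15, 14, 14, 13, 12, 15, 14, 16, 17])

pooled = np.concatenate([shallow, deep])
n = len(deep)
total = pooled.sum()
observed = deep.mean() - shallow.mean()   # 1.6 for these data

count = n_perm = 0
for idx in combinations(range(len(pooled)), n):   # C(20, 10) = 184756 splits
    deep_sum = pooled[list(idx)].sum()
    diff = deep_sum / n - (total - deep_sum) / (len(pooled) - n)
    if abs(diff) >= abs(observed) - 1e-12:
        count += 1
    n_perm += 1

print("observed difference:", observed)
print("two-sided permutation p-value:", count / n_perm)
```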

Josef

Re: "small data" statistics

Gael Varoquaux
On Thu, Oct 11, 2012 at 10:57:23AM -0400, [hidden email] wrote:
> Everyone is talking about "Big data", but is anyone still interested
> in doing small sample statistics in python.

I am!

> I'd like to know whether it's worth spending any time on general
> purpose small sample statistics.

It is. Big data is a buzzword, but few people actually have big data. In addition,
what they don't realize is that it is often a small-sample problem in
statistical terms, as the number of samples is often not much bigger
than the number of features.

Thanks for all your work on scipy.stats!

Gael

Re: "small data" statistics

Thomas Kluyver-2
In reply to this post by josef.pktd
On 11 October 2012 15:57,  <[hidden email]> wrote:
> Everyone is talking about "Big data", but is anyone still interested
> in doing small sample statistics in python.
>
> I'd like to know whether it's worth spending any time on general
> purpose small sample statistics.

I'm certainly interested in that sort of thing - a lot of biology
still revolves around simple, 'small data' stats.

Thanks,
Thomas

Re: "small data" statistics

Serge Rey-2
In reply to this post by josef.pktd
On Thu, Oct 11, 2012 at 7:57 AM,  <[hidden email]> wrote:
> Most statistical tests and statistical inference in scipy.stats and
> statsmodels relies on large number assumptions.
>
> Everyone is talking about "Big data", but is anyone still interested
> in doing small sample statistics in python.

+1


--
Sergio (Serge) Rey
Professor, School of Geographical Sciences and Urban Planning
GeoDa Center for Geospatial Analysis and Computation
Arizona State University
http://geoplan.asu.edu/rey

Editor, International Regional Science Review
http://irx.sagepub.com

Re: "small data" statistics

Emanuele Olivetti-3
In reply to this post by josef.pktd
On 10/11/2012 04:57 PM, [hidden email] wrote:

> Most statistical tests and statistical inference in scipy.stats and
> statsmodels relies on large number assumptions.
>
> Everyone is talking about "Big data", but is anyone still interested
> in doing small sample statistics in python.
>
> I'd like to know whether it's worth spending any time on general
> purpose small sample statistics.
>
> for example:
>
> http://facultyweb.berry.edu/vbissonnette/statshw/doc/perm_2bs.html
>
> ```
> Example homework problem:
> [...]
> Shallow Processing: 13 12 11 9 11 13 14 14 14 15
> Deep Processing: 12 15 14 14 13 12 15 14 16 17
> ```

I am very interested in inference from small samples, but I have
some concerns about both the example and the proposed approach
based on the permutation test.

IMHO the question in the example at that URL, i.e. "Did the instructions
given to the participants significantly affect their level of recall?", is
not directly addressed by the permutation test. The permutation test is
related to the question "how (un)likely is the collected dataset under the
assumption that the instructions did not affect the level of recall?".

In other words, the initial question is about quantifying how likely the
hypothesis "the instructions do not affect the level of recall"
(let's call it H_0) is given the collected dataset, relative to how likely the
hypothesis "the instructions affect the level of recall" (let's call it H_1)
is given the data. In slightly more formal notation, the initial question is about
estimating p(H_0|data) and p(H_1|data), while the permutation test provides
a different quantity, which is related (see [0]) to p(data|H_0). Clearly
p(data|H_0) is different from p(H_0|data).
Literature on this point includes, for example, http://dx.doi.org/10.1016/j.socec.2004.09.033

On a different note, I am also interested in understanding the assumptions
under which the permutation test is expected to work. I am not an expert in that
field but, as far as I know, the permutation test - and all resampling approaches
in general - requires that the sample be "representative" of the underlying
distribution of the problem. In my opinion this requirement is difficult to assess
in practice, and it is even more troubling for the specific case of "small data" - of
interest for this thread.

Any comment on these points is warmly welcome.

Best,

Emanuele

[0] A minor detail: I said "related" because the outcome of the permutation test,
and of classical hypothesis tests in general, is not precisely p(data|H_0).
First of all, those tests rely on a statistic of the dataset and not on the dataset itself.
In the example at the URL the statistic (called "criterion" there) is the difference
between the means of the two groups. Second, and more important,
the test provides an estimate of the probability of observing such a value
for the statistic... "or a more extreme one". So if we call the statistic computed on
the data T(data), then classical tests provide p(t>T(data)|H_0), and not
p(data|H_0). In any case, even p(t>T(data)|H_0) is clearly different from the initial
question, i.e. p(H_0|data).


Re: "small data" statistics

Sturla Molden-2
On 12.10.2012 10:36, Emanuele Olivetti wrote:

> In other words the initial question is about quantifying how likely is the
> hypothesis "the instructions do not affect the level of recall"
> (let's call it H_0) given the collected dataset, with respect to how likely is the
> hypothesis "the instructions affect the level of recall" (let's call it H_1)
> given the data. In a bit more formal notation the initial question is about
> estimating p(H_0|data) and p(H_1|data), while the permutation test provides
> a different quantity, which is related (see [0]) to p(data|H_0). Clearly
> p(data|H_0) is different from p(H_0|data).

Here you must use Bayes formula :)

p(H_0|data) is proportional to p(data|H_0) * p(H_0 a priori)

The scale factor is just a constant, so you can generate samples from
p(H_0|data) simply by using a Markov chain (e.g. Gibbs sampler) to
sample from p(data|H_0) * p(H_0 a priori).

And that is what we call "Bayesian statistics" :-)
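As a minimal sketch of this route for the recall example, assuming a binomial(20, p) model per subject and uniform Beta(1, 1) priors (both are illustrative modelling choices, not something fixed by the problem), the marginal likelihoods of H_0 (one shared p) and H_1 (one p per group) can even be computed analytically:

```
# Hedged sketch: Bayesian comparison of H_0 (shared recall probability)
# and H_1 (group-specific probabilities), assuming each subject's count
# is binomial(20, p) and p has a uniform Beta(1, 1) prior.
import numpy as np
from scipy.special import betaln, gammaln

shallow = np.array([13, 12, 11, 9, 11, 13, 14, 14, 14, 15])
deep = np.array([12, 15, 14, 14, 13, 12, 15, 14, 16, 17])
n_words = 20

def log_marginal_likelihood(counts, n):
    """log p(counts | model), with p integrated out under a Beta(1, 1) prior."""
    k = np.asarray(counts)
    log_binom = np.sum(gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1))
    return log_binom + betaln(1 + k.sum(), 1 + n * len(k) - k.sum())

log_m0 = log_marginal_likelihood(np.concatenate([shallow, deep]), n_words)
log_m1 = (log_marginal_likelihood(shallow, n_words)
          + log_marginal_likelihood(deep, n_words))

log_bf_10 = log_m1 - log_m0                # log Bayes factor for H_1 over H_0
p_h0 = 1.0 / (1.0 + np.exp(log_bf_10))     # p(H_0|data) with 1:1 prior odds
print("log Bayes factor (H1 vs H0):", log_bf_10)
print("p(H_0 | data) with equal prior odds:", p_h0)
```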

The "classical statistics" (sometimes called "frequentist") approach is very
different and deals with the long-run error rates you would get if the
experiment and data collection were repeated. In this framework it is
meaningless to speak about p(H_0|data) or p(H_0 a priori), because H_0
is not considered a random variable. Probabilities can only be assigned
to random variables.


The main difference from the Bayesian approach is thus that a Bayesian
considers the collected data fixed and H_0 random, whereas a frequentist
considers the data random and H_0 fixed.

To a Bayesian, the data are what you got and "the universal truth about
H0" is unknown. Randomness is the uncertainty about this truth.
Probability is a measure of the precision of our knowledge about H0.
Taking -log2(p) yields the Shannon information in bits.

To a frequentist, the data are random (i.e. collecting a new set will
yield a different sample) and "the universal truth about H0" is fixed
but unknown. Randomness is the process that gives you a different data
set each time you draw a sample. It is not the uncertainty about H0.


Choosing a side is more a matter of religion than science.


Both approaches have major flaws:

* The Bayesian approach is not scale invariant. A monotonic transform
like y = f(x) can yield a different conclusion if we analyze y instead
of x. For example, your null hypothesis can be true if you use a linear
scale and false if you use a log scale. Also, the conclusion depends
on your prior opinion, which can be subjective.

* The frequentist approach makes it possible to collect too much data.
If you just collect enough data, any correlation or two-sided test will
be significant. Obviously collecting more data should always give you
better information, not invariably lead to a fixed conclusion. Why do
statistics if you know the conclusion in advance?



Sturla

Re: "small data" statistics

Nathaniel Smith
In reply to this post by Emanuele Olivetti-3

On 12 Oct 2012 09:37, "Emanuele Olivetti" <[hidden email]> wrote:
>
> On 10/11/2012 04:57 PM, [hidden email] wrote:
> > Most statistical tests and statistical inference in scipy.stats and
> > statsmodels relies on large number assumptions.
> >
> > Everyone is talking about "Big data", but is anyone still interested
> > in doing small sample statistics in python.
> >
> > I'd like to know whether it's worth spending any time on general
> > purpose small sample statistics.
> >
> > for example:
> >
> > http://facultyweb.berry.edu/vbissonnette/statshw/doc/perm_2bs.html
> >
> > ```
> > Example homework problem:
> > [...]
> > Shallow Processing: 13 12 11 9 11 13 14 14 14 15
> > Deep Processing: 12 15 14 14 13 12 15 14 16 17
> > ```
>
> I am very interested in inference from small samples, but I have
> some concerns about both the example and the proposed approach
> based on the permutation test.
>
> IMHO the question in the example at that URL, i.e. "Did the instructions
> given to the participants significantly affect their level of recall?" is
> not directly addressed by the permutation test.

In this sentence, the word "significantly" is a term of art used to refer exactly to the quantity p(t>T(data)|H_0). So, yes, the permutation test addresses the original question; you just have to be familiar with the field's particular jargon to understand what they're saying. :-)

> The permutation test is
> related the question "how (un)likely is the collected dataset under the
> assumption that the instructions did not affect the level of recall?".
>
> In other words the initial question is about quantifying how likely is the
> hypothesis "the instructions do not affect the level of recall"
> (let's call it H_0) given the collected dataset, with respect to how likely is the
> hypothesis "the instructions affect the level of recall" (let's call it H_1)
> given the data. In a bit more formal notation the initial question is about
> estimating p(H_0|data) and p(H_1|data), while the permutation test provides
> a different quantity, which is related (see [0]) to p(data|H_0). Clearly
> p(data|H_0) is different from p(H_0|data).
> Literature on this point is for example http://dx.doi.org/10.1016/j.socec.2004.09.033
>
> On a different side, I am also interested in understanding which are the assumptions
> under which the permutation test is expected to work. I am not an expert in that
> field but, as far as I know, the permutation test - and all resampling approaches
> in general - requires that the sample is "representative" of the underlying
> distribution of the problem. In my opinion this requirement is difficult to assess
> in practice and it is even more troubling for the specific case of "small data" - of
> interest for this thread.

All tests require some kind of representativeness, and this isn't really a problem. The data are by definition representative (in the technical sense) of the distribution they were drawn from. (The trouble comes when you want to decide whether that distribution matches anything you care about, but looking at the data won't tell you that.) A well designed test is one that is correct on average across samples.

The alternative to a permutation test here is to make very strong assumptions about the underlying distributions (e.g. with a t-test), and these assumptions are often justified only for large samples. And resampling tests are computationally expensive, but that's no problem for small samples. So that's why nonparametric tests are often better in this setting.

-n

> Any comment on these points is warmly welcome.
>
> Best,
>
> Emanuele
>
> [0] A minor detail: I said "related" because the outcome of the permutation test,
> and of classical tests for hypothesis testing in general, is not precisely p(data|H_0).
> First of all those tests rely on a statistic of the dataset and not on the dataset itself.
> In the example at the URL the statistic (called "criterion" there) is the difference
> between the means of the two groups. Second and more important,
> the test provides an estimate of the probability of observing such a value
> for the statistic... "or a more extreme one". So if we call the statistic over the
> data as T(data), then the classical tests provide p(t>T(data)|H_0), and not
> p(data|H_0). Anyway even p(t>T(data)|H_0) is clearly different from the initial
> question, i.e. p(H_0|data).
>

Re: "small data" statistics

Sturla Molden-2
In reply to this post by Sturla Molden-2
On 12.10.2012 13:12, Sturla Molden wrote:

> * The Bayesian approach is not scale invariable. A monotonic transform
> like y = f(x) can yield a different conclusion if we analyze y instead
> of x.

And this, by the way, is what really pissed off Ronald A. Fisher, the
father of the "p-value". He constructed the p-value as a heuristic for
assessing H0 specifically to avoid this issue. Ronald A. Fisher never
accepted the significance testing (type-1 and type-2 error rates) of
Neyman and Pearson, as experiments are seldom repeated. In fact the
p-value has nothing to do with significance testing.

To correct the other issues of the p-value, Fisher later constructed a
different kind of analysis he called "fiducial inference". It is not
commonly used today.

It is based on looking at hypothesis testing as signal processing:

measurement = signal + noise

The noise is considered random and the signal is the truth about H0.
Fisher argued we can infer the truth about H0 by subtracting the
random noise from the collected data. The method has none of the
absurdities of Bayesian and classical statistics, but for some reason it
never became popular among practitioners.


Sturla

Re: "small data" statistics

Emanuele Olivetti-3
In reply to this post by Sturla Molden-2
Hi Sturla,

Thanks for the brief review of the frequentist and Bayesian differences
(I'll try to send a few comments in a future post).

The aim of my previous message was definitely more pragmatic
and it boiled down to two questions that stick with Josef's call:

1) In this thread people expressed interest in doing hypothesis testing
on small samples, so does the permutation test address the question of
the accompanying motivating example? In my opinion it does not, and I hope I
provided a brief but compelling motivation to support this point of view.

2) What are the assumptions under which the permutation test is
valid/acceptable (independently of the accompanying motivating example)?
I have looked around on this topic but have found only generic desiderata for
all resampling approaches, i.e. that the sample should be "representative"
of the underlying distribution - whatever that means in practical terms.

What's your take on these two questions?
I guess it would be nice to clarify/discuss the motivating questions and the
assumptions in this thread before planning any coding.

Best,

Emanuele


On 10/12/2012 01:12 PM, Sturla Molden wrote:

> [...]
>
> The "classical statistics" (sometimes called "frequentist") is very
> different and deals with long-run error rates you would get if the
> experiment and data collection are repeated. In this framework is is
> meaningless to speak about p(H_0|data) or p(H_0 a priori), because H_0
> is not considered a random variable. Probabilities can only be assigned
> to random variables.
>
>
> [...]
>
> To a Bayesian the data are what you got and "the universal truth about
> H0" in unkown. Randomness is the uncertainty about this truth.
> Probability is a measurement of the precision or knowledge about H0.
> Doing the transform p * log2(p) yields the Shannon information in bits.
>
> [...]
> Choosing side it is more a matter of religion than science.
>
>
>


Re: "small data" statistics

Sturla Molden-2
On 12.10.2012 16:21, Emanuele Olivetti wrote:

> 1) In this thread people expressed interest in making hypothesis testing
> from small samples, so is permutation test addressing the question of
> the accompanying motivating example? In my opinion it is not and I hope I
> provided brief but compelling motivation to support this point of view.

For the problem Josef described, I'd analyze that as a two-sample
goodness-of-fit test against a common bin(20,p) distribution.
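One concrete reading of that (only a sketch; the likelihood-ratio statistic and the parametric-bootstrap calibration are my own choices, not something spelled out here):

```
# Hedged sketch: test whether both groups are consistent with a common
# binomial(20, p), using a likelihood-ratio statistic whose null
# distribution is calibrated by parametric bootstrap under the fitted H_0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
shallow = np.array([13, 12, 11, 9, 11, 13, 14, 14, 14, 15])
deep = np.array([12, 15, 14, 14, 13, 12, 15, 14, 16, 17])
n_words = 20

def loglik(counts, p):
    return stats.binom.logpmf(counts, n_words, p).sum()

def lr_stat(x, y):
    p0 = (x.sum() + y.sum()) / (n_words * (len(x) + len(y)))   # pooled MLE
    px, py = x.mean() / n_words, y.mean() / n_words            # group MLEs
    return 2 * (loglik(x, px) + loglik(y, py)
                - loglik(np.concatenate([x, y]), p0))

observed = lr_stat(shallow, deep)
p0_hat = (shallow.sum() + deep.sum()) / (n_words * (len(shallow) + len(deep)))

# Simulate the LR statistic under the fitted common binomial.
null_stats = np.array([
    lr_stat(rng.binomial(n_words, p0_hat, size=len(shallow)),
            rng.binomial(n_words, p0_hat, size=len(deep)))
    for _ in range(5000)
])
print("LR statistic:", observed)
print("parametric-bootstrap p-value:", (null_stats >= observed).mean())
```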


> 2) What are the assumptions under which the permutation test is
> valid/acceptable (independently from the accompanying motivating example)?
> I have looked around on this topic but I had just found generic desiderata for
> all resampling approaches, i.e. that the sample should be "representative"
> of the underlying distribution - whatever this means in practical terms.

Ronald A. Fisher considered the permutation test to be the "exact
procedure" the t-test should approximate. It has, in fact, all the
assumptions of the t-test.

Surprisingly, many think the t-test assumes normally distributed data. It
does not. If you have this idea too, please forget it.

The t-test only asserts that the large-sample "sampling distribution of
the mean" (i.e. the mean you calculate, not the data points themselves)
is a normal distribution. This is due to the central limit theorem. If
you collect enough data, the distribution of the sample mean will
converge towards a normal distribution. That is a mathematical
necessity, and can be proven to always be the case. But with small data
samples, the sampling distribution of the mean can deviate from a normal
distribution. That is when we need to use the permutation test instead.

I.e.: The t-test is an approximation to the permutation test for "large
enough" data samples.

What we mean by "large enough" is another story. We can e.g. estimate
the sampling distribution of the mean using Efron's bootstrap, and run a
goodness-of-fit test. What most practitioners do, though, is to check whether
their data are approximately normally distributed. That usually signifies
a lack of understanding of the t-test. They think the data must be
normal. They need not be. But if the data are normally distributed, we can
be sure the sample mean is normal as well.
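For instance, a small sketch of that kind of check (the particular goodness-of-fit test, scipy.stats.normaltest, is just one possible choice):

```
# Hedged sketch: approximate the sampling distribution of the mean with
# Efron's bootstrap and run a normality test on the bootstrap means.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = np.array([13, 12, 11, 9, 11, 13, 14, 14, 14, 15])   # shallow-processing scores

n_boot = 2000
boot_means = rng.choice(x, size=(n_boot, len(x)), replace=True).mean(axis=1)

# D'Agostino-Pearson normality test on the bootstrap means. With many
# replicates it is very sensitive, so a QQ-plot is often more informative.
stat, p = stats.normaltest(boot_means)
print("bootstrap mean/std of the sample mean:", boot_means.mean(), boot_means.std())
print("normality test on the bootstrap means: stat=%.2f, p=%.3f" % (stat, p))
```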

So under what circumstances are the assumptions for the permutation test
not satisfied?

One notable example is the Behrens-Fisher problem! That is, you want to
compare the expected values of two distributions with different
variances. The permutation test does not help to solve this problem any
more than the t-test does. This is clearly a situation where
distributions matter, showing that the permutation test is not a
"distribution free" test.


Sturla
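A quick simulation sketch of the Behrens-Fisher point above (equal means, unequal variances, unequal group sizes; all numbers are arbitrary illustrative choices) shows the permutation test of the mean difference drifting away from its nominal 5% level:

```
# Hedged sketch: size of the two-sample permutation test of the mean
# difference when the groups have equal means but unequal variances and
# unequal sizes (a Behrens-Fisher setting).
import numpy as np

rng = np.random.default_rng(3)
n_rep, n_perm = 500, 400
n1, n2 = 5, 15            # unequal group sizes
sd1, sd2 = 3.0, 1.0       # unequal variances, identical means (both 0)

rejections = 0
for _ in range(n_rep):
    x = rng.normal(0.0, sd1, n1)
    y = rng.normal(0.0, sd2, n2)
    pooled = np.concatenate([x, y])
    obs = x.mean() - y.mean()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        count += abs(perm[:n1].mean() - perm[n1:].mean()) >= abs(obs)
    # add-one correction so the random-permutation p-value is never zero
    rejections += (count + 1) / (n_perm + 1) < 0.05

print("rejection rate with equal means:", rejections / n_rep)   # nominal: 0.05
```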

Re: "small data" statistics

Emanuele Olivetti-3
In reply to this post by Nathaniel Smith
On 10/12/2012 01:22 PM, Nathaniel Smith wrote:

> On 12 Oct 2012 09:37, "Emanuele Olivetti" <[hidden email]> wrote:
>
>> IMHO the question in the example at that URL, i.e. "Did the instructions
>> given to the participants significantly affect their level of recall?" is
>> not directly addressed by the permutation test.
>
> In this sentence, the word "significantly" is a term of art used to refer exactly to the quantity p(t>T(data)|H_0). So, yes, the permutation test addresses the original question; you just have to be familiar with the field's particular jargon to understand what they're saying. :-)


Thanks, Nathaniel, for pointing that out. I guess I'll never be very familiar with
that jargon ;-). Nevertheless, while reading the example I believed
that the aim of the thought experiment was to decide between two competing
theories/hypotheses, given the results of the experiment.
But I share your point that the term "significant" turns it into a different question.

> All tests require some kind of representativeness, and this isn't really a problem. The data are by definition representative (in the technical sense) of the distribution they were drawn from. (The trouble comes when you want to decide whether that distribution matches anything you care about, but looking at the data won't tell you that.) A well designed test is one that is correct on average across samples.


Indeed my wording was imprecise, so thanks once more for correcting
it. Moreover, you put it really well: "The trouble comes when you want to
decide whether that distribution matches anything you care about, but
looking at the data won't tell you that".
Could you say more about evaluating the correctness of a test across
different samples? It sounds interesting.

> The alternative to a permutation test here is to make very strong assumptions about the underlying distributions (e.g. with a t-test), and these assumptions are often justified only for large samples. And resampling tests are computationally expensive, but that's no problem for small samples. So that's why nonparametric tests are often better in this setting.



I agree with you that strong assumptions about the underlying distributions,
e.g. parametric modeling, may raise big practical concerns. The only pro
is that at least you know the assumptions explicitly.

Best,

Emanuele



Re: "small data" statistics

josef.pktd
In reply to this post by Emanuele Olivetti-3
On Fri, Oct 12, 2012 at 10:21 AM, Emanuele Olivetti
<[hidden email]> wrote:
> Hi Sturla,
>
> Thanks for the brief review of the frequentist and Bayesian differences
> (I'll try to send a few comments in a future post).
>
> The aim of my previous message was definitely more pragmatic
> and it boiled down to two questions that stick with Josef's call:

My aim is even more practical:

If everyone else has it, and it's useful, then let's do it in Python.

As for mannwhitneyu, this would mean tables for very small samples,
exact permutation for the next larger sizes, and random permutation
for medium sample sizes.

(and advertise empirical likelihood in statsmodels)

And for other cases (somewhere in the future): bias correction
and higher-order expansions of the distribution of the test
statistics or estimates.

http://www.alglib.net/hypothesistesting/mannwhitneyu.php

(Limitation: There are too many things for "let's make it available in python".)
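A sketch of the exact-permutation option for very small samples (how this compares with scipy.stats.mannwhitneyu's own p-value depends on the scipy version: one- vs two-sided, normal approximation vs exact):

```
# Hedged sketch: exact permutation distribution of the Mann-Whitney U
# statistic for two tiny samples, versus scipy's built-in p-value.
from itertools import combinations
import numpy as np
from scipy import stats

x = np.array([13, 12, 11, 9, 11, 13, 14])   # small illustrative samples
y = np.array([12, 15, 14, 14, 13, 12, 15])

def u_stat(a, b):
    # Number of (a_i, b_j) pairs with a_i > b_j, counting ties as 1/2.
    a = np.asarray(a, float)[:, None]
    b = np.asarray(b, float)[None, :]
    return (a > b).sum() + 0.5 * (a == b).sum()

pooled = np.concatenate([x, y])
obs = u_stat(x, y)
m = len(x)
center = len(x) * len(y) / 2.0   # U is symmetric around m*n/2 under H_0

count = total = 0
for idx in combinations(range(len(pooled)), m):   # C(14, 7) = 3432 splits
    mask = np.zeros(len(pooled), dtype=bool)
    mask[list(idx)] = True
    u = u_stat(pooled[mask], pooled[~mask])
    if abs(u - center) >= abs(obs - center) - 1e-12:
        count += 1
    total += 1

print("U =", obs, " exact two-sided permutation p-value =", count / total)
print("scipy.stats.mannwhitneyu:", stats.mannwhitneyu(x, y))
```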

>
> 1) In this thread people expressed interest in making hypothesis testing
> from small samples, so is permutation test addressing the question of
> the accompanying motivating example? In my opinion it is not and I hope I
> provided brief but compelling motivation to support this point of view.

I got two questions "wrong" in the survey, and had to struggle with
several of these:
http://en.wikipedia.org/wiki/P-value#Misunderstandings
(especially because I was implicitly adding "if the null is true" to
some of the statements).
I find the "at least one wrong answer" graph misleading compared to
the breakdown by question.

Under the assumptions of the tests and the permutation distribution, I think
the permutation tests answer the question of whether there are statistically
significant differences (in means, medians, distributions) across samples.
But this is within the classical statistical testing tradition.

http://en.wikipedia.org/wiki/Uniformly_most_powerful_test
consistency of test, ...

>
> 2) What are the assumptions under which the permutation test is
> valid/acceptable (independently from the accompanying motivating example)?
> I have looked around on this topic but I had just found generic desiderata for
> all resampling approaches, i.e. that the sample should be "representative"
> of the underlying distribution - whatever this means in practical terms.

I collected a few papers, but have read them only partially or not at all:

https://github.com/statsmodels/statsmodels/wiki/Permutation-Tests

One problem is that all tests rely on assumptions, and with small
samples there is not enough information to test the underlying
assumptions or to switch to something that requires even
weaker assumptions and still has power.

For example, from my small Monte Carlo with mannwhitneyu:
the difference between permutation p-values and large-sample normal
distribution p-values is not large. I saw one recommendation that
7 observations per sample is enough. One reference says the
extreme tail probabilities are inaccurate.

With only a few observations, the power of the test is very low and
it only detects large differences.

If the distributions of the observations are symmetric and the
sample sizes are the same, then both permutation and normal
p-values are correctly sized (close to 0.05 under the null) even
if the underlying distributions are different (t(2) versus normal).

If the sample sizes are unequal, then differences in the
distributions cause a bias in the test, under- or over-rejecting.

From the references it sounds like, if the distributions are
skewed, the tests are also incorrectly sized.
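
A sketch of that kind of size check (t(2) versus normal, equal group sizes; the alternative= keyword assumes a scipy recent enough to have it):

```
# Hedged sketch: Monte Carlo size of mannwhitneyu when the two groups have
# the same location but different symmetric distributions (t(2) vs normal).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_rep, n = 2000, 10

rejections = 0
for _ in range(n_rep):
    a = rng.standard_t(df=2, size=n)   # heavy-tailed but symmetric around 0
    b = rng.normal(size=n)             # same location, different distribution
    u, p = stats.mannwhitneyu(a, b, alternative="two-sided")
    rejections += p < 0.05

print("rejection rate at the 5% level:", rejections / n_rep)
```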


The main problem I have in terms of interpretation is that we
are in many cases not really estimating a mean or median
shift, but more likely stochastic dominance.
Under one condition the distribution has "higher" values
than under the other condition, where "higher" could mean
a mean shift or just some higher quantiles (more weight on
larger values).


Thanks for the comments.

Josef

>
> What's your take on these two questions?
> I guess it would be nice to clarify/discuss the motivating questions and the
> assumptions in this thread before planning any coding.
>
> Best,
>
> Emanuele
>
>
> On 10/12/2012 01:12 PM, Sturla Molden wrote:
>> [...]
>>
>> The "classical statistics" (sometimes called "frequentist") is very
>> different and deals with long-run error rates you would get if the
>> experiment and data collection are repeated. In this framework is is
>> meaningless to speak about p(H_0|data) or p(H_0 a priori), because H_0
>> is not considered a random variable. Probabilities can only be assigned
>> to random variables.
>>
>>
>> [...]
>>
>> To a Bayesian the data are what you got and "the universal truth about
>> H0" in unkown. Randomness is the uncertainty about this truth.
>> Probability is a measurement of the precision or knowledge about H0.
>> Doing the transform p * log2(p) yields the Shannon information in bits.
>>
>> [...]
>> Choosing side it is more a matter of religion than science.
>>
>>
>>
>

Re: "small data" statistics

Nathaniel Smith
In reply to this post by Emanuele Olivetti-3
On Fri, Oct 12, 2012 at 4:27 PM, Emanuele Olivetti
<[hidden email]> wrote:

> On 10/12/2012 01:22 PM, Nathaniel Smith wrote:
>
> On 12 Oct 2012 09:37, "Emanuele Olivetti" <[hidden email]> wrote:
>
>> IMHO the question in the example at that URL, i.e. "Did the instructions
>> given to the participants significantly affect their level of recall?" is
>> not directly addressed by the permutation test.
>
> In this sentence, the word "significantly" is a term of art used to refer
> exactly to the quantity p(t>T(data)|H_0). So, yes, the permutation test
> addresses the original question; you just have to be familiar with the
> field's particular jargon to understand what they're saying. :-)
>
>
> Thanks Nathaniel for pointing that out. I guess I'll hardly be much familiar
> with
> such a jargon ;-). Nevertheless while reading the example I believed
> that the aim of the thought experiment was to decide among two competing
> theories/hypothesis, given the results of the experiment.

Well, it is, at some level. But in practice psychologists are not
simple Bayesian updaters, and in the context of their field's
practices, the way you make these decisions involves Neyman-Pearson
significance tests as one component. Of course one can debate whether
that is a good thing or not (I actually tend to fall on the side that
says it *is* a good thing), but that's getting pretty far afield of
Josef's question :-).

> But I share your point that the term "significant" turns it into a different
> question.
>
>
> All tests require some kind of representativeness, and this isn't really a
> problem. The data are by definition representative (in the technical sense)
> of the distribution they were drawn from. (The trouble comes when you want
> to decide whether that distribution matches anything you care about, but
> looking at the data won't tell you that.) A well designed test is one that
> is correct on average across samples.
>
>
> Indeed my wording was imprecise so thanks once more for correcting
> it. Moreover you put it really well: "The trouble comes when you want to
>
> decide whether that distribution matches anything you care about, but
> looking at the data won't tell you that".
> Could you tell more about evaluating the correctness of a test across
> different samples? It sounds interesting.

Well, it's a relatively simple point, actually. The definition of a
good frequentist significance test is a function f(data) which returns
a p-value, and this p-value satisfies two rules:
1) When 'data' is sampled from the null hypothesis distribution, then
f(data) is uniformly distributed between 0 and 1.
2) When 'data' is sampled from an alternative distribution of
interest, then f(data) will have a distribution that is peaked near 0.

So the point is just that you can't tell whether a given function
f(data) is well-behaved or not by looking at a single value for
'data', since the requirements for being well-behaved talk only about
the distribution of f(data) given a distribution for 'data'.
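
A small sketch of checking rule 1 by simulation (the test here, scipy's two-sample t-test, and the uniformity check are only illustrative choices):

```
# Hedged sketch: sample many datasets under H_0, apply f(data) (here the
# two-sample t-test), and check that the p-values look uniform on [0, 1].
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_rep, n_per_group = 2000, 10

pvals = np.empty(n_rep)
for i in range(n_rep):
    a = rng.normal(size=n_per_group)   # both groups drawn from the same
    b = rng.normal(size=n_per_group)   # distribution, so H_0 holds
    _, pvals[i] = stats.ttest_ind(a, b)

print("fraction of p < 0.05:", (pvals < 0.05).mean())
print("KS test against the uniform distribution:", stats.kstest(pvals, "uniform"))
```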

-n

Re: "small data" statistics

josef.pktd
In reply to this post by josef.pktd
On Thu, Oct 11, 2012 at 10:57 AM,  <[hidden email]> wrote:

> Most statistical tests and statistical inference in scipy.stats and
> statsmodels relies on large number assumptions.
>
> Everyone is talking about "Big data", but is anyone still interested
> in doing small sample statistics in python.
>
> I'd like to know whether it's worth spending any time on general
> purpose small sample statistics.
>
> for example:
>
> http://facultyweb.berry.edu/vbissonnette/statshw/doc/perm_2bs.html
>
> ```
> Example homework problem:
> Twenty participants were given a list of 20 words to process. The 20
> participants were randomly assigned to one of two treatment
> conditions. Half were instructed to count the number of vowels in each
> word (shallow processing). Half were instructed to judge whether the
> object described by each word would be useful if one were stranded on
> a desert island (deep processing). After a brief distractor task, all
> subjects were given a surprise free recall task. The number of words
> correctly recalled was recorded for each subject. Here are the data:
>
> Shallow Processing: 13 12 11 9 11 13 14 14 14 15
> Deep Processing: 12 15 14 14 13 12 15 14 16 17
> ```

Example: the R package coin:
http://cran.r-project.org/web/packages/coin/vignettes/coin.pdf

I found it again while digging into an error in the p-values of stats.wilcoxon
in the presence of ties (https://github.com/scipy/scipy/pull/338)
and enhancements for it.

Josef


> Josef