[SciPy-User] Questions/comments about scipy.stats.mannwhitneyu

[SciPy-User] Questions/comments about scipy.stats.mannwhitneyu

Chris Rodgers-7
Hi all

I use scipy.stats.mannwhitneyu extensively because my data is not at
all normal. I have run into a few "gotchas" with this function and I
wanted to discuss possible workarounds with the list.

1) When this function returns a significant result, it is non-trivial
to determine the direction of the effect! The Mann-Whitney test is NOT
a test on a difference of medians or means, so you cannot determine the
direction from these statistics. Wikipedia has a good example of why
it is not a test for a difference of medians.
http://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U#Illustration_of_object_of_test

I've reprinted it here. The data are the finishing order of hares and
tortoises. Obviously this is contrived but it indicates the problem.
First the setup:
results_l = ('H H H H H H H H H T T T T T T T T T T '
             'H H H H H H H H H H T T T T T T T T T').split(' ')
h = [i for i in range(len(results_l)) if results_l[i] == 'H']
t = [i for i in range(len(results_l)) if results_l[i] == 'T']

And the results:
In [12]: scipy.stats.mannwhitneyu(h, t)
Out[12]: (100.0, 0.0097565768849708391)

In [13]: np.median(h), np.median(t)
Out[13]: (19.0, 18.0)

Hares are significantly faster than tortoises, but we cannot determine
this from the output of mannwhitneyu. This could be fixed either by
returning u1 and u2 from the guts of the function, or by testing them in
the function and returning the comparison. My current workaround is
testing the means, which is absolutely wrong in theory but usually
correct in practice.
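
For concreteness, here is a minimal sketch of the kind of helper I have
in mind (illustrative only, not part of scipy; it recomputes both U
statistics from the rank sums so the direction is explicit):

import numpy as np
from scipy.stats import rankdata, mannwhitneyu

def mannwhitneyu_with_direction(x, y):
    # Rank the pooled data (midranks for ties), then recover both U statistics.
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    ranks = rankdata(np.concatenate([x, y]))
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0   # pairs with x > y (ties count 1/2)
    u2 = n1 * n2 - u1
    _, p = mannwhitneyu(x, y)                     # scipy's smaller U and one-sided p
    # u1 > n1 * n2 / 2 means x tends to take larger values than y, and vice versa.
    return u1, u2, p

On the hare/tortoise data above this gives u1 = 100 and u2 = 261, i.e.
the hares' finishing positions tend to be smaller (they are faster).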

2) The documentation states that the sample sizes must be at least 20.
I think this is because the normal approximation for U is not valid
for smaller sample sizes. Is there a table of critical values for U in
scipy.stats that is appropriate for small sample sizes or should the
user implement his or her own?
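
In the meantime, for very small samples an exact p-value can be obtained
by enumerating every way of assigning the pooled ranks to the two groups;
a rough sketch (feasible only for small n; the two-sided convention here
is my own):

import numpy as np
from itertools import combinations
from scipy.stats import rankdata

def mannwhitneyu_exact(x, y):
    # Exact permutation p-value for U, conditional on the observed ties.
    pooled = np.concatenate([np.asarray(x, float), np.asarray(y, float)])
    n, n1 = len(pooled), len(x)
    ranks = rankdata(pooled)
    u_obs = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0
    mean_u = n1 * (n - n1) / 2.0
    n_extreme = 0
    n_total = 0
    for idx in combinations(range(n), n1):
        u = ranks[list(idx)].sum() - n1 * (n1 + 1) / 2.0
        n_extreme += abs(u - mean_u) >= abs(u_obs - mean_u)
        n_total += 1
    return n_extreme / float(n_total)   # two-sided p-value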

3) This is picky, but is there a reason that it returns a one-tailed
p-value, while other tests (e.g. ttest_*) default to two-tailed?
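
(For what it's worth, under the symmetric normal approximation the
function uses, a two-sided value can be recovered simply by doubling,
e.g.:

u, p_one = scipy.stats.mannwhitneyu(h, t)
p_two = min(1.0, 2.0 * p_one)   # two-sided p under the normal approximation

but a two-tailed default would still feel more consistent.)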


Thanks for any thoughts, tips, or corrections and please don't take
these comments as criticisms ... if I didn't enjoy using scipy.stats
so much I wouldn't bother bringing this up!

Chris

Re: Questions/comments about scipy.stats.mannwhitneyu

josef.pktd
On Thu, Feb 14, 2013 at 7:06 PM, Chris Rodgers <[hidden email]> wrote:
> Hi all
>
> I use scipy.stats.mannwhitneyu extensively because my data is not at
> all normal. I have run into a few "gotchas" with this function and I
> wanted to discuss possible workarounds with the list.

Can you open a ticket? http://projects.scipy.org/scipy/report

I partially agree, but any changes won't be backwards compatible, and
I don't have time to think about this enough.

>
> 1) When this function returns a significant result, it is non-trivial
> to determine the direction of the effect! The Mann-Whitney test is NOT
> a test on difference of medians or means, so you cannot determine the
> direction from these statistics. Wikipedia has a good example of why
> it is not a test for difference of median.
> http://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U#Illustration_of_object_of_test
>
> I've reprinted it here. The data are the finishing order of hares and
> tortoises. Obviously this is contrived but it indicates the problem.
> First the setup:
> results_l = 'H H H H H H H H H T T T T T T T T T T H H H H H H H H H H
> T T T T T T T T T'.split(' ')
> h = [i for i in range(len(results_l)) if results_l[i] == 'H']
> t = [i for i in range(len(results_l)) if results_l[i] == 'T']
>
> And the results:
> In [12]: scipy.stats.mannwhitneyu(h, t)
> Out[12]: (100.0, 0.0097565768849708391)
>
> In [13]: np.median(h), np.median(t)
> Out[13]: (19.0, 18.0)
>
> Hares are significantly faster than tortoises, but we cannot determine
> this from the output of mannwhitneyu. This could be fixed by either
> returning u1 and u2 from the guts of the function, or testing them in
> the function and returning the comparison. My current workaround is
> testing the means which is absolutely wrong in theory but usually
> correct in practice.

In some cases I'm reluctant to return the direction when we use a
two-sided test. In this case we don't have a one-sided test.
In analogy to t-tests, I think we could return the individual u1, u2.

>
> 2) The documentation states that the sample sizes must be at least 20.
> I think this is because the normal approximation for U is not valid
> for smaller sample sizes. Is there a table of critical values for U in
> scipy.stats that is appropriate for small sample sizes or should the
> user implement his or her own?

Not available in scipy; I never looked at this.
Pull requests for this are welcome if it works. It would be backwards
compatible.

>
> 3) This is picky but is there a reason that it returns a one-tailed
> p-value, while other tests (eg ttest_*) default to two-tailed?

A legacy wart that I don't like, but it wasn't offending me enough to change it.

>
>
> Thanks for any thoughts, tips, or corrections and please don't take
> these comments as criticisms ... if I didn't enjoy using scipy.stats
> so much I wouldn't bother bringing this up!

Thanks for the feedback.
In large part, review of the functions relies on comments by users
(and future contributors).

The main problem is how to make changes without breaking current
usage, since many of those functions are widely used.

Josef



Re: Questions/comments about scipy.stats.mannwhitneyu

josef.pktd
On Fri, Feb 15, 2013 at 11:16 AM,  <[hidden email]> wrote:

> In some cases I'm reluctant to return the direction when we use a
> two-sided test. In this case we don't have a one sided tests.
> In analogy to ttests, I think we could return the individual u1, u2

To expand a bit:
For the Kolmogorov-Smirnov test, we refused to return an indication of
the direction. The alternative is two-sided, and both the test statistic
and its distribution are different in the one-sided test.
So we shouldn't draw any one-sided conclusions from the two-sided test.

In the t-test and mannwhitneyu the test statistic is normally
distributed (in large samples), so we can infer the one-sided test
from the two-sided statistic and p-value.
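
A sketch of that rule for any statistic with a symmetric (normal or t)
null distribution; the helper name is just for illustration:

def one_sided_p(p_two_sided, effect_in_hypothesized_direction):
    # If the estimated effect points in the hypothesized direction,
    # the one-sided p-value is half the two-sided one; otherwise it
    # is the complement of that half.
    if effect_in_hypothesized_direction:
        return p_two_sided / 2.0
    return 1.0 - p_two_sided / 2.0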

If there are tables for the small sample case, we would need to check
if we get consistent interpretation between one- and two-sided tests.

Josef


Re: Questions/comments about scipy.stats.mannwhitneyu

josef.pktd
On Fri, Feb 15, 2013 at 11:35 AM,  <[hidden email]> wrote:

>>> 2) The documentation states that the sample sizes must be at least 20.
>>> I think this is because the normal approximation for U is not valid
>>> for smaller sample sizes. Is there a table of critical values for U in
>>> scipy.stats that is appropriate for small sample sizes or should the
>>> user implement his or her own?
>>
>> not available in scipy. I never looked at this.
>> pull requests for this are welcome if it works. It would be backwards
>> compatible.

Since I just looked at a table collection for some other test: they
also have the Mann-Whitney U statistic,
http://faculty.washington.edu/heagerty/Books/Biostatistics/TABLES/Wilcoxon/
but I didn't check whether it matches the test statistic in scipy.stats.

Josef


Re: Questions/comments about scipy.stats.mannwhitneyu

Chris Rodgers-7
Thanks Josef. Your points make sense to me.

While we're on the subject, maybe I should ask whether this function
is even appropriate for my data. My data are Poisson-like integer
counts, and I want to know if the rate is significantly higher in
dataset1 or dataset2. I'm reluctant to use poissfit because there is a
scientific reason to believe that my data might deviate significantly
from Poisson, although I haven't checked this statistically.

Mann-Whitney U seemed like a safe alternative because it doesn't make
distributional assumptions and it deals with ties, which is especially
important for me because half the counts or more can be zero. Does
that seem like a good choice, as long as I have >20 samples and the
large-sample approximation is appropriate? Comments welcome.

Thanks
Chris


Re: Questions/comments about scipy.stats.mannwhitneyu

josef.pktd
On Fri, Feb 15, 2013 at 1:44 PM, Chris Rodgers <[hidden email]> wrote:

> Thanks Josef. Your points make sense to me.
>
> While we're on the subject, maybe I should ask whether this function
> is even appropriate for my data. My data are Poisson-like integer
> counts, and I want to know if the rate is significantly higher in
> dataset1 or dataset2. I'm reluctant to use poissfit because there is a
> scientific reason to believe that my data might deviate significantly
> from Poisson, although I haven't checked this statistically.
>
> Mann-whitney U seemed like a safe alternative because it doesn't make
> distributional assumptions and it deals with ties, which is especially
> important for me because half the counts or more can be zero. Does
> that seem like a good choice, as long as I have >20 samples and the
> large-sample approximation is appropriate? Comments welcome.

Please bottom or inline post.

I don't have any direct experience with this.

The >20 samples is just a guideline (as usual). If you have many ties,
then I would expect that you need more samples (no reference).

What I would do in cases like this is to run a small Monte Carlo, with
Poisson data, or data that looks somewhat similar to your data, to see
whether the test has the correct size (for example, rejects roughly 5%
of the time at a 5% alpha), and to see whether the test has much power
in small samples.
I would expect that the size is OK, but power might not be large
unless the difference in the rate parameter is large.
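
A rough sketch of that kind of check, with Poisson draws standing in for
the real data (all parameter values and names here are just placeholders):

import numpy as np
from scipy.stats import mannwhitneyu

def mc_rejection_rate(lam1, lam2, nobs1=20, nobs2=20, n_mc=1000,
                      alpha=0.05, seed=12345):
    # With lam1 == lam2 the rejection rate estimates the size of the test;
    # with lam1 != lam2 it estimates the power.
    rng = np.random.RandomState(seed)
    n_reject = 0
    for _ in range(n_mc):
        x = rng.poisson(lam1, size=nobs1)
        y = rng.poisson(lam2, size=nobs2)
        p_one_sided = mannwhitneyu(x, y)[1]
        n_reject += (2 * p_one_sided < alpha)   # crude two-sided decision
    return n_reject / float(n_mc)

# e.g. mc_rejection_rate(3.0, 3.0) should come out near (or slightly
# under) 0.05, and mc_rejection_rate(3.0, 5.0) gives an idea of the power.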

Another possibility is to compare permutation p-values with asymptotic
p-values, to see whether they are close.

There should be alternative tests, but I don't think they are
available in Python: specific tests for comparing count data (I have
no idea), or a general two-sample goodness-of-fit test (like ks_2samp),
but we don't have anything for discrete data.

If you want to go parametric, then you could also use Poisson (or
negative binomial) regression in statsmodels, and directly test the
equality of the distribution parameter. (There is also zero-inflated
Poisson, but with less verification.)
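
A sketch of that parametric route, assuming statsmodels is installed
(function and variable names are only for illustration):

import numpy as np
import statsmodels.api as sm

def poisson_rate_test(counts1, counts2):
    # Poisson regression with an intercept and a group dummy; the Wald
    # test on the dummy coefficient tests equality of the two rates.
    y = np.concatenate([counts1, counts2])
    group = np.r_[np.zeros(len(counts1)), np.ones(len(counts2))]
    exog = sm.add_constant(group)
    res = sm.GLM(y, exog, family=sm.families.Poisson()).fit()
    return res.params[1], res.pvalues[1]   # log rate ratio and its p-value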

Josef



Re: Questions/comments about scipy.stats.mannwhitneyu

josef.pktd
On Sat, Feb 16, 2013 at 7:51 PM,  <[hidden email]> wrote:

> What I would do in cases like this is to run a small Monte Carlo, with
> Poisson data, or data that looks somewhat similar to your data, to see
> whether the test has the correct size (for example reject roughly 5%
> at a 5% alpha), and to see whether the test has much power in small
> samples.
> I would expect that the size is ok, but power might not be large
> unless the difference in the rate parameter is large.

(Since I was just working on a different two-sample test, I had this almost ready.)
https://gist.github.com/josef-pkt/4969715

Even for a sample size of 10 in each sample, the results still look
pretty OK, slightly under-rejecting.
With 20 observations each, the size is pretty good, and
power is good for most of the (largish) lambda differences I looked at.
(I only used 1000 replications.)

Sometimes I'm surprised how fast we get to the asymptotics.

Josef




Re: Questions/comments about scipy.stats.mannwhitneyu

josef.pktd
On Sat, Feb 16, 2013 at 9:17 PM,  <[hidden email]> wrote:

> (Since I was just working on a different 2 sample test, I had this almost ready)
> https://gist.github.com/josef-pkt/4969715
>
> Even for sample size of each sample equal to 10, the results look
> still pretty ok, slightly under rejecting.
> with 20 observations each, size is pretty good
> power is good for most lambda differences I looked at (largish).
> (I only used 1000 replications)

With asymmetric small sample sizes
(n_mc = 50000; nobs1, nobs2 = 15, 5 instead of 20, 20)
we also get a bit of under-rejection, especially at small alpha (0.005 or 0.01).

(And as a fun part:
plotting the histogram of the p-values shows gaps, because with ranks
not all values are possible, if I remember the interpretation
correctly.)

Josef



Re: Questions/comments about scipy.stats.mannwhitneyu

Chris Rodgers-7
On Sat, Feb 16, 2013 at 6:36 PM,  <[hidden email]> wrote:

>> (Since I was just working on a different 2 sample test, I had this almost ready)
>> https://gist.github.com/josef-pkt/4969715
>>
>> Even for sample size of each sample equal to 10, the results look
>> still pretty ok, slightly under rejecting.
>> with 20 observations each, size is pretty good
>> power is good for most lambda differences I looked at (largish).
>> (I only used 1000 replications)
>
> with asymmetric small sample sizes
> n_mc = 50000
> nobs1, nobs2 = 15, 5 #20, 20
> we also get a bit of under rejection, especially at small alpha (0.005 or 0.01)
>
> (and as fun part:
> plotting the histogram of the p-values shows gaps, because with ranks
> not all values are possible; if I remember the interpretation
> correctly.)

Thanks for checking that, and great to hear that Mann-Whitney works
well (or even slightly conservatively) for this use case.

To add some content besides a thanks, here is my wrapper for Python
calls to R's wilcox.test, in case it is useful for anyone out there:
https://github.com/cxrodgers/my/blob/master/stats.py

The actual call is pretty simple, but there is a lot of extra error
checking. I'm an experimentalist, so a lot of my data is
ugly/incomplete compared to simulations, and I check for empty
variables or all-ties cases, which would throw RuntimeErrors. I also
worry about rounding error distorting the ranks of equal floats
(again, ugly data).

The overall running time is much, much slower than SciPy's
mannwhitneyu, but 1) the increased error checking, 2) avoiding the
current bug in scipy.stats.rankdata, and 3) returning the inferred
direction of the effect make it worth it for me personally.
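
For anyone who just wants the bare call without all the error checking,
a minimal sketch of the same idea via rpy2 (assuming rpy2 and an R
installation with wilcox.test are available; this is not the wrapper
linked above, just an illustration):

from rpy2.robjects import FloatVector, r

def wilcox_test(x, y, alternative='two.sided'):
    # Call R's wilcox.test on two samples and pull out W and the p-value.
    res = r['wilcox.test'](FloatVector(x), FloatVector(y),
                           alternative=alternative)
    return res.rx2('statistic')[0], res.rx2('p.value')[0]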

Re: [SciPy-User] Questions/comments about scipy.stats.mannwhitneyu

hari
Hi all, I am trying to do a hypothesis test on census data.

I'm trying to show that years of education has an effect on salary. Salary is defined as <=50K or >50K.

Can I do this to calculate the p-value?


import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from scipy.stats import mannwhitneyu

print('Reading datasets...')
df_trn = pd.read_csv('adult.trn', index_col=False, skipinitialspace=True)

# Keep only the two columns of interest.
all_cols = set(df_trn.columns)
wanted_cols = set(['years-of-edu', 'salary'])
del_cols = list(all_cols - wanted_cols)   # drop() expects list-like labels

# Encode both remaining columns as integers (salary becomes 0/1,
# years-of-edu keeps its ordering).
new_trn1 = {}
for column in df_trn.drop(del_cols, axis=1).columns:
    le = LabelEncoder()
    new_trn1[column] = le.fit_transform(df_trn[column])

print("length of new_trn1:")
print(len(new_trn1['salary']))

list1 = np.array(new_trn1['years-of-edu'])
list2 = np.array(new_trn1['salary'])

# Split years of education by salary group
# (LabelEncoder sorts the labels, so '<=50K' maps to 0 and '>50K' to 1).
list3 = list1[list2 == 0]
list4 = list1[list2 == 1]

print('list3:', np.median(list3))
print('list4:', np.median(list4))

# Mann-Whitney U test between the two groups.
pp = mannwhitneyu(list3, list4)
print(pp)

It returns the p-value as zero.

my data set looks like