Proposal for a new data analysis toolbox


Keith Goodman
This thread started on the numpy list:
http://mail.scipy.org/pipermail/numpy-discussion/2010-November/053958.html

I think we should narrow the focus of the package by only including
functions that operate on numpy arrays. That would cut out date
utilities, label indexing utilities, and binary operations with
various join methods on the labels. It would leave us with three
categories: faster versions of numpy/scipy nan functions, moving
window statistics, and group functions.

I suggest we add a fourth category: normalization.

FASTER NUMPY/SCIPY NAN FUNCTIONS

This work is already underway: http://github.com/kwgoodman/nanny

The function signatures for these are easy: we copy numpy, scipy. (I
am tempted to change nanstd from scipy's bias=False to ddof=0.)

I'd like to use a partial sort for nanmedian. Anyone interested in coding that?

dtype: int32, int64, float64 for now
ndim: 1, 2, 3 (need some recursive magic for nd > 3; that's an open
project for anyone)
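
A partial sort for nanmedian could look roughly like the sketch below. It is only an illustration for the 1d case, not the Nanny implementation: the name nanmedian_1d is hypothetical, and it assumes a NumPy recent enough to provide np.partition, which places the k-th smallest element at index k without fully sorting the array.

```python
import numpy as np

def nanmedian_1d(arr):
    """Median of a 1d array, ignoring NaNs (hypothetical helper)."""
    a = np.asarray(arr, dtype=np.float64)
    a = a[~np.isnan(a)]          # boolean indexing returns a copy without NaNs
    n = a.size
    if n == 0:
        return np.nan            # nothing left to take a median of
    k = n // 2
    if n % 2:
        # Odd count: the middle element is the k-th smallest.
        return float(np.partition(a, k)[k])
    # Even count: average the two middle elements.
    part = np.partition(a, [k - 1, k])
    return float(0.5 * (part[k - 1] + part[k]))

print(nanmedian_1d([3.0, np.nan, 1.0, 2.0]))
```

A partial sort only guarantees the position of the selected elements, which is all a median needs.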

MOVING WINDOW STATISTICS

I already have doc strings and unit tests
(https://github.com/kwgoodman/la/blob/master/la/farray/mov.py). And I
have a cython prototype that moves the window backwards so that the
stats can be filled in place. (This assumes we make a copy of the data
at the top of the function: arr = arr.astype(float))

Proposed function signature: mov_sum(arr, window, axis=-1),
mov_nansum(arr, window, axis=-1)

If you don't like mov, then: move? roll?

I think requesting a minimum number of non-nan elements in a window or
else returning NaN is clever. But I do like the simple signature
above.
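
To make the proposed signature concrete, here is a pure-NumPy sketch of mov_nansum. It is not the cython prototype mentioned above (it uses cumulative sums rather than filling in place), and the min_count argument is a hypothetical name for the "minimum number of non-nan elements" idea. Output positions before the first full window are NaN.

```python
import numpy as np

def mov_nansum(arr, window, axis=-1, min_count=1):
    """Moving-window sum that ignores NaNs (illustrative sketch)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    a = np.array(arr, dtype=np.float64)      # copy, so the input is untouched
    a = np.moveaxis(a, axis, -1)             # work along the last axis
    valid = ~np.isnan(a)
    a[~valid] = 0.0                          # NaNs contribute nothing to the sum
    csum = np.cumsum(a, axis=-1)
    ccnt = np.cumsum(valid, axis=-1)         # running count of non-NaN elements
    # Window total = cumulative total at the window end minus the total
    # just before the window start.
    wsum = csum[..., window - 1:].copy()
    wsum[..., 1:] -= csum[..., :-window]
    wcnt = ccnt[..., window - 1:].copy()
    wcnt[..., 1:] -= ccnt[..., :-window]
    wsum[wcnt < min_count] = np.nan          # too few observations in the window
    out = np.full(a.shape, np.nan)
    out[..., window - 1:] = wsum
    return np.moveaxis(out, -1, axis)

print(mov_nansum([1.0, 2.0, np.nan, 4.0], window=2))
print(mov_nansum([1.0, 2.0, np.nan, 4.0], window=2, min_count=2))
```

With min_count equal to window, the function behaves like a plain mov_sum that returns NaN whenever any element in the window is missing.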

Binary moving window functions: mov_nancorr(arr1, arr2, window, axis=-1), etc.

Optional: moving window bootstrap estimate of error (std) of the
moving statistic. So, what's the std of each estimate in the
mov_median output? Too specialized?

dtype: float64
ndim: 1, 2, 3, recursive for nd > 3

NORMALIZATION

I already have nd versions of ranking, zscore, quantile, demean,
demedian, etc in larry. We should rename to nanzscore etc.
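
As an illustration of what a renamed nanzscore might do, here is a short sketch. It is not the larry code: it assumes a NumPy version that provides np.nanmean and np.nanstd, and the ddof argument is borrowed from the nanstd discussion rather than from any existing signature.

```python
import numpy as np

def nanzscore(arr, axis=None, ddof=0):
    """Z-score that ignores NaNs; NaN inputs stay NaN in the output."""
    a = np.asarray(arr, dtype=np.float64)
    # keepdims=True lets the statistics broadcast back against the input.
    mean = np.nanmean(a, axis=axis, keepdims=True)
    std = np.nanstd(a, axis=axis, ddof=ddof, keepdims=True)
    return (a - mean) / std

print(nanzscore([1.0, np.nan, 3.0]))
```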

ranking and quantile could use some cython love.

I don't know, should we cut this category?

GROUP FUNCTIONS

Input: array, sequence of labels such as a list, axis.

For an array of shape (n,m), axis=0, and a list of n labels with d
distinct values, group_nanmean would return a (d,m) array. I'd also
like a groupfilter_nanmean which would return a (n,m) array and would
have an additional, optional input: exclude_self=False.
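
The shapes described above can be sketched as follows. This is only one possible reading of group_nanmean: returning the sorted distinct labels alongside the means is an assumption, and groupfilter_nanmean with exclude_self is left out.

```python
import numpy as np

def group_nanmean(arr, labels, axis=0):
    """Per-group mean along `axis`, ignoring NaNs (illustrative sketch).

    Returns the group means and the sorted distinct labels.
    """
    a = np.moveaxis(np.asarray(arr, dtype=np.float64), axis, 0)
    uniq, inv = np.unique(np.asarray(labels), return_inverse=True)
    inv = inv.ravel()
    if inv.shape[0] != a.shape[0]:
        raise ValueError("need one label per element along axis")
    valid = ~np.isnan(a)
    sums = np.zeros((uniq.size,) + a.shape[1:])
    cnts = np.zeros_like(sums)
    # np.add.at accumulates correctly even when a group index repeats.
    np.add.at(sums, inv, np.where(valid, a, 0.0))
    np.add.at(cnts, inv, valid.astype(np.float64))
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / cnts              # groups with no valid data become NaN
    return np.moveaxis(means, 0, axis), uniq

data = [[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]]
means, groups = group_nanmean(data, ["a", "b", "a"], axis=0)
print(groups)
print(means)
```

Here n=3, m=2 and d=2, so the result has shape (2, 2), one row per distinct label.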

NAME

What should we call the package?

Numa, numerical analysis with numpy arrays
Dana, data analysis with numpy arrays

import dana as da     (da=data analysis)

ARE YOU CRAZY?

If you read this far, you are crazy and would be a good fit for this project.
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user

Re: Proposal for a new data analysis toolbox

josef.pktd
On Mon, Nov 22, 2010 at 10:35 AM, Keith Goodman <[hidden email]> wrote:

> This thread started on the numpy list:
> http://mail.scipy.org/pipermail/numpy-discussion/2010-November/053958.html
>
> I think we should narrow the focus of the package by only including
> functions that operate on numpy arrays. That would cut out date
> utilities, label indexing utilities, and binary operations with
> various join methods on the labels. It would leave us with three
> categories: faster versions of numpy/scipy nan functions, moving
> window statistics, and group functions.
>
> I suggest we add a fourth category: normalization.
>
> FASTER NUMPY/SCIPY NAN FUNCTIONS
>
> This work is already underway: http://github.com/kwgoodman/nanny
>
> The function signatures for these are easy: we copy numpy, scipy. (I
> am tempted to change nanstd from scipy's bias=False to ddof=0.)

scipy.stats.nanstd is supposed to switch to ddof, so don't copy
inconsistent signatures that are supposed to be deprecated.

I would like statistics (scipy.stats and statsmodels) to stick with
default axis=0.
I would be in favor of axis=None for nan extended versions of numpy
functions and axis=0 for stats functions as defaults, but since it
will be a standalone package with wider usage, I will be able to keep
track of axis=-1.

Josef

Re: Proposal for a new data analysis toolbox

Ralf Gommers-2


On Mon, Nov 22, 2010 at 11:52 PM, <[hidden email]> wrote:
> On Mon, Nov 22, 2010 at 10:35 AM, Keith Goodman <[hidden email]> wrote:
>> This thread started on the numpy list:
>> http://mail.scipy.org/pipermail/numpy-discussion/2010-November/053958.html
>>
>> I think we should narrow the focus of the package by only including
>> functions that operate on numpy arrays. That would cut out date
>> utilities, label indexing utilities, and binary operations with
>> various join methods on the labels. It would leave us with three
>> categories: faster versions of numpy/scipy nan functions, moving
>> window statistics, and group functions.
>>
>> I suggest we add a fourth category: normalization.
>>
>> FASTER NUMPY/SCIPY NAN FUNCTIONS
>>
>> This work is already underway: http://github.com/kwgoodman/nanny
>>
>> The function signatures for these are easy: we copy numpy, scipy. (I
>> am tempted to change nanstd from scipy's bias=False to ddof=0.)
>
> scipy.stats.nanstd is supposed to switch to ddof, so don't copy
> inconsistent signatures that are supposed to be deprecated.

Just yesterday I added a patch that makes this switch for nanstd to http://projects.scipy.org/scipy/ticket/1200. Unfortunately this cannot be done in a backwards-compatible way, so it would be helpful to deprecate the current signature in 0.9.0 if this change is to be made.

Ralf



Re: Proposal for a new data analysis toolbox

Keith Goodman
In reply to this post by josef.pktd
On Mon, Nov 22, 2010 at 7:52 AM,  <[hidden email]> wrote:
> On Mon, Nov 22, 2010 at 10:35 AM, Keith Goodman <[hidden email]> wrote:

>> The function signatures for these are easy: we copy numpy, scipy. (I
>> am tempted to change nanstd from scipy's bias=False to ddof=0.)
>
> scipy.stats.nanstd is supposed to switch to ddof, so don't copy
> inconsistent signatures that are supposed to be deprecated.

Great, I'll use ddof then.

> I would like statistics (scipy.stats and statsmodels) to stick with
> default axis=0.

I put my dates on axis=-1. It is much faster:

>> a = np.random.rand(1000,1000)
>> timeit a.sum(0)
100 loops, best of 3: 9.01 ms per loop
>> timeit a.sum(1)
1000 loops, best of 3: 1.17 ms per loop
>> timeit a.std(0)
10 loops, best of 3: 27.2 ms per loop
>> timeit a.std(1)
100 loops, best of 3: 11.5 ms per loop

But I'd like the default axis to be what a numpy user would expect it to be.
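
A likely explanation for timings like these (an assumption, since the thread does not state it) is memory layout: np.random.rand returns a C-contiguous array, so a reduction along the last axis reads adjacent memory. The sketch below shows how to inspect the layout and repeat the comparison; the numbers depend on machine and NumPy version, and newer releases may narrow the gap.

```python
import timeit
import numpy as np

a = np.random.rand(1000, 1000)

# In C order, stepping along the last axis moves 8 bytes (one float64),
# while stepping along axis 0 jumps a whole row of 1000 * 8 bytes.
print(a.flags.c_contiguous, a.strides)

t_axis0 = timeit.timeit(lambda: a.sum(0), number=20)
t_axis1 = timeit.timeit(lambda: a.sum(1), number=20)
print(f"sum(0): {t_axis0 / 20 * 1e3:.2f} ms   sum(1): {t_axis1 / 20 * 1e3:.2f} ms")
```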

> I would be in favor of axis=None for nan extended versions of numpy
> functions and axis=0 for stats functions as defaults, but since it
> will be a standalone package with wider usage, I will be able to keep
> track of axis=-1.

What default axis would a numpy/scipy user expect for mov_sum? group_mean?

Re: Proposal for a new data analysis toolbox

Nathaniel Smith
In reply to this post by josef.pktd
On Mon, Nov 22, 2010 at 7:52 AM,  <[hidden email]> wrote:
> I would like statistics (scipy.stats and statsmodels) to stick with
> default axis=0.
> I would be in favor of axis=None for nan extended versions of numpy
> functions and axis=0 for stats functions as defaults, but since it
> will be a standalone package with wider usage, I will be able to keep
> track of axis=-1.

Please let's keep everything using the same default -- it doesn't
actually make life simpler if for every function I have to squint and
try to remember whether or not it's a "stats function". (Like, what's
"mean"?)

I think the world already has a sufficient supply of arbitrarily
inconsistent scientific APIs.

-- Nathaniel

Re: Proposal for a new data analysis toolbox

John Hunter-4
In reply to this post by Keith Goodman
On Mon, Nov 22, 2010 at 9:35 AM, Keith Goodman <[hidden email]> wrote:
> This thread started on the numpy list:
> http://mail.scipy.org/pipermail/numpy-discussion/2010-November/053958.html
>
> I think we should narrow the focus of the package by only including
> functions that operate on numpy arrays.

That might be overly restrictive.  What about fast incremental code
that is not array based (i.e., real-time streaming rather than a
post hoc computation on arrays)?  E.g., a cython ringbuffer with support
for nan, percentiles, min, max, mean, std, median, etc.  Eric
Firing wrote a ringbuf class that provides this functionality and is
very useful, and this package seems like a perfect place to host
something like that.

JDH

Re: Proposal for a new data analysis toolbox

Keith Goodman
In reply to this post by Nathaniel Smith
On Mon, Nov 22, 2010 at 8:14 AM, Nathaniel Smith <[hidden email]> wrote:

> On Mon, Nov 22, 2010 at 7:52 AM,  <[hidden email]> wrote:
>> I would like statistics (scipy.stats and statsmodels) to stick with
>> default axis=0.
>> I would be in favor of axis=None for nan extended versions of numpy
>> functions and axis=0 for stats functions as defaults, but since it
>> will be a standalone package with wider usage, I will be able to keep
>> track of axis=-1.
>
> Please let's keep everything using the same default -- it doesn't
> actually make life simpler if for every function I have to squint and
> try to remember whether or not it's a "stats function". (Like, what's
> "mean"?)
>
> I think the world already has a sufficient supply of arbitrarily
> inconsistent scientific APIs.

nanstd, nanmean, etc use axis=None for the default. What would
axis=None mean for a moving window sum?

Re: Proposal for a new data analysis toolbox

Keith Goodman
In reply to this post by John Hunter-4
On Mon, Nov 22, 2010 at 8:16 AM, John Hunter <[hidden email]> wrote:

> On Mon, Nov 22, 2010 at 9:35 AM, Keith Goodman <[hidden email]> wrote:
>> This thread started on the numpy list:
>> http://mail.scipy.org/pipermail/numpy-discussion/2010-November/053958.html
>>
>> I think we should narrow the focus of the package by only including
>> functions that operate on numpy arrays.
>
> That might be overly restrictive.  What about fast incremental code
> that is not array based (ie it is real time streaming rather than a
> post hoc computation on arrays).  Eg, a cython ringbuffer with support
> for nan, percentiles, min, max, mean, std, median, etc....  Eric
> Firing wrote a ringbuf class that provides this functionality that is
> very useful, and this package seems like a perfect place to host
> something like that.

That's a new idea to me. My first reaction is that it belongs in a
separate package for streaming data. Large packages get tough to
maintain and to use. What do others think?

Re: Proposal for a new data analysis toolbox

dhirschfeld
In reply to this post by Keith Goodman
Keith Goodman <kwgoodman <at> gmail.com> writes:

>
> NAME
>
> What should we call the package?
>
> Numa, numerical analysis with numpy arrays
> Dana, data analysis with numpy arrays
>
> import dana as da     (da=data analysis)
>
> ARE YOU CRAZY?
>
> If you read this far, you are crazy and would be a good fit for this project.
>

Sounds like a useful toolbox. As it's focused on calculating various statistics
on arrays in the presence of NaNs I would find nanstats an informative (if
boring) name.

-Dave




Re: Proposal for a new data analysis toolbox

josef.pktd
In reply to this post by Keith Goodman
On Mon, Nov 22, 2010 at 10:35 AM, Keith Goodman <[hidden email]> wrote:
> This thread started on the numpy list:
> http://mail.scipy.org/pipermail/numpy-discussion/2010-November/053958.html
>
> I think we should narrow the focus of the package by only including
> functions that operate on numpy arrays. That would cut out date
> utilities, label indexing utilities, and binary operations with
> various join methods on the labels. It would leave us with three
> categories: faster versions of numpy/scipy nan functions, moving
> window statistics, and group functions.

Returning to the integer question:

It would be nice to have nan handling for integer arrays with a user
defined nan, e.g. -9999.
That would allow faster operations or avoid having to use floats.
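
A sentinel-based mean for integer arrays might look like the sketch below. The function name int_nanmean and the missing argument are hypothetical, not part of any existing package. The sum stays in integer arithmetic and only the final division produces floats.

```python
import numpy as np

def int_nanmean(arr, missing=-9999, axis=None):
    """Mean of an integer array, skipping a user-defined missing marker."""
    a = np.asarray(arr)
    valid = a != missing
    # Replace the marker with 0 so it does not affect the integer sum.
    total = np.where(valid, a, 0).sum(axis=axis, dtype=np.int64)
    count = valid.sum(axis=axis)
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count     # 0 / 0 gives NaN when nothing is valid

print(int_nanmean([1, -9999, 3]))
print(int_nanmean([[1, -9999], [3, 4]], axis=0))
```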

Josef

Re: Proposal for a new data analysis toolbox

Keith Goodman
In reply to this post by dhirschfeld
On Mon, Nov 22, 2010 at 8:52 AM, Dave Hirschfeld
<[hidden email]> wrote:

> Keith Goodman <kwgoodman <at> gmail.com> writes:
>>
>> NAME
>>
>> What should we call the package?
>>
>> Numa, numerical analysis with numpy arrays
>> Dana, data analysis with numpy arrays
>>
>> import dana as da     (da=data analysis)
>>
>> ARE YOU CRAZY?
>>
>> If you read this far, you are crazy and would be a good fit for this project.
>>
>
> Sounds like a useful toolbox. As it's focused on calculating various statistics
> on arrays in the presence of NaNs I would find nanstats an informative (if
> boring) name.

I like the idea of narrowing the focus to NaNs. Then maybe we could
drop the nan prefix from the function names. So std instead of nanstd.
How about Nancy (NAN + CYthon)? But nanstats is more descriptive.

Re: Proposal for a new data analysis toolbox

Nathaniel Smith
In reply to this post by Keith Goodman
On Mon, Nov 22, 2010 at 8:22 AM, Keith Goodman <[hidden email]> wrote:

> On Mon, Nov 22, 2010 at 8:14 AM, Nathaniel Smith <[hidden email]> wrote:
>> On Mon, Nov 22, 2010 at 7:52 AM,  <[hidden email]> wrote:
>>> I would like statistics (scipy.stats and statsmodels) to stick with
>>> default axis=0.
>>> I would be in favor of axis=None for nan extended versions of numpy
>>> functions and axis=0 for stats functions as defaults, but since it
>>> will be a standalone package with wider usage, I will be able to keep
>>> track of axis=-1.
>>
>> Please let's keep everything using the same default -- it doesn't
>> actually make life simpler if for every function I have to squint and
>> try to remember whether or not it's a "stats function". (Like, what's
>> "mean"?)
>>
>> I think the world already has a sufficient supply of arbitrarily
>> inconsistent scientific APIs.
>
> nanstd, nanmean, etc use axis=None for the default.

Great -- I understood Josef as arguing that they shouldn't.

> What would
> axis=None mean for a moving window sum?

Well, the same as mov_sum(arr.ravel()), I suppose. Probably not very
useful for multidimensional arrays, but I'm not sure there's a better
default.

-- Nathaniel

Re: Proposal for a new data analysis toolbox

Darren Dale
In reply to this post by Keith Goodman
On Mon, Nov 22, 2010 at 12:10 PM, Keith Goodman <[hidden email]> wrote:

> On Mon, Nov 22, 2010 at 8:52 AM, Dave Hirschfeld
> <[hidden email]> wrote:
>> Keith Goodman <kwgoodman <at> gmail.com> writes:
>>>
>>> NAME
>>>
>>> What should we call the package?
>>>
>>> Numa, numerical analysis with numpy arrays
>>> Dana, data analysis with numpy arrays
>>>
>>> import dana as da     (da=data analysis)
>>>
>>> ARE YOU CRAZY?
>>>
>>> If you read this far, you are crazy and would be a good fit for this project.
>>>
>>
>> Sounds like a useful toolbox. As it's focused on calculating various statistics
>> on arrays in the presence of NaNs I would find nanstats an informative (if
>> boring) name.
>
> I like the idea of narrowing the focus to NaNs. Then maybe we could
> drop the nan prefix from the function names. So std instead of nanstd.
> How about Nancy (NAN + CYthon)?

The devs could be known as nancy-boys. (sorry, I couldn't help myself.)

Re: Proposal for a new data analysis toolbox

Keith Goodman
In reply to this post by Nathaniel Smith
On Mon, Nov 22, 2010 at 9:28 AM, Nathaniel Smith <[hidden email]> wrote:
> On Mon, Nov 22, 2010 at 8:22 AM, Keith Goodman <[hidden email]> wrote:

>> What would axis=None mean for a moving window sum?
>
> Well, the same as mov_sum(arr.ravel()), I suppose. Probably not very
> useful for multidimensional arrays, but I'm not sure there's a better
> default.

I guess the choices for the default axis for moving statistics are 0,
-1, None. I'd throw out None and then pick either 0 or -1.

For group_mean I think axis=0 makes more sense. Wes and Josef prefer
axis=0, I think. I'm fine with that but would like to hear more
opinions.

Re: Proposal for a new data analysis toolbox

josef.pktd
In reply to this post by Nathaniel Smith
On Mon, Nov 22, 2010 at 12:28 PM, Nathaniel Smith <[hidden email]> wrote:

> On Mon, Nov 22, 2010 at 8:22 AM, Keith Goodman <[hidden email]> wrote:
>> On Mon, Nov 22, 2010 at 8:14 AM, Nathaniel Smith <[hidden email]> wrote:
>>> On Mon, Nov 22, 2010 at 7:52 AM,  <[hidden email]> wrote:
>>>> I would like statistics (scipy.stats and statsmodels) to stick with
>>>> default axis=0.
>>>> I would be in favor of axis=None for nan extended versions of numpy
>>>> functions and axis=0 for stats functions as defaults, but since it
>>>> will be a standalone package with wider usage, I will be able to keep
>>>> track of axis=-1.
>>>
>>> Please let's keep everything using the same default -- it doesn't
>>> actually make life simpler if for every function I have to squint and
>>> try to remember whether or not it's a "stats function". (Like, what's
>>> "mean"?)
>>>
>>> I think the world already has a sufficient supply of arbitrarily
>>> inconsistent scientific APIs.
>>
>> nanstd, nanmean, etc use axis=None for the default.
>
> Great -- I understood Josef as arguing that they shouldn't.

I think nanmean, nanvar, nanstd, nanmax should belong in numpy and
follow numpy convention.

But when I import scipy.stats, I expect axis=0 as default, especially
for statistical tests, and similar, where I usually assume we have
observations in rows and variables in columns as in structured arrays
or record arrays.

np.cov, np.corrcoef usually throw me off, and I am surprised if it
prints a 1000x1000 array instead of 4x4. I have a hard time
remembering rowvar=1. I would prefer axis=0 or axis=1 for correlations
and covariances.
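
The surprise is easy to reproduce. By default np.cov and np.corrcoef treat each row as a variable (rowvar=True), so a data set with 1000 observations of 4 variables stored in rows-as-observations layout needs rowvar=False. The array below is made-up example data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 4))     # 1000 observations, 4 variables

c_default = np.cov(x)                  # rows treated as variables
c_columns = np.cov(x, rowvar=False)    # columns treated as variables
r_columns = np.corrcoef(x, rowvar=False)

print(c_default.shape, c_columns.shape, r_columns.shape)
```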

So it's mainly a question about the default when axis=None doesn't
make much sense.

Josef

Re: Proposal for a new data analysis toolbox

Alan G Isaac-2
In reply to this post by Keith Goodman
On 11/22/2010 12:47 PM, Keith Goodman wrote:
> For group_mean I think axis=0 makes more sense. Wes and Josef prefer
> axis=0, I think. I'm fine with that but would like to hear more
> opinions.


I'd prefer the following.

1. Whenever the operation can sensibly be applied to a 1d array,
make the default: axis=None.

2. If the operation cannot sensibly be applied to a 1d array,
provide no default.  (I.e., force axis specification.)

In other words: remove guessing by the user.
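
In Python 3 the two rules can be expressed directly in the signatures. The functions below are hypothetical illustrations, not proposed API: the first defaults to axis=None, and the second makes axis a required keyword-only argument so that omitting it raises a TypeError. Using demean for the second rule is purely for illustration.

```python
import numpy as np

def nansum_all(arr, axis=None):
    """Rule 1: default to axis=None, i.e. reduce over the whole array."""
    return np.nansum(np.asarray(arr, dtype=np.float64), axis=axis)

def demean(arr, *, axis):
    """Rule 2: no default, so the caller must spell out the axis."""
    a = np.asarray(arr, dtype=np.float64)
    return a - np.nanmean(a, axis=axis, keepdims=True)

print(nansum_all([[1.0, np.nan], [2.0, 3.0]]))
print(demean([[1.0, 3.0], [2.0, 6.0]], axis=1))

try:
    demean([[1.0, 3.0]])
except TypeError as exc:
    print("axis is required:", exc)
```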

Alan Isaac


Re: Proposal for a new data analysis toolbox

Keith Goodman
On Mon, Nov 22, 2010 at 1:32 PM, Alan G Isaac <[hidden email]> wrote:

> On 11/22/2010 12:47 PM, Keith Goodman wrote:
>> For group_mean I think axis=0 makes more sense. Wes and Josef prefer
>> axis=0, I think. I'm fine with that but would like to hear more
>> opinions.
>
>
> I'd prefer the following.
>
> 1. Whenever the operation can sensibly be applied to a 1d array,
> make the default: axis=None.
>
> 2. If the operation cannot sensibly be applied to a 1d array,
> provide no default.  (I.e., force axis specification.)
>
> In other words: remove guessing by the user.

I like it. Cleaner.

Re: Proposal for a new data analysis toolbox

Keith Goodman
In reply to this post by Keith Goodman
On Mon, Nov 22, 2010 at 7:35 AM, Keith Goodman <[hidden email]> wrote:
> This thread started on the numpy list:
> http://mail.scipy.org/pipermail/numpy-discussion/2010-November/053958.html

Based on the feedback I got on the scipy and numpy lists, I expanded
the focus of the Nanny project from A to B, where

A = Faster, drop-in replacement of the NaN functions in Numpy and Scipy
B = Fast, NaN-aware descriptive statistics of NumPy arrays

I also renamed the project from Nanny to dsna (descriptive statistics
of numpy arrays) and dropped the nan prefix from all function names
(the package is simpler if all functions are NaN aware). A description
of the project can be found in the readme file here:

http://github.com/kwgoodman/dsna

Re: Proposal for a new data analysis toolbox

Sebastian Haase-3
On Tue, Nov 23, 2010 at 8:23 PM, Keith Goodman <[hidden email]> wrote:

> On Mon, Nov 22, 2010 at 7:35 AM, Keith Goodman <[hidden email]> wrote:
>> This thread started on the numpy list:
>> http://mail.scipy.org/pipermail/numpy-discussion/2010-November/053958.html
>
> Based on the feedback I got on the scipy and numpy lists, I expanded
> the focus of the Nanny project from A to B, where
>
> A = Faster, drop-in replacement of the NaN functions in Numpy and Scipy
> B = Fast, NaN-aware descriptive statistics of NumPy arrays
>
> I also renamed the project from Nanny to dsna (descriptive statistics
> of numpy arrays) and dropped the nan prefix from all function names
> (the package is simpler if all functions are NaN aware). A description
> of the project can be found in the readme file here:
>
> http://github.com/kwgoodman/dsna

Nanny did have the advantage of being "catchy" - and easy to remember... !
no chance of remembering a 4 ("random") letter sequence....
If you want to change the name, I suggest including the idea of
speed/cython/.. or so -- wasn't that the original idea ....

- Sebastian Haase

Re: Proposal for a new data analysis toolbox

Matthew Brett
On Tue, Nov 23, 2010 at 1:09 PM, Sebastian Haase <[hidden email]> wrote:

> On Tue, Nov 23, 2010 at 8:23 PM, Keith Goodman <[hidden email]> wrote:
>> On Mon, Nov 22, 2010 at 7:35 AM, Keith Goodman <[hidden email]> wrote:
>>> This thread started on the numpy list:
>>> http://mail.scipy.org/pipermail/numpy-discussion/2010-November/053958.html
>>
>> Based on the feedback I got on the scipy and numpy lists, I expanded
>> the focus of the Nanny project from A to B, where
>>
>> A = Faster, drop-in replacement of the NaN functions in Numpy and Scipy
>> B = Fast, NaN-aware descriptive statistics of NumPy arrays
>>
>> I also renamed the project from Nanny to dsna (descriptive statistics
>> of numpy arrays) and dropped the nan prefix from all function names
>> (the package is simpler if all functions are NaN aware). A description
>> of the project can be found in the readme file here:
>>
>> http://github.com/kwgoodman/dsna
>
> Nanny did have the advantage of being "catchy" - and easy to remember... !
> no chance of remembering a 4 ("random") letter sequence....
> If you want to change the name, I suggest including the idea of
> speed/cython/.. or so -- wasn't that the original idea ....

"disnay" maybe?