Re: creating timeseries for non-conventional custom frequencies

Marco Tuckner
Hello Pierre, Matt and others!

What you suggested worked and gave the result I wanted to achieve.
The crucial step was -- as Pierre wrote -- filling in the missing dates:
timeseries.fill_missing_dates(series)

But now I have kind of 'two different' masks:

(1) One mask that I created when importing the data or creating the masked
array. This is used to mask all data values that are physically implausible or
invalid.
(2) Another mask that I just created with fill_missing_dates to get the missing
dates filled.

You'd say that this is fine.
I now want to continue masking invalid data with filters (e.g. discard x below 5
and x above 100), and many more filters in between. In the end I would like to
count all the masked data points to get a feeling for the performance of my
logging device or the measurement process as a whole. If I now count all masked
values, the result would include the data points masked in stage (2). This would
significantly reduce the accuracy of my data recovery ratio:
number of valid data points / number of expected data points.

Any suggestions on how I can get around this?

BTW, Is there a more efficient way to get properties of the masked array like
number of masked and not masked values?

I tried this:
# number of valid (unmasked) values, i.e. False entries in the mask
number_of_valid_values = filled.mask.size - sum(filled.mask)
# number of masked values, i.e. True entries in the mask
number_of_masked_values = filled.mask.size - number_of_valid_values

Greetings,
Marco

_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user

Pierre GM-2
Marco,

> (1) One mask that I created when importing the data or creating the masked
> array. This is used to mask all data values that are physically implausible or
> invalid.
> (2) Another mask that I just created with fill_missing_dates to get the
> missing dates filled.

Just count the number of unmasked data with series.count(), and store it in
a count_ini variable.
Then keep applying your filters, counting the number of unmasked data each
time. You can then compare these new counts to count_ini (the original one).
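
The counting approach described above can be sketched roughly as follows. This is only an illustration with plain numpy.ma, not the scikits.timeseries workflow itself: the sample values, the gap_mask array standing in for the slots created by fill_missing_dates, and the 5/100 thresholds are all assumptions made for the example.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical readings. The two masked slots stand in for the dates
# that fill_missing_dates would have inserted (assumption for this sketch).
values = np.array([12.0, 3.0, 0.0, 57.0, 140.0, 0.0, 88.0, 99.0])
gap_mask = np.array([False, False, True, False, False, True, False, False])
series = ma.array(values, mask=gap_mask)

# Reference count, taken before any quality filter is applied.
count_ini = series.count()

# Example filter: mask values below 5 or above 100.
# masked_outside keeps the entries that were already masked.
filtered = ma.masked_outside(series, 5, 100)
count_filtered = filtered.count()

# Points rejected by the filter alone, excluding the pre-existing gaps.
rejected_by_filter = count_ini - count_filtered
ratio = count_filtered / count_ini
print(count_ini, count_filtered, rejected_by_filter, ratio)
```

Because count_ini is taken after the gaps are masked but before filtering, the difference between the two counts reflects only the filters. Whether count_ini or the full series size is the right denominator depends on what you consider an "expected" data point.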

> BTW, Is there a more efficient way to get properties of the masked array
> like number of masked and not masked values?

If you look at the source code for the count method (in numpy.ma), you'll see
that the result of count is only the difference between the size along the
given axis and the sum of the mask along the same axis:
ma.count(s, axis) = numpy.size(s._data, axis) - numpy.sum(s._mask, axis)

So, the number of "valid" values is given by series.count(axis), the number
of "invalid" values by series._mask.sum(axis), and the total number of data by
numpy.size(series, axis) or simply series.shape[axis].

If you only have 1D data, that's even faster:
number of valid: series.count()
number of invalid: series._mask.sum()
number of data: series.size
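
As a minimal sketch of these counts with plain numpy.ma (the sample arrays are made up for illustration): it uses numpy.ma.getmaskarray rather than the private _mask attribute, which is a substitution on my part and not what the message above uses; getmaskarray always returns a full boolean array, even when nothing is masked.

```python
import numpy as np
import numpy.ma as ma

# 1D case: NaNs are treated as invalid readings (sample data is hypothetical).
series = ma.masked_invalid(np.array([1.0, np.nan, 3.0, np.nan, 5.0]))

n_valid = series.count()                    # unmasked entries
n_invalid = ma.getmaskarray(series).sum()   # masked entries
n_total = series.size                       # all entries

# The identity behind count(): size minus the sum of the mask.
assert n_valid == n_total - n_invalid

# 2D case: the same counts along an axis.
grid = ma.array([[1, 2, 3], [4, 5, 6]],
                mask=[[False, True, False], [True, True, False]])
valid_per_col = grid.count(axis=0)
invalid_per_col = ma.getmaskarray(grid).sum(axis=0)
rows = grid.shape[0]
print(n_valid, n_invalid, n_total, valid_per_col, invalid_per_col, rows)
```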
