[Timeseries] Linux installation error

Christiaan Putter
Hi guys and girls,

I just installed the timeseries module on linux after updating to the
latest numpy 1.3 on python2.5.

I followed the normal "python setup.py install"  routine for
installing both timeseries and numpy.  There weren't any errors during
building and installation.

Though when I import the timeseries module I get an error:

Traceback (most recent call last):
  File "/data/workspace/tests/timeseries.py", line 4, in <module>
    import scikits.timeseries as ts
  File "/usr/local/lib/python2.5/site-packages/scikits.timeseries-0.91.0-py2.5-linux-x86_64.egg/scikits/timeseries/__init__.py", line 13, in <module>
    import const
  File "/usr/local/lib/python2.5/site-packages/scikits.timeseries-0.91.0-py2.5-linux-x86_64.egg/scikits/timeseries/const.py", line 79, in <module>
    from cseries import freq_constants
ImportError: /usr/local/lib/python2.5/site-packages/scikits.timeseries-0.91.0-py2.5-linux-x86_64.egg/scikits/timeseries/cseries.so: undefined symbol: _time64


I'm obviously running a 64bit version of linux.  Might this be the
cause of the above error?


Another question:  Is numpy 1.3 really necessary seeing as it's still
only in testing?  Most of my users use the enthought python
distribution which still has an older numpy version bundled with it.

Hope someone can help me out.

Have a great day,
Christian
_______________________________________________
SciPy-user mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user

Re: [Timeseries] Linux installation error

Pierre GM-2

On Apr 2, 2009, at 5:52 PM, Christiaan Putter wrote:

> Hi guys and girls,
>
> I just installed the timeseries module on linux after updating to the
> latest numpy 1.3 on python2.5.

Ah, an early adopter who fell into our trap...
We detected a problem in the latest sources of scikits.timeseries. I'll
send you a proper dist off-list; the files should be uploaded to
SourceForge within the next 24 hours.


> Another question:  Is numpy 1.3 really necessary seeing as it's still
> only in testing?  Most of my users use the enthought python
> distribution which still has an older numpy version bundled with it.

Oh yes. numpy 1.3 improves support for structured arrays in numpy.ma
and introduces numpy.lib.io.genfromtxt, two new features that
scikits.timeseries makes extensive use of.
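
As a rough illustration of the second feature, the sketch below reads a small delimited text block into a structured, masked array. It assumes a current NumPy, where the function is exposed as numpy.genfromtxt (numpy.lib.io was its location at the time of this thread); the column names and sample values are made up for the example.

```python
import io
import numpy as np

# Hypothetical sample data: one date column and two float columns.
text = io.StringIO(
    "date,flow,temp\n"
    "2009-04-01,1.5,10.2\n"
    "2009-04-02,,11.0\n"  # the flow value is missing on this row
)

# names=True takes the field names from the header row, dtype=None lets
# genfromtxt infer one type per column, and usemask=True returns a masked
# array so the missing entry is masked instead of silently filled.
data = np.genfromtxt(text, delimiter=",", names=True, dtype=None,
                     encoding="utf-8", usemask=True)

print(data.dtype.names)
print(data["flow"])
```

Each column is then available by name, and the missing flow value shows up as a masked entry.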

That's basically why we didn't make any official release announcement  
yet: we're waiting for numpy 1.3 to be released first. We're just  
getting ready, with relative success as you have unfortunately  
experienced.
In any case, please accept all our sincere apologies for any  
inconvenience.
P.


Re: [Timeseries] Linux installation error

Timmie
Administrator
> In any case, please accept all our sincere apologies for any  
> inconvenience.
No need to apologize.

Indeed, the newly organised documentation with its logo is just impressive.
You may still include the recipes I sent...

I am looking forward to using easy_install timeseries after numpy is out.

Thanks for all your efforts!

Timmie.



Re: [Timeseries] Linux installation error

Pierre GM-2

On Apr 2, 2009, at 6:19 PM, Tim Michelsen wrote:

>> In any case, please accept all our sincere apologies for any
>> inconvenience.
> No need for excuse.
>
> Indeed, the newly organised documentation with its logo is just  
> impressive.

Thank Matt, he did a really great job.

>
> You may still include the recipes I sent...

That's on our todo list, worry not. We intend to add an Examples  
section to the doc.

> I am looking forward to using easy_install timeseries after numpy
> is out.

Hopefully that'll work. We still have some ironing to do here and  
there...


> Thanks for all your efforts!

And thanks a lot for your feedback!

Re: [Timeseries] Linux installation error

Chris Barker - NOAA Federal
Pierre GM wrote:
> On Apr 2, 2009, at 6:19 PM, Tim Michelsen wrote:
>> Indeed, the newly organised documentation with its logo is just  
>> impressive.

Where do I find these impressive docs?

-Chris


--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[hidden email]

Re: [Timeseries] Linux installation error

Pierre GM-2

On Apr 2, 2009, at 6:43 PM, Christopher Barker wrote:

> Pierre GM wrote:
>> On Apr 2, 2009, at 6:19 PM, Tim Michelsen wrote:
>>> Indeed, the newly organised documentation with its logo is just
>>> impressive.
>
> Where do I find these impressive docs?

http://pytseries.sourceforge.net/

(or google "scikits timeseries" and feel lucky)

Re: [Timeseries] Linux installation error

Chris Barker - NOAA Federal
Pierre GM wrote:
> On Apr 2, 2009, at 6:43 PM, Christopher Barker wrote:
>
>> Pierre GM wrote:
> http://pytseries.sourceforge.net/
>
> (or google "scikits timeseries" and feel lucky)

For some odd reason, I googled "time_series" and didn't find it.

Yes, the docs look great, and so does the package -- nice work!

-Chris



--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[hidden email]

Re: [Timeseries] Linux installation error

Christiaan Putter
In reply to this post by Pierre GM-2
Hi Pierre,

Thanks for your swift reply, I'll test the code tomorrow.

I'm looking forward to using your timeseries module.  I'm writing a
finance app using Enthought's tool suite and it seems timeseries will
come in quite handy.  Up until now I've been using normal numpy arrays
with pytables for storing actual historical data and postgres (with
SQLAlchemy) for storing some 'higher' level information about stocks
and the results of analysis done on said historical data.  I've found
it's a pretty good combination since hdf5 compression keeps the data
size down to only a few hundred megs and SQLAlchemy is simply awesome
for running queries.

PyTables doesn't play well with threading, though, even though I'm using
locks in any block of code that so much as sniffs at the hdf5 file
(strangely, it's rather stable on linux but crashes horribly on
windows without even the courtesy of a traceback).  I'll test h5py
some time next week and if it performs better (which they claim on
their site :-) I'll see if I can cook up something similar to what you
did for integrating timeseries and pytables.  I'll send it along to
you once it's usable.

In case that doesn't work I'll probably resort to storing the
historical data in sql as well.  Can someone give me some pointers on
how I would go about that perhaps?  It's about 20 000 - 30 000 stocks
with on average about a decade's worth of daily data.  Would I dump all
of that into a single table?  20 000 tables?  hdf5 certainly is much
better suited for something like that...

Hope everyone is having a great day.

Regards,
Christian




Re: [Timeseries] Linux installation error

Christiaan Putter
Ok, just tested the code you sent me and it works just fine  (had to
kill numpy 1.2 though)

Some simple copy / pastable examples would be great to get new users
(that's me) going :-)

Quick question:

Is there any possibility to use python's decimal data type?  I saw you
used it in your sql example on the database side, but I'm guessing
numpy doesn't allow for this?  I sometimes have problems with equality
testing after several divisions / multiplications, so at the moment I
revert to the "is kind of" instead of the "is equal to" approach of
comparison...





Re: [Timeseries] Linux installation error

Pierre GM-2

On Apr 2, 2009, at 8:21 PM, Christiaan Putter wrote:

> Ok, just tested the code you sent me and it works just fine  (had to
> kill numpy 1.2 though)

Great !!

> Some simple copy / pastable examples would be great to get new users
> (that's me) going :-)

FYI, you can find some example applications at
http://hydroclimpy.sourceforge.net/
That's a package of extensions for timeseries focused on
environmental series. That should get you started. Once again, we'll
soon put an Examples section online.
For financial applications, check with Matt Knox, the co-author of the  
package. I'm the hydrologist of the duo.

> Quick question:
>
> Is there any possibility to use python's decimal data type?  I saw you
> used it in your sql example on the database side, but I'm guessing
> numpy doesn't allow for this?

I don't think that numpy can interact w/ Decimal, but I never actually  
tried myself.


> I sometimes have problems with equality
> testing after several divisions / multiplications, so at the moment I
> revert to the "is kind of" instead of the "is equal to" approach of
> comparison...

You can use assert_almost_equal from the numpy.testing module.
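
A minimal sketch of that suggestion, assuming numpy.testing.assert_almost_equal with its `decimal` argument; the numbers are arbitrary and only chosen to show a rounding difference.

```python
from numpy.testing import assert_almost_equal

# Two values that are mathematically equal but differ in floating point.
a = 0.1 + 0.2
b = 0.3
print(a == b)  # exact comparison fails because of rounding

# Passes silently: the values agree to 7 decimal places.
assert_almost_equal(a, b, decimal=7)

# A real difference still raises AssertionError.
try:
    assert_almost_equal(1.0, 1.001, decimal=7)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
print(mismatch_detected)
```

Note that this function raises on mismatch rather than returning a boolean, so it suits test code better than application logic.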

>  I'll test h5py
> some time next week and if it performs better (which they claim on
> their site :-) I'll see if I can cook something up similar to what you
> did for integrating timeseries and pytables.  I'll send it along to
> you once it's usable.

Please do, it's always useful indeed.





Re: [Timeseries] Linux installation error

josef.pktd
In reply to this post by Christiaan Putter
On Thu, Apr 2, 2009 at 8:21 PM, Christiaan Putter
<[hidden email]> wrote:

> Ok, just tested the code you sent me and it works just fine  (had to
> kill numpy 1.2 though)
>
> Some simple copy / pastable examples would be great to get new users
> (that's me) going :-)
>
> Quick question:
>
> Is there any possibility to use python's decimal data type?  I saw you
> used it in your sql example on the database side, but I'm guessing
> numpy doesn't allow for this?  I sometimes have problems with equality
> testing after several divisions / multiplications, so at the moment I
> revert to the "is kind of" instead of the "is equal to" approach of
> comparison...
>

For floating point comparison it is better to use something like

numpy.allclose(a, b, rtol=1e-05, atol=1e-08)
numpy.ma.allclose(a, b, masked_equal=True, rtol=1e-05, atol=1e-08, fill_value=None)

instead of testing for equality. Both return True if two arrays are
element-wise equal within a tolerance:

absolute(a - b) <= (atol + rtol * absolute(b))

Josef
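
A short sketch of the tolerance-based comparison described above. The arrays are invented for illustration; the explicit check at the end just restates the documented formula.

```python
import numpy as np

a = np.array([0.1, 0.2, 0.3]) * 3
b = np.array([0.3, 0.6, 0.9])

exactly_equal = bool((a == b).all())   # rounding error breaks this
close = bool(np.allclose(a, b, rtol=1e-05, atol=1e-08))

# The same test written out by hand.
manual = bool(np.all(np.abs(a - b) <= 1e-08 + 1e-05 * np.abs(b)))

# Masked variant: with masked_equal=True, masked entries count as equal.
am = np.ma.masked_array(a, mask=[False, False, True])
bm = np.ma.array([0.3, 0.6, 999.0])
masked_close = bool(np.ma.allclose(am, bm, masked_equal=True))
masked_strict = bool(np.ma.allclose(am, bm, masked_equal=False))

print(exactly_equal, close, manual, masked_close, masked_strict)
```

Unlike assert_almost_equal, allclose returns a boolean, so it can be used directly in an `if` statement.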

Re: [Timeseries] Linux installation error

Matt Knox-4
In reply to this post by Christiaan Putter

>> In case that doesn't work I'll probably resort to storing the
>> historical data in sql as well.  Can someone give me some pointers on
>> how I would go about that perhaps?  It's about 20 000 - 30 000 stocks
>> with on average about a decade's worth of daily data.  Would I dump all
>> of that into a single table?  20 000 tables?  hdf5 certainly is much
>> better suited for something like that...

In terms of raw throughput when reading and writing simple stock price history, yes,
HDF5 will be better. But if you need any kind of concurrent access or plan
to put some of it up on a website, you will quickly find that the robustness
of a modern relational database system outweighs the performance benefits
of a pure HDF5 solution. You could look at a hybrid solution, but the complexity
often would not be worth it. I think HDF5 and similar types of storage are
great for research projects and ad hoc analysis, but if you are talking about
a large-scale production system it is going to be hard to beat a modern
relational database. And you may be surprised by the performance you get from
a modern relational db running on modern hardware. These systems are designed to
handle LOTS of data.

Anyway, your question really comes down to database design, and I would highly
recommend you do some introductory reading on the basics of table design and
such. One thing with relational dbs is that it is generally very hard to change
the schema (i.e. the table structures) once your application gets to a certain size,
so you really need to plan an overall architecture ahead of time. Things like
HDF5 allow for a somewhat more carefree approach.

Also, having managed a large equity database before, I can say that it is not
something I can adequately describe how to do well in a brief email on a mailing
list.

But I will say this... a typical setup would have a "master security" table
(which may actually take several tables to properly describe the securities)
which would have an integer id for every security and map it to various
identifiers like CUSIPs, ISINs, tickers, etc. You will also need to account
for changing tickers.

To store your actual data (let's just say it is price data for now), you could
have a table with 3 fields:

[security_id] [int],
[date] [datetime],
[price] [decimal](12,5)

The natural primary key here would be ([security_id], [date]).
I won't get into the topic of "surrogate keys", but you may want to google that
too. You would probably have a foreign key for the security_id field referencing
your master security table -- although maybe not depending on performance
considerations.
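
A minimal sketch of that layout, using Python's built-in sqlite3 module in place of a production database. The table names `security` and `price_history`, the ticker, and the prices are assumptions made for illustration; SQLite also does not enforce the decimal(12,5) precision, it only treats the column as numeric.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only on request

conn.executescript("""
CREATE TABLE security (              -- simplified "master security" table
    security_id INTEGER PRIMARY KEY,
    ticker      TEXT NOT NULL
);
CREATE TABLE price_history (
    security_id INTEGER NOT NULL REFERENCES security(security_id),
    date        TEXT NOT NULL,       -- ISO 8601 date string
    price       DECIMAL(12, 5) NOT NULL,
    PRIMARY KEY (security_id, date)  -- the natural key from the text
);
""")

conn.execute("INSERT INTO security VALUES (1, 'ABC')")
conn.executemany(
    "INSERT INTO price_history VALUES (?, ?, ?)",
    [(1, "2009-04-01", 10.25), (1, "2009-04-02", 10.5)],
)

# A second price for the same security and date violates the primary key.
try:
    conn.execute("INSERT INTO price_history VALUES (1, '2009-04-01', 99.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

history = conn.execute(
    "SELECT date, price FROM price_history "
    "WHERE security_id = ? ORDER BY date",
    (1,),
).fetchall()
print(duplicate_rejected, history)
```

All stocks share the one price table; the composite key keeps each (security, date) pair unique and also serves lookups by security.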

Then there are things like indexes to consider to optimize the performance for
your usage patterns.

And you definitely DO NOT want 20,000 tables to store 20,000 stocks.

You'll also need to think about how you handle corporate actions in your db like
stock splits, mergers, etc. If you mess up on the design of how to do this, you
will be in for a world of pain as far as maintenance of the database.

Now as for reading your data into Python, you will find numpy arrays using
compound data types work quite nicely. I often query out a big chunk of data,
store it in a numpy array with a compound data type for caching purposes, then
filter that array to get the specific chunks of data I need (e.g. read 100 stocks'
worth of data at a time into a single array, then filter that array afterwards
rather than hitting the db once for each stock). This is something I'll
probably add to the database examples in the documentation at some point.
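
The caching pattern might look roughly like the sketch below. The field names and rows are invented, and the datetime64 field assumes a NumPy version much newer than the one discussed in this thread.

```python
import numpy as np

# Compound (structured) dtype mirroring the three-field price table.
price_dtype = np.dtype([
    ("security_id", "i4"),
    ("date", "M8[D]"),
    ("price", "f8"),
])

# Stand-in for the rows returned by one large database query.
rows = [
    (1, "2009-04-01", 10.25),
    (2, "2009-04-01", 55.0),
    (1, "2009-04-02", 10.5),
]
cache = np.array(rows, dtype=price_dtype)

# Filter the cached array in memory instead of querying once per stock.
one_stock = cache[cache["security_id"] == 1]
print(one_stock["date"], one_stock["price"])
```

The boolean mask selects whole records, so the dates and prices of the chosen stock stay aligned.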

>> Is there any possibility to use python's decimal data type?  I saw you
>> used it in your sql example on the database side, but I'm guessing
>> numpy doesn't allow for this?  I sometimes have problems with equality
>> testing after several divisions / multiplications, so at the moment I
>> revert to the "is kind of" instead of the "is equal to" approach of
>> comparison...

Numpy does not support a decimal type. I agree it would be a really nice
addition, but I can generally live without it and probably won't have the
motivation to contribute that to the numpy community any time soon. You can
use an object array to store decimal values, but that is not recommended. You
should always just cast the decimal values to floats prior to pulling them into
Python unless you are talking about some accounting system or something where
that level of accuracy matters.
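
To make the two options concrete, here is a small sketch contrasting an object array of Decimal values with a plain float array. The values are made up; in practice the cast to float would usually happen in the SQL query or while fetching rows.

```python
from decimal import Decimal
import numpy as np

# Stand-in for decimal(12,5) values coming back from the database.
db_values = [Decimal("10.25000"), Decimal("10.50000"), Decimal("11.12345")]

# Option 1: object array. Exact, but each operation runs in Python.
as_objects = np.array(db_values, dtype=object)
exact_total = as_objects.sum()

# Option 2: cast to float up front. Fast, with the usual rounding caveats.
as_floats = np.array([float(v) for v in db_values], dtype=np.float64)
float_total = as_floats.sum()

print(as_objects.dtype, exact_total)
print(as_floats.dtype, float_total)
```

The object array keeps exact decimal arithmetic but gives up vectorized speed and most of NumPy's numeric functions.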

And I would agree with Josef's comments regarding the approach to
equality checks.




Re: [Timeseries] Linux installation error

Francesc Alted-2
In reply to this post by Christiaan Putter
On Friday 03 April 2009, Christiaan Putter wrote:
>
> Pytables doesn't play well with threading though even though I'm
> using locks in any block of code that so much as sniffs at the hdf5
> file (strangely though it's rather stable on linux but crashes
> horribly on windows without even the courtesy of a trace back).

Could you please be more explicit about the sort of problems that you are
experiencing?  I'd be glad to discuss them (on the PyTables list,
preferably) and see if they can be addressed in some way or another.

Cheers,

--
Francesc Alted

"One would expect people to feel threatened by the 'giant
brains or machines that think'.  In fact, the frightening
computer becomes less frightening if it is used only to
simulate a familiar noncomputer."

-- Edsger W. Dijkstra
   "On the cruelty of really teaching computing science"