OT: Literature management, pdf downloader

John [H2O]
Sorry for an OT post, but I thought this might be a community that
would have interest in the attached script.

For those of you actively conducting research, I imagine you have a
variety of 'tools' for managing PDFs. For anyone using a Mac, I guess
it's 'Papers', which seems to be quite brilliant software. On Linux,
I've gone with Mendeley, which I am very pleased with. For my actual
searching, I rely on the webofscience or ISI searches.

Here is my process:

1) ISI search for articles, add to 'marked list'
2) export marked list to bibtex
3) download pdf files to which I have access
4) dump them into a 'staging' folder for Mendeley
5) let Mendeley import them into my library (making copies)

This has worked very well, but recently I became frustrated with the
amount of time I spent downloading articles. I decided to write a
script to do it for me. Attached you'll find a script which uses the
DOI numbers (if present) and essentially accomplishes steps 3 & 4
above. I would like to add this eventually as functionality to either
Mendeley or kbibtex or pybibliographer. The functionality I see is
that you could select some references in any of the aforementioned
software, and then click a 'download PDFs' button.

Does this exist at all?!? If so, please let me know.

Okay, so assuming it does not: the attached script parses a bibtex file
to extract the DOI numbers. If an entry has no DOI, the article is
skipped, SOL. If the DOI number is available, the script accesses the
dx.doi.org website to figure out where to get the article. Then, after
some 'screen scraping', the link to the pdf is used to download the PDF
to a 'LIBRARY' directory. Of course, the major assumption here is that
you have access to the articles through your network.
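
As an illustration of the steps just described, here is a minimal sketch in Python 3 using only the standard library. It is not the attached get_publications.py: the function names, the regular expressions, and the "any link containing .pdf" heuristic are assumptions made for this example, and real publisher pages usually need more specific scraping.

```python
import os
import re
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

# Resolver named in the post; the https scheme is an assumption of this sketch.
DOI_RESOLVER = "https://dx.doi.org/"

# "@article{key," style entry headers and "doi = {...}" / doi = "..." fields.
ENTRY_START = re.compile(r'@\w+\s*\{\s*([^,\s]+)\s*,')
DOI_FIELD = re.compile(r'\bdoi\s*=\s*[{"]\s*([^}"]+?)\s*[}"]', re.IGNORECASE)


def dois_by_key(bibtex_text):
    """Map each BibTeX entry key to its DOI, or to None when it has none."""
    starts = list(ENTRY_START.finditer(bibtex_text))
    result = {}
    for i, match in enumerate(starts):
        # An entry runs until the next entry header (or the end of the file).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bibtex_text)
        field = DOI_FIELD.search(bibtex_text, match.end(), end)
        result[match.group(1)] = field.group(1) if field else None
    return result


class PdfLinkFinder(HTMLParser):
    """Collect href values of <a> tags that look like PDF links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and ".pdf" in href.lower():
            self.links.append(href)


def find_pdf_links(html, base_url):
    """Return absolute URLs of candidate PDF links found in a landing page."""
    finder = PdfLinkFinder()
    finder.feed(html)
    return [urljoin(base_url, href) for href in finder.links]


def download_pdf(doi, library_dir="LIBRARY", timeout=30):
    """Resolve one DOI and save the first PDF link found; return the path or None."""
    with urllib.request.urlopen(DOI_RESOLVER + doi, timeout=timeout) as landing:
        landing_url = landing.geturl()  # final URL after the resolver's redirects
        html = landing.read().decode("utf-8", errors="replace")
    links = find_pdf_links(html, landing_url)
    if not links:
        return None
    os.makedirs(library_dir, exist_ok=True)
    target = os.path.join(library_dir, doi.replace("/", "_") + ".pdf")
    with urllib.request.urlopen(links[0], timeout=timeout) as pdf, \
            open(target, "wb") as out:
        out.write(pdf.read())
    return target
```

Entries that map to None are the ones that would be skipped. Calling download_pdf for the remaining DOIs only works from a network that has access to the articles.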

There are some outstanding issues, and in general this is an email
reaching out to more experienced programmers for comments on the
following:

1) I need better error handling (at the very least, a timeout).
2) I would like to be able to include authentication handling (maybe
in a config file, where you could provide access credentials for various
journals).
3) Getting rid of the BeautifulSoup and pybtex dependencies (or learning
how to package the script so that when someone uses easy_install, those
dependencies are installed as well).
4) I need to be able to handle cookies (so far this is a problem only
for the get_acs method).
5) Are my per-journal methods the best way to do this?
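
Items 1, 2 and 4 can be approached with the Python 3 standard library alone. The sketch below is one possible shape, not the attached script: the helper names, the INI layout (one section per journal host with username and password keys), and the choice to return None on failure are all assumptions of this example.

```python
import configparser
import http.cookiejar
import socket
import urllib.error
import urllib.request


def build_opener_with_cookies():
    """Return (opener, jar); the opener stores cookies and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(opener, url, timeout=30):
    """Return the response body as bytes, or None on a timeout or URL error."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout) as exc:
        # HTTPError is a subclass of URLError, so 403/404 pages land here too.
        print("skipping %s: %s" % (url, exc))
        return None


def load_credentials(path):
    """Read {host: (username, password)} from an INI file.

    Assumed (hypothetical) layout, one section per journal host:

        [pubs.example.org]
        username = alice
        password = secret
    """
    parser = configparser.ConfigParser()
    parser.read(path)
    return {
        section: (parser[section]["username"], parser[section]["password"])
        for section in parser.sections()
    }
```

Reusing one opener for every request to a publisher lets a cookie set by the landing page be sent along with the later PDF request. How the stored credentials are actually submitted depends on each publisher's login mechanism and is left out here.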

If folks object to my posting this here, please suggest a place you
might think would be more appropriate.

If I get positive feedback, I'll post this to a public site where
version control can be done so folks can do their own legwork to add
'screen scraper' methods for other journals.
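
One way contributors could add scraper methods for other journals without touching the core code is a small registry keyed by the host of the landing page. This is only a sketch of that idea: the decorator, the host names under example.org, and the URL rules inside the two scrapers are hypothetical and do not describe any real publisher or the attached script.

```python
import re
from urllib.parse import urljoin, urlparse

# Maps a landing-page host name to the function that finds the PDF URL.
SCRAPERS = {}


def scraper(hostname):
    """Decorator that registers a per-journal scraper for one host."""
    def register(func):
        SCRAPERS[hostname] = func
        return func
    return register


@scraper("journal-a.example.org")
def get_journal_a(landing_url, html):
    # Hypothetical rule: the PDF lives at "<landing URL>/pdf".
    return landing_url.rstrip("/") + "/pdf"


@scraper("journal-b.example.org")
def get_journal_b(landing_url, html):
    # Hypothetical rule: take the first href ending in ".pdf" from the page.
    match = re.search(r'href="([^"]+\.pdf)"', html)
    return urljoin(landing_url, match.group(1)) if match else None


def find_pdf_url(landing_url, html):
    """Dispatch to the scraper registered for this host, or return None."""
    handler = SCRAPERS.get(urlparse(landing_url).hostname)
    if handler is None:
        return None
    return handler(landing_url, html)
```

Supporting a new journal then means adding one decorated function, and hosts with no registered scraper are simply skipped.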

All the best,
john

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user

Attachment: get_publications.py (9K)

Re: OT: Literature management, pdf downloader

Vincent Favre-Nicolin-2
> This has worked very well, but recently I became frustrated with the
> amount of time I spent downloading articles. I decided to write a
> script to do it for me. Attached you'll find a script which uses the
> DOI numbers (if present) and essentially accomplishes steps 3 & 4
> above. I would like to add this eventually as functionality to either
> Mendeley or kbibtex or pybibliographer. The functionality I see is
> that you could select some references in any of the aforementioned
> software, and then click a 'download PDFs' button.
>
> Does this exist at all?!? If so, please let me know.

This is indeed really OT, but have you tried Zotero? It's a really great
tool for managing bibliographies, especially (but not only) for research
articles.
        http://www.zotero.org/

--
Vincent Favre-Nicolin

CEA / INAC                              http://inac.cea.fr
Université Joseph Fourier        http://www.ujf-grenoble.fr

http://vincefn.net
ObjCryst & Fox : http://objcryst.sourceforge.net

Re: OT: Literature management, pdf downloader

Gael Varoquaux
In reply to this post by John [H2O]
On Mon, Feb 28, 2011 at 01:25:45PM +0100, John wrote:
> searching, I rely on the webofscience or ISI searches.

> 1) ISI search for articles, add to 'marked list'

Continuing with off topic, but in my experience google scholar works much
better than ISI, it is accessible freely, can link to preprints, and if
you adjust your preferences, will give you the bibtex files.

G

Re: OT: Literature management, pdf downloader

John [H2O]
I do use Scholar a lot, and I have played with Zotero (which actually
integrates quite well with Mendeley), but does either of these actually
have a way to 'batch download' the PDFs? That was more the point of
the script.

I wonder if screen scraping Scholar, or working with Scholar somehow to
find the PDFs, might be a better approach than relying on DOIs?

Thanks,
john



On Mon, Feb 28, 2011 at 1:51 PM, Gael Varoquaux
<[hidden email]> wrote:

> On Mon, Feb 28, 2011 at 01:25:45PM +0100, John wrote:
>> searching, I rely on the webofscience or ISI searches.
>
>> 1) ISI search for articles, add to 'marked list'
>
> Continuing with off topic, but in my experience google scholar works much
> better than ISI, it is accessible freely, can link to preprints, and if
> you adjust your preferences, will give you the bibtex files.
>
> G

--
Configuration
``````````````````````````
Plone 2.5.3-final,
CMF-1.6.4,
Zope (Zope 2.9.7-final, python 2.4.4, linux2),
Python 2.6
PIL 1.1.6
Mailman 2.1.9
Postfix 2.4.5
Procmail v3.22 2001/09/10
Basemap: 1.0
Matplotlib: 1.0.0

Re: OT: Literature management, pdf downloader

Paul Anton Letnes
In reply to this post by Gael Varoquaux

On 28. feb. 2011, at 13.51, Gael Varoquaux wrote:

> On Mon, Feb 28, 2011 at 01:25:45PM +0100, John wrote:
>> searching, I rely on the webofscience or ISI searches.
>
>> 1) ISI search for articles, add to 'marked list'
>
> Continuing with off topic, but in my experience google scholar works much
> better than ISI, it is accessible freely, can link to preprints, and if
> you adjust your preferences, will give you the bibtex files.
>
Do you know if googlecl works with scholar? If not, maybe it could be easily extended?

Also OT: BibDesk is great software for managing papers on the Mac. Since it works directly with plain-text .bib files, it should be easier to talk to than these other programs, if I understand things correctly. You can also store things like the DOI, which makes downloading the PDF a luxury rather than a necessity until you start making notes or spend longer periods away from a network connection. Just click on the DOI, and you are taken to the article's web page.

Good luck though, your project sounds interesting!

Paul


Re: OT: Literature management, pdf downloader

Gökhan SEVER-2
In reply to this post by Gael Varoquaux


On Mon, Feb 28, 2011 at 5:51 AM, Gael Varoquaux <[hidden email]> wrote:
> On Mon, Feb 28, 2011 at 01:25:45PM +0100, John wrote:
>> searching, I rely on the webofscience or ISI searches.
>
>> 1) ISI search for articles, add to 'marked list'
>
> Continuing with off topic, but in my experience google scholar works much
> better than ISI, it is accessible freely, can link to preprints, and if
> you adjust your preferences, will give you the bibtex files.
>
> G

In my experience scopus.com gives more refined results than both gscholar and ISI, provided you have access to the site. You can do author-based queries (e.g. list an author's published papers), which I find very useful for looking up someone's publication history. For a few key-term searches specific to the atmospheric sciences, scopus gives broader publication coverage than gscholar and ISI. They provide an API for programmatic queries, but I am not sure about their procedure for accessing publications. I have registered for their search update service (where I get updates for my saved searches when their database is updated; gscholar and ISI have similar services, but again scopus provides better results and allows more refinement in searches) and manually download the articles that I am interested in.

--
Gökhan
