[SciPy-User] peer review of scientific software

[SciPy-User] peer review of scientific software

Alan G Isaac-2

http://www.sciencemag.org/content/340/6134/814.summary

Alan Isaac

Re: peer review of scientific software

Calvin Morrison



On 27 May 2013 13:44, Alan G Isaac <[hidden email]> wrote:

> http://www.sciencemag.org/content/340/6134/814.summary

Maybe I can use this as a ranting point.

As background: I am a programmer, but I have been hired by various professors and other academic people to work on their projects. My biggest problem with the "scientific computing" community is having to put up with this crap!

I am so sick of reading a paper that would exactly fulfill my needs, only to find out that the software has disappeared, only works on HP-UX on one computer in the world, or has absolutely zero documentation.

I've started pushing my lab to follow standards, use version control, document our changes, and publish our code. So far it is going really well, but I can only do so much.

If only all students had a few basic "proper programming practices" courses, everything would go a lot more smoothly. Teaching programming is fine, but why don't we teach our scientists the best way to collaborate in the 21st century?

Below is another related paper that is a good starting point for converting users. Enough emailing tarballs back and forth! Enough undocumented code! Enough is Enough!


Pissed-off Scientific Programmer,
Calvin Morrison




Re: peer review of scientific software

Martin81

Hi,

Nice article. The frustration for students without a formal programming background, such as a bachelor's in computer science, is as big as that for students and professors who do have such a background, I think. The solution is of course proper education. Code, like math, is a language of its own. Math we often learn from teachers (not necessarily in the math department) who went through it themselves years ago. For coding this is different: programming techniques change so quickly, and new languages keep popping up, making the transfer of up-to-date knowledge in academic curricula more challenging.

Bad programming practices in academia (and elsewhere) certainly aren't a frustration for 'good' and 'up-to-date' programmers alone; they are also a frustration for those who want to learn by example. Much of this learning happens through online tutorials, blogs, cookbooks, etc., but there are so many of them. Python especially is used by many, and it is also a recommended beginner's language, so one can expect a lot of bad programming practices.

Perhaps some sort of ranking of open source code would help developers realize what is good practice and what is bad practice. Or even a prestigious showcase of excellent coding projects, at a variety of levels of complexity or project size. Could SciPy perhaps take the lead by making knowledge of good programming practice more accessible through some sort of open ranking/voting platform?

Martin



2013/5/28 Calvin Morrison <[hidden email]>



<snip>

Re: peer review of scientific software

Matthew Brett
Hi,

On Tue, May 28, 2013 at 12:41 PM, Martin van Leeuwen
<[hidden email]> wrote:
> Hi,
>
> Nice article. The frustration for students without a formal programming
> background such as a bachelor in computer science is as big as that for
> students and Profs that do have such a background, I think.

I found the article frustrating - it didn't seem to have much to add
to a general set of feelings (that most of us share) that writing code
properly is a good idea.

The question that always comes up is - why?   Most scientists trained
in the ad-hoc get-it-to-work model have a rather deep belief that this
model is more or less OK, and that doing all that version-control,
testing stuff is for programming types with lots of time on their
hands.  If we want to persuade those guys and gals, we have to come up
with something pretty compelling, and I don't think we have that yet.
I would love to see some really good data to show that we'd proceed
faster as scientists with more organized coding practices.

Cheers,

Matthew

Re: peer review of scientific software

Calvin Morrison



On 28 May 2013 16:00, Matthew Brett <[hidden email]> wrote:
> <snip>
>
> The question that always comes up is - why?   Most scientists trained
> in the ad-hoc get-it-to-work model have a rather deep belief that this
> model is more or less OK, and that doing all that version-control,
> testing stuff is for programming types with lots of time on their
> hands.

Yes, and wearing lab coats, writing down procedures, and documenting methods are all just for people with so much time...

Version control, unit testing, and proper practices in general actually are time savers. I use version control on my own projects because it helps me keep my code organized. Coding well makes my code easier to read and easier to collaborate on. The issue is not with the tools; it is with people refusing to learn how to use them.

The ability to reproduce results is a very important aspect of science, is it not? How can I know whether your claims are true if you have hidden software that has never seen the light of day? How can you benefit the community if nobody can use your software?

> If we want to persuade those guys and gals, we have to come up
> with something pretty compelling, and I don't think we have that yet.
> I would love to see some really good data to show that we'd proceed
> faster as scientists with more organized coding practices.

Proceed faster as scientists individually? Maybe not. But in aggregate, the community most certainly benefits: from not having to reimplement tools, from having tools developed for other people to use, from easier collaboration, and for reasons I can't even think of!

What is the point of publishing works if you aren't publishing the tools? Who are you helping? The community, the populace as a whole, or your silly CV?

Calvin


Re: peer review of scientific software

Nathaniel Smith
On Tue, May 28, 2013 at 9:00 PM, Matthew Brett <[hidden email]> wrote:
> The question that always comes up is - why?   Most scientists trained
> in the ad-hoc get-it-to-work model have a rather deep belief that this
> model is more or less OK, and that doing all that version-control,
> testing stuff is for programming types with lots of time on their
> hands.  If we want to persuade those guys and gals, we have to come up
> with something pretty compelling, and I don't think we have that yet.
> I would love to see some really good data to show that we'd proceed
> faster as scientists with more organized coding practices.

I always make newbies read this:
  http://boscoh.com/protein/a-sign-a-flipped-structure-and-a-scientific-flameout-of-epic-proportions.html
(Short version: basically someone's career being destroyed by a sign
error in code written by some random person in the next lab.)

Then I show them the magic of 'if __name__ == "__main__": import nose;
nose.runmodule()'.
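
(For the curious, here's a minimal sketch of that pattern in a
throwaway analysis module; the function and the test below are
hypothetical examples, not from any real project:)

    # test_standardize.py
    import numpy as np

    def standardize(x):
        """Return x rescaled to zero mean and unit variance."""
        x = np.asarray(x, dtype=float)
        return (x - x.mean()) / x.std()

    def test_standardize():
        z = standardize([1.0, 2.0, 3.0, 4.0])
        assert abs(z.mean()) < 1e-12
        assert abs(z.std() - 1.0) < 1e-12

    if __name__ == "__main__":
        # run every test_* function in this module with nose
        import nose
        nose.runmodule()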

Maybe it sinks in sometimes...

I agree that while it's all very noble to talk about the benefit to
science as a whole and the long term benefits of eating your
vegetables and blah blah blah, the actually compelling reason to use
VCS and test everything is that it always turns out to pay back the
investment in, like, tens of minutes.

Telling people about the benefits of regular exercise never works either.

-n

Re: peer review of scientific software

Matthew Brett
Hi,

On Tue, May 28, 2013 at 1:11 PM, Calvin Morrison <[hidden email]> wrote:

> <snip>
>
> What is the point of publishing works if you aren't publishing the tools?
> Who are you helping? The community, the populace as a whole, or your silly
> CV?

I have personally been doing these good things "since my youth", and
trying to teach other people to do the same.

I don't often have much success though, hence my email.

The response usually goes something like "I can't afford to waste
time, I've got a deadline / I've got to get tenure" etc.

If our response is a general 'oh but it's much better to do it that
way' - I can assure you, most of the time, that doesn't cut it.

As Nathaniel says - for those who are interested - just showing them
how to do this stuff can often be enough.   "Oh yes, I get it,
awesome".

For those who see this as a threat that will waste their time on
tedious detail - that isn't going to work.

We really have to persuade these people that - for a short investment
of time - they will reap major benefits - for themselves and / or for
their fellow scientists.   I don't know of much data to help with this
latter thing.  I can imagine data, but I don't know where it is, or if
it exists...

Cheers,

Matthew

Re: peer review of scientific software

Martin81
I tend to think that we get less and less time on our hands as science becomes more competitive. That being said, I don't think we should respond by teaching students to forget about version control and unit testing. Even if it isn't to support your own debugging efforts, it provides a handle for anyone attempting to use someone else's code.


2013/5/28 Nathaniel Smith <[hidden email]>
<snip>

Re: peer review of scientific software

Calvin Morrison



On 28 May 2013 16:23, Matthew Brett <[hidden email]> wrote:
> <snip>
>
> We really have to persuade these people that - for a short investment
> of time - they will reap major benefits - for themselves and / or for
> their fellow scientists.   I don't know of much data to help with this
> latter thing.  I can imagine data, but I don't know where it is, or if
> it exists...


We need to persuade people on a large scale. I think it took some time for traditional science to establish baseline standards, and since scientific computing is still relatively young, its baseline standards haven't been worked out yet. This is a very important issue, and some real action should be taken to improve it.

Two ways I can think of are requiring peer review of software, as we review our journal articles, and requiring that the software be published with the journal submission. Not "find it on our page", not "materials upon request", but some way that we can guarantee it won't drop off the face of the earth.

Just an idea,

Calvin




Re: peer review of scientific software

Chris Weisiger-2
I joined a lab after it had already developed a substantial codebase, which was littered with comments like:

# 20060824 recalibrated
# old values [0.08, .11, .09]

I chucked the entire codebase into source control, and then went through and started deleting these comments...and had to spend a lot of time convincing my coworkers that nothing was being lost, and that I could retrieve any older version if they just gave me a date!
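
(A hedged sketch of that "give me a date" lookup, assuming the code now
lives in git; the file name, date, and branch here are just
illustrative:)

    # find_version.py
    import subprocess

    def commit_as_of(date, branch="master"):
        """Return the last commit on `branch` made on or before `date`."""
        out = subprocess.check_output(
            ["git", "rev-list", "-n", "1", "--before", date, branch])
        return out.strip().decode()

    # e.g. check out the calibration values in effect on 2006-08-24:
    # subprocess.check_call(["git", "checkout", commit_as_of("2006-08-24")])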

I honestly think that a lot of this stuff is pretty self-evidently valuable, *if you have learned about it to begin with*. My coworkers aren't software developers; they're scientists in biology, physics, optics, etc. Programming is outside of their skillset and thus difficult (just as I would have a lot of trouble designing an experiment), not because they're bad at programming but just because they haven't had the relevant training.

In short, if you want scientists to be better programmers, you have to train them to be better programmers. I know my alma mater is placing more of an emphasis on programming these days, since everyone who does any work in the sciences needs at least some skill in that discipline.

This doesn't help us with all of the scientists who are currently out there producing bad code and not using standard tools, of course.

-Chris


On Tue, May 28, 2013 at 1:23 PM, Martin van Leeuwen <[hidden email]> wrote:
<snip>




Re: peer review of scientific software

Matt Newville
Hi,

As others have said, I find the low average programming skill level
among scientists frustrating,  but I also found this article quite
frustrating.

From my perspective, the authors' main complaint seems to be that there
is not enough independent checking of specialized scientific software
written by scientists.  They seem particularly unhappy about the
tendency to use existing packages written by other scientists based on
"trust", "reputation", "previous citations" and without independent
checking.  They also say:

      A "well-respected" end-user developer will almost certainly have
earned that respect
      through scientific breakthroughs, perhaps not for their software
engineering skills
      (although agreement on what constitutes "appropriate" scientific
software engineering
      standards is still under debate).

On this point in particular, and indeed in this whole line of
argument, I think the authors are misguided, perhaps even to the point
of fatally damaging their whole argument.   I believe the much more
common case is for the "well-respected" end-user developer to be known
for the programs written and supported, and less so for the scientific
breakthroughs (unless you count new programs as new instrumentation,
and so, well, breakthroughs, but it's pretty clear that the authors
are making a distinction).    It's too often the case that spending
any significant time on such programs is career suicide, as it takes
time and attention away from such breakthroughs.   It's perfectly
believable that the programming skills of such a scientific developer
may be incomplete, but I think it's fair to say that most supported
and well-used programs are likely the effort of people with
above-average programming skills and the interest and intent to
support such programs.   Indeed, I would argue that instead of being
unhappy about the reliance on trusted programs and developers, the
authors would better serve the scientific community by arguing that
the authors of such programs should be better supported, and given
access to tools and resources (i.e., fund them) to improve their work
rather than treat them as untrustworthy programmers.

I should admit to being one such author of a "well-respected" and
"trusted" package for a very small scientific discipline, and with the
proverbial "many citations etc" because of this.  So I would admit to
being just the sort of person the authors are unhappy about.  I
suspect many people on this mailing list are in the same category.   I
would like to think the trust and respect for certain packages have
been earned, and that people use such packages because they are "known
to work", both in the sense of actually having been tested on
idealized cases, and in producing verifiable results in real cases
(where "testing" would not always be possible).   Indeed, the small,
decentralized group of scientific programmers that I work with (mostly
trained as physicists, and learning to program in Fortran -- some of
us still use mostly Fortran, in fact) do test and verify such codes,
precisely because we know other people use them.   Of course errors
occur, and of course testing is important.   Modern techniques like
distributed version control and unit testing are very good tools to
use.   I agree they should be used more thoroughly, and that one
should always be willing to question the results of a computer
program.

Then again, when was the last time I tested the correctness of results
from my handheld HP calculator?    Hmm, a very, very long time ago.
That's software.  I tend to believe the messages I read in my inbox
are actually the message sent, and hardly ever do a checksum on it.
But that's software.  Indeed, all science is a social enterprise and
so "trust", "reputation", and reliance on the literature (aka "past
experience") are not merely unfortunate outcomes of laziness, but an
important part of the process.

I certainly am happy to support the notion that "more scientists
should be able to program better", so  I am not going to say the
entire article is wrong, and I don't disagree with their main
conclusions.  But I think they have a fatal flaw in their assumptions
and arguments.

--Matt Newville

Re: peer review of scientific software

John Hassler

On 5/28/2013 4:58 PM, Matt Newville wrote:

> <snip>
>
> I certainly am happy to support the notion that "more scientists
> should be able to program better", so I am not going to say the
> entire article is wrong, and I don't disagree with their main
> conclusions.  But I think they have a fatal flaw in their assumptions
> and arguments.
>
> --Matt Newville

Exactly!   There is actually a question here that hasn't been made
explicit.  For whom is this advice intended?  There are all levels of
programming/programmers in STEM.  Some of my colleagues use Excel for
everything.  (As in, EVERYTHING.)  Some fewer use Matlab.  Still fewer
use C/Fortran/Java/C#/whatever.  So far as I know, I'm the one lone
Pythonista.  Each group uses programming differently.

I've been programming for more than 50 years.  I've taught programming
to engineers in several contexts over the years.  For a time, I really
wanted to 'do it right.'  (I even taught 'structured programming' and
'Warnier-Orr' at one point, but realized that it was worse than useless
for the particular audience.)  I've come to realize that most engineers
just want an answer.  They are not interested in how gracefully the
answer was arrived at.  MOST programs written by MOST engineers are
small, short, simple, and intended to solve one problem one time.  (The
deficiency I've most often seen is the lack of error checking for the
answer, and better programming techniques would not generally help much.)

The problem is that nobody sets out to write a "well respected"
program.  Someone sets out to scratch a particular itch ('one problem
one time').  It expands.  Others find it useful.  It becomes widely
used.  The original author, however, was solving his/her own particular
problem, and was not at all interested in "proper" programming.  So, I
guess my question is, how do we find that person who is going to write
the "well respected" program and convince him/her to take time out and
learn proper programming first? Because we are certainly not going to
convince everybody to do it.

john



Re: peer review of scientific software

Matthew Brett
Hi,

On Tue, May 28, 2013 at 2:52 PM, John Hassler <[hidden email]> wrote:
>
> <snip>
>
> The problem is that nobody sets out to write a "well respected"
> program.  Someone sets out to scratch a particular itch ('one problem
> one time').  It expands.  Others find it useful.  It becomes widely
> used.  The original author, however, was solving his/her own particular
> problem, and was not at all interested in "proper" programming.  So, I
> guess my question is, how do we find that person who is going to write
> the "well respected" program and convince him/her to take time out and
> learn proper programming first? Because we are certainly not going to
> convince everybody to do it.

You might find this reference interesting:

Basili, Victor R., et al. "Understanding the
High-Performance-Computing Community." (2008).

I found it from the Joppa article:
http://blog.nipy.org/science-joins-software.html

The take home message seems to be - "we tell scientists to use our
fancy stuff, they tell us no, and now we realize they were often
right".

That article is about high-level programming tools, but the case must
be entirely different for version control, testing, and code review in
particular.  I believe these tools are very fundamental in controlling
error.

The point about error is the central one, for me.  As I proceed further
down my scientific career, I slowly begin to realize the number of
errors we make, and how easy we find it to miss them:

http://blog.nipy.org/unscientific-programming.html

That, for me, is the key argument - we will make fewer mistakes and do
better science if we use the basic tools to help us control error and
to help others find our errors.

Most scientists (myself included) tend to believe this kind of error is
not very important.

I believe that's wrong, but as scientists we don't believe
everything we think, and so we need data.  I wonder how we should get
it...

Cheers,

Matthew

Re: peer review of scientific software

josef.pktd
On Tue, May 28, 2013 at 5:52 PM, John Hassler <[hidden email]> wrote:

> <snip>

I had the same impression as Matt about the article, but his writing
is clearer than my thinking.

For statistics and econometrics (and some economics), there are
researchers who develop methods and researchers who write the tools,
and sometimes they are the same people.

R, Stata, SAS, and Matlab have support for user contributions:
journals, conferences, distribution channels.
Developers of new algorithms, statistical tests, or estimators have an
incentive to see that the code gets to potential users, because it
boosts adoption and with it the number of citations.

Some examples (open source, though maybe without source control, unit
tests, or a license):

http://ideas.repec.org/s/boc/bocode.html
http://www.feweb.vu.nl/econometriclinks/software.html#GAUSS
http://www.unc.edu/~jbhill/Gauss_by_code.htm
Alan Isaac had a Gauss program page, but I cannot find it anymore.

For example, bocode and the Stata Journal: Stata is very good at
supporting user code, with peer review on the mailing lists (besides
the articles), and if everybody else is using it, then it must be
"correct".

Josef


Re: peer review of scientific software

William Carithers
As a scientist who spends the majority of my time writing code to analyze
data, I've found this discussion fascinating. Early in my career I actually
coded in assembly language (remember index registers?), then in Fortran for a
couple of decades, then bit the bullet and moved to object-oriented
languages (Java, Python, Objective C). Now I use mostly Python. I hope the
following comments from this perspective will be useful.

1. There is no "one size fits all". Sometimes I use Python as a BASIC-style
calculator, sometimes Python as procedural like Fortran, and most of the time
Python as fully OO. The level of documentation, testing, and version control
needs to be tailored to the problem.

2. In terms of getting scientists into the "modern world" of writing
maintainable, re-usable code, I think the most useful tool is a really good
IDE. Then much of the documentation, version control, and debugging tooling
is seamlessly there. I don't think I could have written acceptable Java without
Eclipse, and I'm absolutely positive that I couldn't write Objective C
without Xcode. I use IDLE for Python, but it is nowhere near the level of
these others.

Hope these help and keep up the good work,
Bill


On 5/28/13 3:05 PM, "Matthew Brett" <[hidden email]> wrote:

> <snip>



Re: peer review of scientific software

Matthew Brett
Hi,

On Tue, May 28, 2013 at 3:44 PM, Bill Carithers <[hidden email]> wrote:
> As a scientist who spends the majority of my time writing code to analyze
> data, I've found this discussion fascinating. Early in my career, I actually
> coded in assembly language (remember index registers?), then Fortran for a
> couple of decades, then biting the bullet and moving to object-oriented
> languages (Java, Python, Objective C). Now I use mostly Python. I hope the
> following comments from this perspective will be useful.

Yes, thanks for sending it.

> 1. There is no "one size fits all" . Sometimes I use Python as a BASIC-style
> calculator, sometimes Python as procedural like Fortran, most of the time
> Python as fully OO. The level of documentation, testing, and version control
> need to be tailored to the problem.

That's true, but I personally use version control for everything.
Whenever I'm writing more than a few lines of code I start feeling
uncomfortable if it's not somewhere in version control and it's not
tested.    Version control is so easy that I do remember to do that.
Testing is hard and annoying; I sometimes press on without it and
almost invariably regret it.

I think that discomfort - the feeling that I'm setting myself up for
future problems if I don't do this stuff - is what I'd like to be able
to teach the next generation of scientists so they can do a better job
than we did.   I'm still struggling with how to do that.

> 2. In terms of getting scientists into the "modern world" of writing
> maintainable, re-useable code, I think the most useful tool is a really good
> IDE. Then much of the documentation, version control, de-bugging tools are
> seamlessly there. I don't think I could have written acceptable Java without
> Eclipse, and I'm absolutely positive that I couldn't write Objective C
> without Xcode. I use IDLE for Python, but it is no where near the level of
> these others.

I'm a scientist; I've never taken a course in programming or CS.  I've
written a lot of code in Matlab and Python.  I very occasionally used
the Matlab IDE, for debugging, but I've never used a Python IDE.  My
typical workflow is text editor, nosetests from terminal, IPython
console in a terminal to try stuff out.  I feel this helps me think
more clearly - it separates the editing world from the testing world
and the version control world.  But I might well be wrong about that.
 It seems to me there's a constant and difficult tension between
making it easy and making it easier to think.

Cheers,

Matthew

Re: peer review of scientific software

Paulo Jabardo
I'm an engineer working in research, but I spend a good deal of time coding. What I've seen with most of my colleagues and friends is that they will only code when it is extremely necessary for an immediate application in an experiment or for their PhD. The problem starts very early: when I was beginning my studies, we were taught C (and that is still the case almost 20 years later). A small percentage of the students (10%?) enjoy programming, and they profit from it. I really loved pointers and doing neat tricks. For the rest it was torture, plain and simple torture. And completely useless: most students couldn't do anything useful with programming, and all their suffering was for nothing.

What happened later was obvious: they would avoid programming at all costs, and if they had to do something they would use MS-Excel. The spreadsheets I've seen... I still have nightmares. The things they accomplished humble me, prove that I'm a lower being. I've seen people solve partial differential equations where each cell was an element in the solution and was colored according to the result. Beautiful, but I'd rather suffer acute physical pain than do something like that, or worse, debug such a "program". By the way, this sort of application was not a joke or a neat hack; it was actually the only way those guys knew how to solve a problem.
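
(For contrast, a minimal sketch of that kind of computation in a few
lines of NumPy; this is my own illustration rather than code from any
of those spreadsheets, and the grid size and coefficient are arbitrary:)

    # explicit finite-difference update for a 1D diffusion problem:
    # the same cell-by-cell arithmetic, written over whole arrays
    import numpy as np

    nx, nt = 50, 500      # grid points, time steps
    alpha = 0.25          # diffusion number; must be <= 0.5 for stability
    u = np.zeros(nx)
    u[nx // 2] = 1.0      # initial spike in the middle

    for _ in range(nt):
        # update the interior points; the boundaries stay fixed at zero
        u[1:-1] += alpha * (u[2:] - 2.0 * u[1:-1] + u[:-2])

    print(u.round(4))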

15 years later... I have a physics undergraduate student working with me. Very smart and interested. They still learn C, and later on, when they need to do something, what do they do? Most professors use Origin - a huge improvement over Excel, but still. A couple of months ago he had to turn in a report, and since we don't have Origin he was using Excel. I kind of felt sorry for him and helped him do it in Python. He couldn't believe it.

I did my Masters and PhD in CFD. Most other students had almost no background in programming and did most things using Excel! When they had to modify some code, it was almost by accident that things worked. You can imagine what sort of code comes out of this. The professors didn't know programming much better. Just getting them to understand the concept of version control took a while.

In my opinion, if schools taught something like Python/Octave/R at the beginning instead of C, students would be able to use that knowledge easily and productively throughout their courses, and learn C when they really needed it.
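
To give an idea of what I mean, here is a rough sketch of that spreadsheet PDE done in a few lines of NumPy - explicit finite differences for the 1D heat equation u_t = alpha * u_xx. The grid sizes and coefficients are made-up illustration values.

    import numpy as np

    alpha, nx, nt = 1.0, 51, 500
    dx = 1.0 / (nx - 1)
    dt = 0.4 * dx**2 / alpha   # small enough to keep the explicit scheme stable

    u = np.zeros(nx)           # fixed (zero) boundaries at both ends
    u[nx // 2] = 1.0           # initial heat spike in the middle

    for _ in range(nt):
        # each interior point is updated from its neighbours - the same
        # stencil the spreadsheet encoded one cell at a time
        u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])

    print(u.max())             # the spike decays and spreads, as expected

The same stencil, but one can actually read it, and debug it.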

Paulo



Re: peer review of scientific software

Matthew Brett
Hi,

On Tue, May 28, 2013 at 7:18 PM, Paulo Jabardo <[hidden email]> wrote:

> [...] I've seen people solve partial differential equations where each
> cell was an element in the solution and was colored according to the
> result. [...] A couple of months ago he had to turn in a report, and
> since we don't have Origin he was using Excel. I kind of felt sorry
> for him and helped him do it in Python. He couldn't believe it.

Oh dear - you probably saw this stuff?

http://blog.stodden.net/2013/04/19/what-the-reinhart-rogoff-debacle-really-shows-verifying-empirical-results-needs-to-be-routine/

> [...] In my opinion, if schools taught something like Python/Octave/R
> at the beginning instead of C, students would be able to use that
> knowledge easily and productively throughout their courses, and learn
> C when they really needed it.

That's surely one of the big arguments for Python - it is a great
first language, and it is capable across a wider range than Octave or
R - or even Excel :)

Cheers,

Matthew

Re: peer review of scientific software

Bjorn Madsen
In reply to this post by William Carithers
+1 to Bill Carithers.


On 28 May 2013 23:44, Bill Carithers <[hidden email]> wrote:
As a scientist who spends the majority of my time writing code to analyze
data, I've found this discussion fascinating. Early in my career, I actually
coded in assembly language (remember index registers?), then Fortran for a
couple of decades, then bit the bullet and moved to object-oriented
languages (Java, Python, Objective C). Now I use mostly Python. I hope the
following comments from this perspective will be useful.

1. There is no "one size fits all". Sometimes I use Python as a BASIC-style
calculator, sometimes Python as procedural like Fortran, most of the time
Python as fully OO. The level of documentation, testing, and version control
needs to be tailored to the problem.

2. In terms of getting scientists into the "modern world" of writing
maintainable, re-useable code, I think the most useful tool is a really good
IDE. Then the documentation, version control, and debugging tools are
seamlessly there. I don't think I could have written acceptable Java without
Eclipse, and I'm absolutely positive that I couldn't write Objective C
without Xcode. I use IDLE for Python, but it is nowhere near the level of
these others.

Hope these help and keep up the good work,
Bill


On 5/28/13 3:05 PM, "Matthew Brett" <[hidden email]> wrote:

> Hi,
>
> On Tue, May 28, 2013 at 2:52 PM, John Hassler <[hidden email]> wrote:
>>
>> On 5/28/2013 4:58 PM, Matt Newville wrote:
>
> <snip>
>
>>> I certainly am happy to support the notion that "more scientists
>>> should be able to program better", so  I am not going to say the
>>> entire article is wrong, and I don't disagree with their main
>>> conclusions.  But I think they have a fatal flaw in their assumptions
>>> and arguments.
>>>
>>> --Matt Newville
>>
>> Exactly!   There is actually a question here that hasn't been made
>> explicit.  For whom is this advice intended?  There are all levels of
>> programming/programmers in STEM.  Some of my colleagues use Excel for
>> everything.  (As in, EVERYTHING.)  Some fewer use Matlab.  Still fewer
>> use C/Fortran/Java/C#/whatever.  So far as I know, I'm the one lone
>> Pythonista.  Each group uses programming differently.
>>
>> I've been programming for more than 50 years.  I've taught programming
>> to engineers in several contexts over the years.  For a time, I really
>> wanted to 'do it right.'  (I even taught 'structured programming' and
>> 'Warnier-Orr' at one point, but realized that it was worse than useless
>> for the particular audience.)  I've come to realize that most engineers
>> just want an answer.  They are not interested in how gracefully the
>> answer was arrived at.  MOST programs written by MOST engineers are
>> small, short, simple, and intended to solve one problem one time.  (The
>> deficiency I've most often seen is the lack of error checking for the
>> answer, and better programming techniques would not generally help much.)
>>
>> The problem is that nobody sets out to write a "well respected"
>> program.  Someone sets out to scratch a particular itch ('one problem
>> one time').  It expands.  Others find it useful.  It becomes widely
>> used.  The original author, however, was solving his/her own particular
>> problem, and was not at all interested in "proper" programming.  So, I
>> guess my question is, how do we find that person who is going to write
>> the "well respected" program and convince him/her to take time out and
>> learn proper programming first? Because we are certainly not going to
>> convince everybody to do it.
>
> You might find this reference interesting :
>
> Basili, Victor R., et al. "Understanding the
> High-Performance-Computing Community." (2008).
>
> I found it from the Joppa article :
> http://blog.nipy.org/science-joins-software.html
>
> The take home message seems to be - "we tell scientists to use our
> fancy stuff, they tell us no, and now we realize they were often
> right".
>
> That article is about high-level programming tools; the case must be
> entirely different for version control, testing, and code review in
> particular.  I believe these tools are fundamental in controlling
> error.
>
> The point about error is the central one, for me.  As I proceed
> further down my scientific career, I slowly begin to realize the
> number of errors we make, and how easily we miss them:
>
> http://blog.nipy.org/unscientific-programming.html
>
> That, for me, is the key argument - we will make fewer mistakes and do
> better science if we use the basic tools to help us control error and
> to help others find our errors.
>
> Most scientists (myself included) tend to believe this error is not
> very important.
>
> I believe that is wrong, but as scientists we don't believe
> everything we think, and so we need data.  I wonder how we should get
> it...
>
> Cheers,
>
> Matthew





--
Bjorn Madsen
Researcher, Complex Systems Research
Ph.: (+44) 0 7792 030 720 



Re: peer review of scientific software

Suzen, Mehmet
In reply to this post by Calvin Morrison
On 28 May 2013 20:23, Calvin Morrison <[hidden email]> wrote:
> On 27 May 2013 13:44, Alan G Isaac <[hidden email]> wrote:
>> http://www.sciencemag.org/content/340/6134/814.summary

It is very ironic that someone from Microsoft, a closed-source software
company, demands that others release their source code: ".. need for
open access to software ..even the most basic step of making source
code available upon publication."  Maybe Microsoft Research publishes
all the source code it uses and develops in its research output, in
which case the demand makes sense, but I highly doubt it, due to
patents etc.


> As a background I am a programmer, but I have been hired by various
> professors and other academic people to work on projects. My biggest
> problem with the "scientific computing" community is having to put up
> with this crap!

Just a remark: software is not the subject of scientific research, at
least in the computational physical sciences, if not in all fields.  It
is a tool and infrastructure.  No clever software development practice
will, by itself, give you good scientific output, though it can greatly
improve efficiency and correctness.  For example, implementing the
wrong equation, or writing an n-body simulation code that does not
conserve momentum by construction, cannot be detected by even the
highest-quality software development life cycle.  Scientific
programmers should think about this as well before cursing professors
and academics.
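
That said, when the scientist does know the invariant, it can at least
be written down as a check - the point is that the invariant itself has
to come from the physics, not from the software process.  Here is a
rough sketch of such a domain-level check; the integrator and every
name in it are invented for illustration.

    import numpy as np

    def leapfrog_step(pos, vel, masses, dt):
        # one kick-drift-kick step under mutual gravity (G = 1)
        def accel(p):
            acc = np.zeros_like(p)
            for i in range(len(masses)):
                for j in range(len(masses)):
                    if i != j:
                        r = p[j] - p[i]
                        acc[i] += masses[j] * r / np.linalg.norm(r) ** 3
            return acc
        vel = vel + 0.5 * dt * accel(pos)
        pos = pos + dt * vel
        vel = vel + 0.5 * dt * accel(pos)
        return pos, vel

    def test_total_momentum_is_conserved():
        # generic software practice cannot supply this assertion;
        # only someone who knows the physics can
        rng = np.random.RandomState(0)
        pos, vel = rng.randn(3, 2), rng.randn(3, 2)
        masses = np.array([1.0, 2.0, 3.0])
        p_before = (masses[:, None] * vel).sum(axis=0)
        for _ in range(100):
            pos, vel = leapfrog_step(pos, vel, masses, dt=1e-3)
        p_after = (masses[:, None] * vel).sum(axis=0)
        np.testing.assert_allclose(p_before, p_after, atol=1e-10)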