# NumPy Binomial BTPE method Problem

13 messages
Open this post in threaded view
|

## NumPy Binomial BTPE method Problem

 Hello all, I am implementing a binomial random variable in MATLAB. The default method in the statistics toolbox is extremely slow for large population/trial size. I am needing to do trials for n as large as 2**28. I found in NumPy some code that implements a binomial random draw in numpy/random/mtrand/distributions.c. I was trying to convert the code to MATLAB and the BTPE method seems to have an error in lines 337-341 of distributions.c. The if ... else if ... else statement I think is incorrect. I think it should be an if ... else ... statement followed by the contents of the original else which starts on line 337. The if ... else if ... else block is as follows: #### begin code snippet #### if (m < y) {     for (i=m; i<=y; i++)     {         F *= (a/i - s);     } } else if (m > y) {     for (i=y; i<=m; i++)     {         F /= (a/i - s);     } } else {     if (v > F) goto Step10;     goto Step60; } #### end code snippet #### >From what I can tell, the variable F is only used in the comparison within the else{} statment (i.e. the if(v > F) statement) and nowhere else within the scope of the function. I also found a fortran implementation here: http://wstein.org/home/wstein/www/home/mhansen/spkgs_in_progress/octave-3.2.4/src/libcruft/ranlib/ignbin.fand it appears this is from where the code was originally adapted as the variable names are the same. My parsing of fortran GOTOs is a bit rusty, but I think the contents of the else block in above snippet should be not be conditional. I don't understand the underlying algorithm very well and don't have access the the BTPE paper, so I can't comment on the validity of the fortran code. There just seems to be an error in logic in the above code. So please have someone who understands it look at it. It appears Robert Kern wrote the function a decent portion of the file at some point in the past. I hope this helps. Cheers, -- Josh Lawrence P.S. I apologize if my email is inconvenient, but I could not figure out how to tell gmail to set the reply-to field to be [hidden email]. _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: NumPy Binomial BTPE method Problem

 Hey all, I received access to the paper and it seems it was originally based purely on the paper written by Kachitvichyanukul in 1988. I still think there's a whoopsies with the if ... else if ... else, block though. On Wed, Oct 3, 2012 at 11:54 AM, Josh Lawrence <[hidden email]> wrote: > Hello all, > > I am implementing a binomial random variable in MATLAB. The default > method in the statistics toolbox is extremely slow for large > population/trial size. I am needing to do trials for n as large as > 2**28. I found in NumPy some code that implements a binomial random > draw in numpy/random/mtrand/distributions.c. I was trying to convert > the code to MATLAB and the BTPE method seems to have an error in lines > 337-341 of distributions.c. The if ... else if ... else statement I > think is incorrect. I think it should be an if ... else ... statement > followed by the contents of the original else which starts on line > 337. > > The if ... else if ... else block is as follows: > > #### begin code snippet #### > if (m < y) > { >     for (i=m; i<=y; i++) >     { >         F *= (a/i - s); >     } > } > else if (m > y) > { >     for (i=y; i<=m; i++) >     { >         F /= (a/i - s); >     } > } > else > { >     if (v > F) goto Step10; >     goto Step60; > } > #### end code snippet #### > > From what I can tell, the variable F is only used in the comparison > within the else{} statment (i.e. the if(v > F) statement) and nowhere > else within the scope of the function. > > I also found a fortran implementation here: > http://wstein.org/home/wstein/www/home/mhansen/spkgs_in_progress/octave-3.2.4/src/libcruft/ranlib/ignbin.f> and it appears this is from where the code was originally adapted as > the variable names are the same. > > My parsing of fortran GOTOs is a bit rusty, but I think the contents > of the else block in above snippet should be not be conditional. > > I don't understand the underlying algorithm very well and don't have > access the the BTPE paper, so I can't comment on the validity of the > fortran code. There just seems to be an error in logic in the above > code. So please have someone who understands it look at it. It appears > Robert Kern wrote the function a decent portion of the file at some > point in the past. > > I hope this helps. > > Cheers, > > -- > Josh Lawrence > > > P.S. I apologize if my email is inconvenient, but I could not figure > out how to tell gmail to set the reply-to field to be > [hidden email]. -- Josh Lawrence _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: NumPy Binomial BTPE method Problem

 On Wed, Oct 3, 2012 at 2:42 PM, Josh Lawrence <[hidden email]> wrote: > Hey all, > > I received access to the paper and it seems it was originally based > purely on the paper written by Kachitvichyanukul in 1988. I still > think there's a whoopsies with the if ... else if ... else, block > though. the c code "else" looks strange to me, however, checking a few cases with large p*n for a large sample (1 million draws), I don't see any difference of the frequency count to the theoretical distribution from scipy.binom. (but with all the goto's I'm not sure if I really trigger that path.) Josef > > On Wed, Oct 3, 2012 at 11:54 AM, Josh Lawrence > <[hidden email]> wrote: >> Hello all, >> >> I am implementing a binomial random variable in MATLAB. The default >> method in the statistics toolbox is extremely slow for large >> population/trial size. I am needing to do trials for n as large as >> 2**28. I found in NumPy some code that implements a binomial random >> draw in numpy/random/mtrand/distributions.c. I was trying to convert >> the code to MATLAB and the BTPE method seems to have an error in lines >> 337-341 of distributions.c. The if ... else if ... else statement I >> think is incorrect. I think it should be an if ... else ... statement >> followed by the contents of the original else which starts on line >> 337. >> >> The if ... else if ... else block is as follows: >> >> #### begin code snippet #### >> if (m < y) >> { >>     for (i=m; i<=y; i++) >>     { >>         F *= (a/i - s); >>     } >> } >> else if (m > y) >> { >>     for (i=y; i<=m; i++) >>     { >>         F /= (a/i - s); >>     } >> } >> else >> { >>     if (v > F) goto Step10; >>     goto Step60; >> } >> #### end code snippet #### >> >> From what I can tell, the variable F is only used in the comparison >> within the else{} statment (i.e. the if(v > F) statement) and nowhere >> else within the scope of the function. >> >> I also found a fortran implementation here: >> http://wstein.org/home/wstein/www/home/mhansen/spkgs_in_progress/octave-3.2.4/src/libcruft/ranlib/ignbin.f>> and it appears this is from where the code was originally adapted as >> the variable names are the same. >> >> My parsing of fortran GOTOs is a bit rusty, but I think the contents >> of the else block in above snippet should be not be conditional. >> >> I don't understand the underlying algorithm very well and don't have >> access the the BTPE paper, so I can't comment on the validity of the >> fortran code. There just seems to be an error in logic in the above >> code. So please have someone who understands it look at it. It appears >> Robert Kern wrote the function a decent portion of the file at some >> point in the past. >> >> I hope this helps. >> >> Cheers, >> >> -- >> Josh Lawrence >> >> >> P.S. I apologize if my email is inconvenient, but I could not figure >> out how to tell gmail to set the reply-to field to be >> [hidden email]. > > > > -- > Josh Lawrence > _______________________________________________ > SciPy-User mailing list > [hidden email] > http://mail.scipy.org/mailman/listinfo/scipy-user_______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: NumPy Binomial BTPE method Problem

 On Wed, Oct 3, 2012 at 3:07 PM,  <[hidden email]> wrote: > On Wed, Oct 3, 2012 at 2:42 PM, Josh Lawrence <[hidden email]> wrote: >> Hey all, >> >> I received access to the paper and it seems it was originally based >> purely on the paper written by Kachitvichyanukul in 1988. I still >> think there's a whoopsies with the if ... else if ... else, block >> though. > > the c code "else" looks strange to me, > however, checking a few cases with large p*n for a large sample (1 > million draws), I don't see any difference of the frequency count to > the theoretical distribution from scipy.binom. I'm pretty sure you are right. (If my reading as non c programmer is correct) The else block means that Step 50 is never used, instead it uses Step 52, which uses a different approximation that is intended for the tails. If Step 52 is relatively close to the result of Step 50, then it will not be very visible in the final results. >From my reading of the code there should be a small distortion around the mean. Josef > > (but with all the goto's I'm not sure if I really trigger that path.) > > Josef > >> >> On Wed, Oct 3, 2012 at 11:54 AM, Josh Lawrence >> <[hidden email]> wrote: >>> Hello all, >>> >>> I am implementing a binomial random variable in MATLAB. The default >>> method in the statistics toolbox is extremely slow for large >>> population/trial size. I am needing to do trials for n as large as >>> 2**28. I found in NumPy some code that implements a binomial random >>> draw in numpy/random/mtrand/distributions.c. I was trying to convert >>> the code to MATLAB and the BTPE method seems to have an error in lines >>> 337-341 of distributions.c. The if ... else if ... else statement I >>> think is incorrect. I think it should be an if ... else ... statement >>> followed by the contents of the original else which starts on line >>> 337. >>> >>> The if ... else if ... else block is as follows: >>> >>> #### begin code snippet #### >>> if (m < y) >>> { >>>     for (i=m; i<=y; i++) >>>     { >>>         F *= (a/i - s); >>>     } >>> } >>> else if (m > y) >>> { >>>     for (i=y; i<=m; i++) >>>     { >>>         F /= (a/i - s); >>>     } >>> } >>> else >>> { >>>     if (v > F) goto Step10; >>>     goto Step60; >>> } >>> #### end code snippet #### >>> >>> From what I can tell, the variable F is only used in the comparison >>> within the else{} statment (i.e. the if(v > F) statement) and nowhere >>> else within the scope of the function. >>> >>> I also found a fortran implementation here: >>> http://wstein.org/home/wstein/www/home/mhansen/spkgs_in_progress/octave-3.2.4/src/libcruft/ranlib/ignbin.f>>> and it appears this is from where the code was originally adapted as >>> the variable names are the same. >>> >>> My parsing of fortran GOTOs is a bit rusty, but I think the contents >>> of the else block in above snippet should be not be conditional. >>> >>> I don't understand the underlying algorithm very well and don't have >>> access the the BTPE paper, so I can't comment on the validity of the >>> fortran code. There just seems to be an error in logic in the above >>> code. So please have someone who understands it look at it. It appears >>> Robert Kern wrote the function a decent portion of the file at some >>> point in the past. >>> >>> I hope this helps. >>> >>> Cheers, >>> >>> -- >>> Josh Lawrence >>> >>> >>> P.S. I apologize if my email is inconvenient, but I could not figure >>> out how to tell gmail to set the reply-to field to be >>> [hidden email]. >> >> >> >> -- >> Josh Lawrence >> _______________________________________________ >> SciPy-User mailing list >> [hidden email] >> http://mail.scipy.org/mailman/listinfo/scipy-user_______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: NumPy Binomial BTPE method Problem

Open this post in threaded view
|

## Re: NumPy Binomial BTPE method Problem

Open this post in threaded view
|

## Re: NumPy Binomial BTPE method Problem

 In reply to this post by Josh Lawrence-2 On Wed, Oct 3, 2012 at 5:54 PM, Josh Lawrence <[hidden email]> wrote: > Hello all, > > I am implementing a binomial random variable in MATLAB. The default > method in the statistics toolbox is extremely slow for large > population/trial size. I am needing to do trials for n as large as > 2**28. I found in NumPy some code that implements a binomial random > draw in numpy/random/mtrand/distributions.c. I was trying to convert > the code to MATLAB and the BTPE method seems to have an error in lines > 337-341 of distributions.c. The if ... else if ... else statement I > think is incorrect. I think it should be an if ... else ... statement > followed by the contents of the original else which starts on line > 337. Yes, you are correct, on this point as well as the m+1 and y+1. Thank you for debugging my code! -- Robert Kern _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: NumPy Binomial BTPE method Problem

 Hah, my pleasure. I'm surprised I found them, as your code seems to always work so well. On Wed, Oct 3, 2012 at 4:28 PM, Robert Kern <[hidden email]> wrote: > On Wed, Oct 3, 2012 at 5:54 PM, Josh Lawrence <[hidden email]> wrote: >> Hello all, >> >> I am implementing a binomial random variable in MATLAB. The default >> method in the statistics toolbox is extremely slow for large >> population/trial size. I am needing to do trials for n as large as >> 2**28. I found in NumPy some code that implements a binomial random >> draw in numpy/random/mtrand/distributions.c. I was trying to convert >> the code to MATLAB and the BTPE method seems to have an error in lines >> 337-341 of distributions.c. The if ... else if ... else statement I >> think is incorrect. I think it should be an if ... else ... statement >> followed by the contents of the original else which starts on line >> 337. > > Yes, you are correct, on this point as well as the m+1 and y+1. Thank > you for debugging my code! > > -- > Robert Kern > _______________________________________________ > SciPy-User mailing list > [hidden email] > http://mail.scipy.org/mailman/listinfo/scipy-user-- Josh Lawrence _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: NumPy Binomial BTPE method Problem

 On Wed, Oct 3, 2012 at 10:42 PM, Josh Lawrence <[hidden email]> wrote: > Hah, my pleasure. I'm surprised I found them, as your code seems to > always work so well. I was a bored grad student, desperately not trying to do real work and mistranslated some goto logic. The paper is clearer than the RANLIB code I was referencing, but I must have missed that. -- Robert Kern _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: NumPy Binomial BTPE method Problem

 Yes, I found the paper quite clear. I did a while loop with if blocks (basically a switch statement) instead of goto statements since I was in MATLAB and it makes a lot more sense the way I wrote it. On Wed, Oct 3, 2012 at 4:45 PM, Robert Kern <[hidden email]> wrote: > On Wed, Oct 3, 2012 at 10:42 PM, Josh Lawrence > <[hidden email]> wrote: >> Hah, my pleasure. I'm surprised I found them, as your code seems to >> always work so well. > > I was a bored grad student, desperately not trying to do real work and > mistranslated some goto logic. The paper is clearer than the RANLIB > code I was referencing, but I must have missed that. > > -- > Robert Kern > _______________________________________________ > SciPy-User mailing list > [hidden email] > http://mail.scipy.org/mailman/listinfo/scipy-user-- Josh Lawrence _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: NumPy Binomial BTPE method Problem

 As a follow-up, I should submit a pull request, correct? Has this conversation been copied/moved/seen on the appropriate numpy mailing list (I realized after the 2nd or 3rd post I probably should have posted it to numpy-dev). On Wed, Oct 3, 2012 at 5:00 PM, Josh Lawrence <[hidden email]> wrote: > Yes, I found the paper quite clear. I did a while loop with if blocks > (basically a switch statement) instead of goto statements since I was > in MATLAB and it makes a lot more sense the way I wrote it. > > On Wed, Oct 3, 2012 at 4:45 PM, Robert Kern <[hidden email]> wrote: >> On Wed, Oct 3, 2012 at 10:42 PM, Josh Lawrence >> <[hidden email]> wrote: >>> Hah, my pleasure. I'm surprised I found them, as your code seems to >>> always work so well. >> >> I was a bored grad student, desperately not trying to do real work and >> mistranslated some goto logic. The paper is clearer than the RANLIB >> code I was referencing, but I must have missed that. >> >> -- >> Robert Kern >> _______________________________________________ >> SciPy-User mailing list >> [hidden email] >> http://mail.scipy.org/mailman/listinfo/scipy-user> > > > -- > Josh Lawrence -- Josh Lawrence _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user