# [SciPy-User] Creating a Custom PDF with Arguments Using Scipy Stats

6 messages
Open this post in threaded view
|

## [SciPy-User] Creating a Custom PDF with Arguments Using Scipy Stats

 Hi all- I am trying to create a custom PDF with arguments for some statistical analysis I am doing. I want to have a power law with an exponential cutoff at low values: mathematically, this looks like P(m) = N * m**(-index) * (1 - exp((-m**2)/m0**2)  with three parameters: a normalization (N), an index (index), and a scale value (m0). I have tried to produce some code that creates this pdf and while the pdf and cdf functions appear to be working as I hoped, when I try to generate random values they don't follow this distribution at all, and I am not sure why. It seems to be far more heavily drawing from the low end of the curve (which are explicitly suppressed in the pdf) Can you suggest the changes I need to make to try and figure out why my code isn't working as planned? My code snippet is below: from scipy import stats import numpy as np from matplotlib import pyplot as plt class my_pdf(stats.rv_continuous):     def _pdf(self,x,norm,index,m0):         return norm*x**(-1*index) * (1. -np.exp((-1*x**2)/m0**2) ) my_cv= my_pdf(a=1e-6,b=1e-2,name="Test") rvs=my_cv.rvs(1.0,1.7,1e-3,size=1000)  #Values should be peaked around 1e-3, but histogram peaks around 1e-5... I am really having a hard time figuring out how to create a robust and complex custom rv_continuous in scipy based on scipy, so any help you can provide would be most appreciated. Cheers, Steve  _______________________________________________ SciPy-User mailing list [hidden email] https://mail.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: Creating a Custom PDF with Arguments Using Scipy Stats

 08.06.2016 13:50 пользователь "Steven Ehlert" <[hidden email]> написал: > > Hi all- > > I am trying to create a custom PDF with arguments for some statistical analysis I am doing. I want to have a power law with an exponential cutoff at low values: mathematically, this looks like > > P(m) = N * m**(-index) * (1 - exp((-m**2)/m0**2)  with three parameters: a normalization (N), an index (index), and a scale value (m0). > > > > I have tried to produce some code that creates this pdf and while the pdf and cdf functions appear to be working as I hoped, when I try to generate random values they don't follow this distribution at all, and I am not sure why. It seems to be far more heavily drawing from the low end of the curve (which are explicitly suppressed in the pdf) > > Can you suggest the changes I need to make to try and figure out why my code isn't working as planned? My code snippet is below: > > > from scipy import stats > import numpy as np > from matplotlib import pyplot as plt > > > class my_pdf(stats.rv_continuous): >     def _pdf(self,x,norm,index,m0): >         return norm*x**(-1*index) * (1. -np.exp((-1*x**2)/m0**2) ) > > > my_cv= my_pdf(a=1e-6,b=1e-2,name="Test") > > > rvs=my_cv.rvs(1.0,1.7,1e-3,size=1000)  #Values should be peaked around 1e-3, but histogram peaks around 1e-5... > > > > I am really having a hard time figuring out how to create a robust and complex custom rv_continuous in scipy based on scipy, so any help you can provide would be most appreciated. > > > Cheers, > > Steve  > > _______________________________________________ > SciPy-User mailing list > [hidden email] > https://mail.scipy.org/mailman/listinfo/scipy-user > First and foremost, you need to fix the normalization: the norm constant is not a free parameter, as the pdf must integrate to unity over the support of your RV. Next, the scale parameter is handled by the framework: pdf, cdf methods all accept the `scale` argument. Thus you do not need the m0 parameter. HTH, Evgeni _______________________________________________ SciPy-User mailing list [hidden email] https://mail.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: Creating a Custom PDF with Arguments Using Scipy Stats

 Okay-Thank you for the insights.So three questions about this, as I am not an expert in the implementation. I figured that pdf normalization may play a roleFirst of all, what is the best way to numerically handle the normalization? Do I need to include the integral every time I initialize the pdf? Something like this: class my_pdf(stats.rv_continuous):    def _pdf(self,x,index,m0):        step=(self.b-self.a)/100.         this_xrange=np.arange(self.a,self.b+step,step)                this_function=this_xrange**(-1*index) * (1. -np.exp((-1*this_xrange**2)/m0**2) )        #for x,f in zip(this_xrange,this_function): print x,f        this_norm=1./(integrate.simps(this_function,x=this_xrange))        #print this_norm                return this_norm*x**(-1*index) * (1. -np.exp((-1*x**2)/m0**2) )I don't think there is an analytic form for this integral, so I have to do it numerically, but I also have to implement it with the pdf. And the second question I have is:  how do I call the scale variable within the _pdf function? Would I simply put in self.scale in place of m0 in the code above? Or is it more sophisticated than that? Finally, while I realized these concerns when I wrote it, but I didn't expect the pdf to look okay but the rvs to be so off- are there reasons to expect this behavior based on my previously incorrect normalization? It is not obvious to me that it would. Thank you for your help thus far and I look forward to learning more about custom statistical distributions in SciPy. Steven EhlertOn Wed, Jun 8, 2016 at 5:59 AM, Evgeni Burovski wrote: 08.06.2016 13:50 пользователь "Steven Ehlert" <[hidden email]> написал: > > Hi all- > > I am trying to create a custom PDF with arguments for some statistical analysis I am doing. I want to have a power law with an exponential cutoff at low values: mathematically, this looks like > > P(m) = N * m**(-index) * (1 - exp((-m**2)/m0**2)  with three parameters: a normalization (N), an index (index), and a scale value (m0). > > > > I have tried to produce some code that creates this pdf and while the pdf and cdf functions appear to be working as I hoped, when I try to generate random values they don't follow this distribution at all, and I am not sure why. It seems to be far more heavily drawing from the low end of the curve (which are explicitly suppressed in the pdf) > > Can you suggest the changes I need to make to try and figure out why my code isn't working as planned? My code snippet is below: > > > from scipy import stats > import numpy as np > from matplotlib import pyplot as plt > > > class my_pdf(stats.rv_continuous): >     def _pdf(self,x,norm,index,m0): >         return norm*x**(-1*index) * (1. -np.exp((-1*x**2)/m0**2) ) > > > my_cv= my_pdf(a=1e-6,b=1e-2,name="Test") > > > rvs=my_cv.rvs(1.0,1.7,1e-3,size=1000)  #Values should be peaked around 1e-3, but histogram peaks around 1e-5... > > > > I am really having a hard time figuring out how to create a robust and complex custom rv_continuous in scipy based on scipy, so any help you can provide would be most appreciated. > > > Cheers, > > Steve  > > _______________________________________________ > SciPy-User mailing list > [hidden email] > https://mail.scipy.org/mailman/listinfo/scipy-user > First and foremost, you need to fix the normalization: the norm constant is not a free parameter, as the pdf must integrate to unity over the support of your RV. Next, the scale parameter is handled by the framework: pdf, cdf methods all accept the `scale` argument. Thus you do not need the m0 parameter. HTH, Evgeni _______________________________________________ SciPy-User mailing list [hidden email] https://mail.scipy.org/mailman/listinfo/scipy-user _______________________________________________ SciPy-User mailing list [hidden email] https://mail.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: Creating a Custom PDF with Arguments Using Scipy Stats

 On Wed, Jun 8, 2016 at 1:31 PM, Steven Ehlert <[hidden email]> wrote: > Okay- > > Thank you for the insights. > > So three questions about this, as I am not an expert in the implementation. > I figured that pdf normalization may play a role > > > First of all, what is the best way to numerically handle the normalization? > Do I need to include the integral every time I initialize the pdf? Something > like this: > class my_pdf(stats.rv_continuous): >     def _pdf(self,x,index,m0): >         step=(self.b-self.a)/100. > >         this_xrange=np.arange(self.a,self.b+step,step) > >         this_function=this_xrange**(-1*index) * (1. > -np.exp((-1*this_xrange**2)/m0**2) ) > > >         #for x,f in zip(this_xrange,this_function): print x,f >         this_norm=1./(integrate.simps(this_function,x=this_xrange)) >         #print this_norm > >         return this_norm*x**(-1*index) * (1. -np.exp((-1*x**2)/m0**2) ) > Yes, this would need to compute it at each call to pdf. The integral can be expressed in terms of gamma functions (do a substitution of y = x**2 /2 in the integral). Alternatively, you can do the integral numerically, likely `quad` is better than `simps`. > I don't think there is an analytic form for this integral, so I have to do > it numerically, but I also have to implement it with the pdf. > > And the second question I have is:  how do I call the scale variable within > the _pdf function? Would I simply put in self.scale in place of m0 in the > code above? Or is it more sophisticated than that? It's easy: you don't need to :-). distribution.pdf(x, loc=11, scale=42) calls distribution._pdf(y) with y = (x - loc) / scale. This way, you `_pdf` only needs to deal with a scaled variable. There is no self.scale --- it's just a parameter to any call to pdf, cdf, sf etc. > Finally, while I realized these concerns when I wrote it, but I didn't > expect the pdf to look okay but the rvs to be so off- are there reasons to > expect this behavior based on my previously incorrect normalization? It is > not obvious to me that it would. PDF needs to integrate to one. If the normalization is off, it still might look "okay" because the shape is the same, but it's still wrong. Drawing random variates assumes that the cdf is a real distribution function. From experience, having ok-looking PDF and nonsense for the random variates is a typical sign that the normalization might be off. > Thank you for your help thus far and I look forward to learning more about > custom statistical distributions in SciPy. > > > Steven Ehlert > > > On Wed, Jun 8, 2016 at 5:59 AM, Evgeni Burovski <[hidden email]> > wrote: >> >> >> 08.06.2016 13:50 пользователь "Steven Ehlert" <[hidden email]> >> написал: >> >> >> > >> > Hi all- >> > >> > I am trying to create a custom PDF with arguments for some statistical >> > analysis I am doing. I want to have a power law with an exponential cutoff >> > at low values: mathematically, this looks like >> > >> > P(m) = N * m**(-index) * (1 - exp((-m**2)/m0**2)  with three parameters: >> > a normalization (N), an index (index), and a scale value (m0). >> > >> > >> > >> > I have tried to produce some code that creates this pdf and while the >> > pdf and cdf functions appear to be working as I hoped, when I try to >> > generate random values they don't follow this distribution at all, and I am >> > not sure why. It seems to be far more heavily drawing from the low end of >> > the curve (which are explicitly suppressed in the pdf) >> > >> > Can you suggest the changes I need to make to try and figure out why my >> > code isn't working as planned? My code snippet is below: >> > >> > >> > from scipy import stats >> > import numpy as np >> > from matplotlib import pyplot as plt >> > >> > >> > class my_pdf(stats.rv_continuous): >> >     def _pdf(self,x,norm,index,m0): >> >         return norm*x**(-1*index) * (1. -np.exp((-1*x**2)/m0**2) ) >> > >> > >> > my_cv= my_pdf(a=1e-6,b=1e-2,name="Test") >> > >> > >> > rvs=my_cv.rvs(1.0,1.7,1e-3,size=1000)  #Values should be peaked around >> > 1e-3, but histogram peaks around 1e-5... >> > >> > >> > >> > I am really having a hard time figuring out how to create a robust and >> > complex custom rv_continuous in scipy based on scipy, so any help you can >> > provide would be most appreciated. >> > >> > >> > Cheers, >> > >> > Steve >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > [hidden email] >> > https://mail.scipy.org/mailman/listinfo/scipy-user>> > >> >> First and foremost, you need to fix the normalization: the norm constant >> is not a free parameter, as the pdf must integrate to unity over the support >> of your RV. >> >> Next, the scale parameter is handled by the framework: pdf, cdf methods >> all accept the `scale` argument. Thus you do not need the m0 parameter. >> >> HTH, >> >> Evgeni >> >> >> _______________________________________________ >> SciPy-User mailing list >> [hidden email] >> https://mail.scipy.org/mailman/listinfo/scipy-user>> > > > _______________________________________________ > SciPy-User mailing list > [hidden email] > https://mail.scipy.org/mailman/listinfo/scipy-user> _______________________________________________ SciPy-User mailing list [hidden email] https://mail.scipy.org/mailman/listinfo/scipy-user