# [SciPy-User] fitting discrete probability distributions to data

## [SciPy-User] fitting discrete probability distributions to data

hi,

i have some data:

A) a 1d array (dimensions 1x50), made by summing the columns of a 2d array (dimensions ~20k x 50).

B) a 1d array that is just a particular row of that 2d array

i need to fit a sum of 2 negative binomial distributions to A), and to fit a single negative binomial distribution to B).

i have spent a while now reading the documentation for scipy.stats and the statsmodels package and various stack overflow posts, etc., but i do not yet understand how to go about fitting a discrete probability distribution to a vector of data.

specific subquestions:

- do i need to load data in as a pandas df? an ndarray? does it not matter?
- i understand endog and exog in the context of the examples given in the docs (where you have one column that you want to use to predict some other column) but not what they should be in the case where i basically am trying to fit a curve to the normalized histogram of my data
- if someone can explain how to fit with statsmodels' Negative Binomial (http://statsmodels.sourceforge.net/devel/generated/statsmodels.discrete.discrete_model.NegativeBinomial.html#statsmodels.discrete.discrete_model.NegativeBinomial) that would be a good start. but i do also need to know how to fit to a sum of two of these, or possibly a sum of two other discrete distributions
- is the patsy formula syntax relevant here? i have never used R and could not find an example of the "R-like" syntax that is similar enough to my use case to parse how it works
- honestly i don't know what i'm doing, please help!

if these questions reveal grave ignorance, or are not directly relevant enough to scipy for this mailing list, i apologize and thanks for bearing with me. i barely know how to flip a coin, this stuff is new to me.

thanks a lot

c

_______________________________________________
SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user

## Re: fitting discrete probability distributions to data

On Wed, Mar 11, 2015 at 7:14 PM, c wrote:

> hi,
>
> i have some data:
>
> A) a 1d array (dimensions 1x50), made by summing the columns of a 2d array (dimensions ~20k x 50).
>
> B) a 1d array that is just a particular row of that 2d array
>
> i need to fit a sum of 2 negative binomial distributions to A), and to fit a single negative binomial distribution to B).
>
> i have spent a while now reading the documentation for scipy.stats and the statsmodels package and various stack overflow posts, etc., but i do not yet understand how to go about fitting a discrete probability distribution to a vector of data.

Do you have the data in the form of histograms (counts) or the original data?

statsmodels can only estimate based on the original data, which is assumed to consist of observations drawn from a Negative Binomial distribution. Fitting histograms and fitting mixtures of distributions are not supported "out of the box", and would require some custom models.

If you just want to fit a distribution to a histogram or discrete counts, then using curve_fit or leastsq is one possibility.

Josef
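The curve_fit suggestion above could look roughly like the sketch below. It is only an illustration under stated assumptions: the data are synthetic draws standing in for the real arrays, the helper name `nb_mixture_pmf`, the starting values `p0`, and the parameter bounds are all made up for the example, and the negative binomial is parameterized as in `scipy.stats.nbinom` (shape `n`, success probability `p`). The same pattern with a single `nbinom.pmf` term would cover the one-distribution case.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import nbinom


def nb_mixture_pmf(k, w, n1, p1, n2, p2):
    """Weighted sum of two negative binomial pmfs evaluated at counts k."""
    return w * nbinom.pmf(k, n1, p1) + (1.0 - w) * nbinom.pmf(k, n2, p2)


# Synthetic stand-in for the real data (assumption): draws from a known
# two-component negative binomial mixture.
rng = np.random.default_rng(0)
samples = np.concatenate([
    nbinom.rvs(5, 0.5, size=6000, random_state=rng),
    nbinom.rvs(20, 0.4, size=4000, random_state=rng),
])

# Normalized histogram: empirical probability of each integer count 0..max.
k = np.arange(samples.max() + 1)
freq = np.bincount(samples) / samples.size

# Least-squares fit of the mixture pmf to the normalized histogram.
# Starting values and bounds are illustrative guesses, not recommendations.
p0 = [0.5, 3.0, 0.5, 15.0, 0.5]
lower = [0.0, 1e-3, 1e-3, 1e-3, 1e-3]
upper = [1.0, 200.0, 1.0 - 1e-3, 200.0, 1.0 - 1e-3]
params, _ = curve_fit(nb_mixture_pmf, k, freq, p0=p0, bounds=(lower, upper))

fitted = nb_mixture_pmf(k, *params)
print("w, n1, p1, n2, p2 =", params)
```

Note that this is a least-squares fit to the histogram, not a maximum likelihood estimate, and the two components can come back in either order. A plain ndarray is enough as input here; nothing in this approach needs pandas or a patsy formula.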