Dear All,
I hope this is not too off topic.
I am given a set of scatterplots (nothing too fancy; think of a normal x-y 2D plot). I am not dealing with two time series (indeed, I have no information about time).
If I call A=(A1, A2, ...) and B=(B1, B2, ...) the two variables (vectors of numbers in most cases, but sometimes categorical variables), I can plot one against the other, and I essentially need to determine whether

A=f(B, noise) or B=g(A, noise)

where the noise is the effect of other, possibly unknown, variables, measurement errors, etc., and f and g are two functions.

Without the noise, if I want to test whether A=f(B) [B causes A], then I need at least to ensure that f(B1)!=f(B2) implies B1!=B2 (different effects must have different causes), whereas it is not ruled out that f(B1)=f(B2) for B1!=B2 (different causes may lead to the same effect).

However, in the presence of noise, these properties will hold only approximately, so: any idea of a statistical test, rather than eyeballing, to tell apart A=f(B, noise) vs B=g(A, noise)?
Any suggestion is welcome.

Lorenzo
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
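For the noise-free, categorical case, the condition stated in the question (each cause value maps to exactly one effect value, while several causes may share an effect) can be checked directly. The following is only a minimal sketch: the helper name `is_function_of` and the toy data are made up for illustration, and passing the check shows that a functional relationship is possible, not that it is causal.

```python
from collections import defaultdict

def is_function_of(effect, cause):
    """Return True if every distinct cause value maps to exactly one effect value."""
    seen = defaultdict(set)
    for c, e in zip(cause, effect):
        seen[c].add(e)
    return all(len(values) == 1 for values in seen.values())

# Hypothetical toy data: B is categorical, A is numeric.
B = ["x", "y", "z", "x", "z"]
A = [1, 1, 2, 1, 2]

a_from_b = is_function_of(A, B)  # each B value gives a single A value
b_from_a = is_function_of(B, A)  # A == 1 arises from both "x" and "y"
print(a_from_b, b_from_a)
```

With noise, an exact check like this fails in both directions, which is why the question asks for a statistical test instead.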
I'm not sure I understand what you're looking for. Are you looking for *correlation* between two variables? If so, there are several statistical tests you can use: linear correlation is the most obvious, but if your variables are not linearly related you can try rank correlation tests: http://en.wikipedia.org/wiki/Rank_correlation

However, no statistical test will *ever* tell you if something causes something else. *Correlation does not mean causation* is a fundamental tenet of statistics, and of science in general. No matter how beautiful your plot is, it will never imply a causal relationship.
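As a rough illustration of the linear and rank correlation tests mentioned above, here is a sketch using `scipy.stats`. The synthetic data (a monotone but non-linear relation with Gaussian noise) and the variable names are assumptions made up for the example.

```python
import numpy as np
from scipy import stats

# Synthetic example data (an assumption): A depends monotonically,
# but not linearly, on B, plus some Gaussian noise.
rng = np.random.default_rng(0)
b = rng.uniform(0, 3, 200)
a = np.exp(b) + rng.normal(0, 0.5, 200)

r, p_r = stats.pearsonr(a, b)        # linear correlation
rho, p_rho = stats.spearmanr(a, b)   # rank correlation (Spearman)
tau, p_tau = stats.kendalltau(a, b)  # rank correlation (Kendall)

# Swapping the arguments gives the same coefficient.
rho_swapped, _ = stats.spearmanr(b, a)

print(f"Pearson r={r:.3f}, Spearman rho={rho:.3f}, Kendall tau={tau:.3f}")
```

All three coefficients are symmetric in their arguments, so on their own they cannot distinguish A=f(B, noise) from B=g(A, noise).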
In reply to this post by Lorenzo Isella
Cross-posted to R-help:
https://stat.ethz.ch/pipermail/r-help/2013-April/352081.html

Best,
Michael
On Mon, Apr 22, 2013 at 11:03 AM, R. Michael Weylandt <[hidden email]> wrote:
> Cross-posted to R-help:
> https://stat.ethz.ch/pipermail/r-help/2013-April/352081.html

I would guess that stats.stackexchange might be the best candidate.

> On Mon, Apr 22, 2013 at 3:49 PM, Lorenzo Isella <[hidden email]> wrote:
>> However, in presence of the noise, these properties will hold only
>> approximately so....any idea about how a statistical test, rather than
>> eyeballing, to tell apart A=f(B, noise) vs B=g(A, noise)?
>> Any suggestion is welcome.

To me this sounds like a test for endogeneity, but you might need more structure on the noise, like additivity.

A quick Google search turned up econ.msu.edu/faculty/wooldridge/docs/qmle_endog_r3.pdf, which seems to apply to the non-linear case. (I haven't looked at it.)

I never looked at this literature; maybe White's sanity check can be used.

Josef
(I used the word endogeneity in the search.)
> On Mon, Apr 22, 2013 at 3:49 PM, Lorenzo Isella <[hidden email]> wrote:
>> I do not deal with two time series (indeed I have no info about time).

Lorenzo,

You definitely need time if possible. Reference Sugihara and Munch in Science, vol 338, 2012: "Detecting Causality in Complex Ecosystems".

Jonathan
In reply to this post by R. Michael Weylandt
>Lorenzo,
>
>You definitely need time if possible. Reference Sugihara and Munch in
>Science, vol 338, 2012: "Detecting Causality in Complex Ecosystems".
>
>Jonathan

In normal terms, causality needs to have time somewhere. If removing the noise from the data could help you determine what you want, it may be worth exploring what FastICA can do:
http://www.endolith.com/wordpress/2009/11/22/a-simple-fastica-example/

FastICA comes as a function in the MDP module:
http://mdp-toolkit.sourceforge.net/

Sergio

P.S. Not sure whether this stuff already works on Python 3.
In reply to this post by ms-22
On Mon, Apr 22, 2013 at 04:55:53PM +0200, massimo sandal wrote:
> However, no statistical test will *ever* tell you if something causes something
> else. *Correlation does not mean causation* is a fundamental tenet of
> statistics, and of science in general. No matter how beautiful your plot is, it
> will never imply a causal relationship.

No. Under certain models, one can test for causality. Some models do rely on temporality (Granger causality), but others don't. For instance, there is a recent article by Aapo Hyvärinen in JMLR using the fact that, with high probability, high-entropy signals cause low-entropy signals. There is related work by Bernhard Schölkopf looking at non-Gaussianities.

Anyhow, this is very much a difficult research question, and the original poster (Lorenzo) should approach it with care and do a fair amount of reading. All approaches come with their caveats and have their failure modes.

Gaël
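To make the idea of a model-based, non-temporal test a little more concrete, here is a rough sketch of one such family, the additive noise model: regress each variable on the other and ask in which direction the residuals look independent of the regressor. This is not the method of either article mentioned above. The polynomial regression, the HSIC dependence score, the function names and the synthetic data are all assumptions chosen for illustration, and the approach fails when the relation is linear with Gaussian noise or when the noise is not additive.

```python
import numpy as np

def _gram(v):
    """RBF Gram matrix with a median-heuristic bandwidth."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / np.median(d2[d2 > 0]))

def hsic(u, v):
    """Biased HSIC estimate; close to 0 when u and v are independent."""
    u = (u - u.mean()) / u.std()
    v = (v - v.mean()) / v.std()
    K, L = _gram(u), _gram(v)
    # Double-centre K, then average its elementwise product with L.
    Kc = K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()
    return float((Kc * L).mean())

def anm_score(cause, effect, degree=3):
    """Dependence between the candidate cause and the regression residuals."""
    coeffs = np.polyfit(cause, effect, degree)  # stand-in for a flexible regressor
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)

# Synthetic data (an assumption): B causes A through a cubic, with additive noise.
rng = np.random.default_rng(0)
b = rng.uniform(-2, 2, 400)
a = b ** 3 + rng.normal(0, 1, 400)

score_b_to_a = anm_score(b, a)  # hypothesis: A = f(B) + noise
score_a_to_b = anm_score(a, b)  # hypothesis: B = g(A) + noise
print(score_b_to_a, score_a_to_b)
```

The direction with the lower score is the one in which the residuals look more independent of the input. A proper test would attach a p-value to each score (for example by permutation) rather than just compare two numbers.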