# [SciPy-User] Detecting Causal Relation in a Scatterplot

## [SciPy-User] Detecting Causal Relation in a Scatterplot

Dear All,

I hope this is not too off topic. I am given a set of scatterplots (nothing too fancy; think of a normal x-y 2D plot). I am not dealing with two time series (indeed, I have no information about time).

If I call A=(A1, A2, ...) and B=(B1, B2, ...) the two variables (vectors of numbers in most cases, but sometimes categorical variables), I can plot one against the other, and I essentially need to determine whether

A=f(B, noise) or B=g(A, noise)

where the noise is the effect of other, possibly unknown, variables, measurement errors, etc., and f and g are two functions.

Without the noise, if I want to test whether A=f(B) [B causes A], then I need at least to ensure that f(B1)!=f(B2) implies B1!=B2 (different effects must have different causes), whereas it is not ruled out that f(B1)=f(B2) for B1!=B2 (different causes may lead to the same effect).

However, in the presence of noise, these properties will hold only approximately, so: any idea for a statistical test, rather than eyeballing, to tell apart A=f(B, noise) vs B=g(A, noise)? Any suggestion is welcome.

Lorenzo

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user

## Re: Detecting Causal Relation in a Scatterplot

I'm not sure I understand what you're looking for. Are you looking for *correlation* between two variables? If so, there are several statistical tests you can use: linear correlation is the most obvious, but if your variables are not linearly related you can try rank correlation tests: http://en.wikipedia.org/wiki/Rank_correlation

However, no statistical test will *ever* tell you whether something causes something else. *Correlation does not mean causation* is a fundamental tenet of statistics, and of science in general. No matter how beautiful your plot is, it will never imply a causal relationship.

2013/4/22 Lorenzo Isella:
> [...] any idea about how a statistical test, rather than eyeballing, to tell apart A=f(B, noise) vs B=g(A, noise)? Any suggestion is welcome.
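To make the rank-correlation suggestion concrete, here is a minimal sketch using `scipy.stats`. The synthetic data (`x`, `y`) and the exponential relationship are assumptions chosen only for illustration. Each coefficient is symmetric in its arguments, which is one way to see why correlation alone cannot separate A=f(B, noise) from B=g(A, noise).

```python
import numpy as np
from scipy import stats

# Synthetic, illustrative data (an assumption, not from the thread):
# y is a monotonic but strongly non-linear function of x, plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)
y = np.exp(x) + rng.normal(0, 0.5, 200)

r_pearson, p_pearson = stats.pearsonr(x, y)    # linear correlation
rho_xy, p_spearman = stats.spearmanr(x, y)     # rank correlation
tau_xy, p_kendall = stats.kendalltau(x, y)     # another rank correlation

# Swapping the arguments gives the same coefficient: no direction information.
rho_yx, _ = stats.spearmanr(y, x)

print(f"Pearson r          = {r_pearson:.3f}")
print(f"Spearman rho (x,y) = {rho_xy:.3f}")
print(f"Spearman rho (y,x) = {rho_yx:.3f}")
print(f"Kendall tau        = {tau_xy:.3f}")
```

On data like this, the rank-based coefficients should pick up the monotonic relationship more strongly than the linear one, but none of them says which variable drives the other.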

## Re: Detecting Causal Relation in a Scatterplot

In reply to this post by Lorenzo Isella

Cross-posted to R-help: https://stat.ethz.ch/pipermail/r-help/2013-April/352081.html

Best,
Michael

## Re: Detecting Causal Relation in a Scatterplot

On Mon, Apr 22, 2013 at 11:03 AM, R. Michael Weylandt <[hidden email]> wrote:
> Cross-posted to R-help:
> https://stat.ethz.ch/pipermail/r-help/2013-April/352081.html

I would guess that stats.stackexchange might be the best candidate.

> On Mon, Apr 22, 2013 at 3:49 PM, Lorenzo Isella <[hidden email]> wrote:
>> [...] any idea about how a statistical test, rather than
>> eyeballing, to tell apart A=f(B, noise) vs B=g(A, noise)?
>> Any suggestion is welcome.

To me this sounds like a test for endogeneity, but you might need more structure on the noise, like additivity.

A quick Google search turns up econ.msu.edu/faculty/wooldridge/docs/qmle_endog_r3.pdf, which seems to apply to the non-linear case. (I haven't looked at it.)

I never looked at this literature; maybe White's sanity check can be used.

Josef
(I used the word endogeneity.)
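Josef's remark about additivity can be illustrated with a small sketch. Assuming the noise is additive and independent of the cause (an assumption, not something established in the thread), one can regress each variable on the other and check in which direction the residuals look more independent of the regressor. The polynomial fit, the HSIC-style dependence score, the helper names (`hsic`, `residual_dependence`), and the synthetic data below are all illustrative choices rather than a method proposed by anyone in the thread, and a lower score is only a heuristic hint, not proof of causation.

```python
import numpy as np

def _gram(v):
    """Gaussian kernel matrix using the median-distance bandwidth heuristic."""
    d2 = (v[:, None] - v[None, :]) ** 2
    sigma2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / (2.0 * sigma2))

def hsic(u, v):
    """Biased HSIC-style estimate; larger values suggest stronger dependence."""
    n = len(u)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K, L = _gram(u), _gram(v)
    return float(np.trace(K @ H @ L @ H)) / n**2

def residual_dependence(cause, effect, degree=3):
    """Fit effect = poly(cause) + residual, then score how strongly
    the residual still depends on the candidate cause."""
    coeffs = np.polyfit(cause, effect, degree)
    resid = effect - np.polyval(coeffs, cause)
    return hsic(cause, resid)

# Synthetic data where, by construction, B causes A with additive noise.
rng = np.random.default_rng(0)
b = rng.uniform(-2, 2, 400)
a = b + b**3 + rng.normal(0, 1, 400)

score_b_to_a = residual_dependence(b, a)   # model A = f(B) + noise
score_a_to_b = residual_dependence(a, b)   # model B = g(A) + noise

print(f"dependence score, B -> A: {score_b_to_a:.5f}")
print(f"dependence score, A -> B: {score_a_to_b:.5f}")
```

Under these assumptions, the direction with the smaller score is the one in which an additive-noise model fits more plausibly. With linear f and Gaussian noise the two directions become indistinguishable, so the comparison only makes sense when the relationship is non-linear or the noise is non-Gaussian.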

## Re: Detecting Causal Relation in a Scatterplot

> On Mon, Apr 22, 2013 at 3:49 PM, Lorenzo Isella <[hidden email]> wrote:
>> [...]
>> A=f(B, noise) or B=g(A, noise)
>>
>> where the noise is the effect of other possibly unknown variables,
>> measurement errors etc.... and f and g are two functions.

Lorenzo,

You definitely need time if possible. Reference: Sugihara and Munch in Science, vol 338, 2012: "Detecting Causality in Complex Ecosystems".

Jonathan
## Re: Detecting Causal Relation in a Scatterplot

In reply to this post by R. Michael Weylandt

> Lorenzo,
>
> You definitely need time if possible. Reference Sugihara and Munch in
> Science, vol 338, 2012: "Detecting Causality in Complex Ecosystems".
>
> Jonathan

In normal terms, causality needs to have time somewhere. If removing the noise from the data could help you determine what you want, it may be worth exploring what FastICA can do:
http://www.endolith.com/wordpress/2009/11/22/a-simple-fastica-example/

FastICA comes as a function in the MDP module: http://mdp-toolkit.sourceforge.net/

Sergio

P.S. Not sure whether this stuff already works on Python 3.