[SciPy-User] Detecting Causal Relation in a Scatterplot

[SciPy-User] Detecting Causal Relation in a Scatterplot

Lorenzo Isella
Dear All,
I hope this is not too off topic.
I am given a set of scatterplots (nothing too fancy; think of a
normal x-y 2D plot).
I am not dealing with two time series (indeed, I have no information
about time).
If I call A=(A1,A2,...) and B=(B1,B2,...) the two variables (vectors
of numbers in most cases, though sometimes they are categorical
variables), I can plot one against the other, and I essentially
need to determine whether

A=f(B, noise) or B=g(A, noise)

where the noise is the effect of other, possibly unknown, variables,
measurement errors, etc., and f and g are two functions.

Without the noise, if I want to test if A=f(B) [B causes A], then I
need at least to ensure that f(B1)!=f(B2) must imply B1!=B2 (different
effects must have a different cause), whereas it is not ruled out that
f(B1)=f(B2) for B1!=B2 (different causes may lead to the same effect).
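
For the noise-free case, the condition above can be checked directly.
The following is a minimal sketch, assuming that exact equality of
values is meaningful (for example, categorical or integer data); the
helper name `is_function_of` is hypothetical and introduced here only
for illustration:

```python
from collections import defaultdict


def is_function_of(effect, cause):
    """Return True if every distinct value of `cause` is paired with
    exactly one value of `effect`, i.e. effect = f(cause) is possible."""
    seen = defaultdict(set)
    for c, e in zip(cause, effect):
        seen[c].add(e)
    return all(len(values) == 1 for values in seen.values())


# Toy data (an assumption for illustration): A is the square of B.
B = [-2, -1, 0, 1, 2]
A = [b * b for b in B]

print(is_function_of(A, B))  # True: A = f(B) is consistent with the data
print(is_function_of(B, A))  # False: A = 1 comes from both B = -1 and B = 1
```

With noisy real-valued data this exact check is of no use, since
repeated values essentially never occur; it only illustrates the
asymmetry being described.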

However, in the presence of noise, these properties will hold only
approximately, so is there a statistical test, rather than
eyeballing, that can tell A=f(B, noise) apart from B=g(A, noise)?
Any suggestion is welcome.


Lorenzo
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Detecting Causal Relation in a Scatterplot

ms-22
I'm not sure I'm understanding what you're looking for. Are you looking for *correlation* between two variables? If so, there are several statistical tests you can use: linear correlation is the most obvious, but if your variables are not linearly related you can try rank correlation tests: http://en.wikipedia.org/wiki/Rank_correlation
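
As a rough sketch of the two kinds of test mentioned above, assuming
SciPy is available: `scipy.stats.pearsonr` gives the linear correlation
and `scipy.stats.spearmanr` a rank correlation. The data below are
synthetic and chosen only for illustration (a monotone but non-linear
relation plus noise).

```python
import numpy as np
from scipy import stats

# Synthetic data (an assumption for illustration): A depends on B
# through a monotone, non-linear function plus a little noise.
rng = np.random.default_rng(42)
B = rng.normal(size=200)
A = np.exp(B) + 0.1 * rng.normal(size=200)

r_ab, p_r = stats.pearsonr(A, B)        # linear correlation and p-value
rho_ab, p_rho = stats.spearmanr(A, B)   # rank correlation and p-value

# Swapping the arguments gives the same coefficients.
r_ba, _ = stats.pearsonr(B, A)
rho_ba, _ = stats.spearmanr(B, A)

print("Pearson  r   =", r_ab, "(swapped:", r_ba, ")")
print("Spearman rho =", rho_ab, "(swapped:", rho_ba, ")")
```

Both coefficients are symmetric in their arguments, so by themselves
they cannot distinguish A=f(B, noise) from B=g(A, noise).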

However, no statistical test will *ever* tell you if something causes something else. *Correlation does not mean causation* is a fundamental tenet of statistics, and of science in general. No matter how beautiful your plot is, it will never imply a causal relationship.



Re: Detecting Causal Relation in a Scatterplot

R. Michael Weylandt
In reply to this post by Lorenzo Isella
Cross-posted to R-help:
https://stat.ethz.ch/pipermail/r-help/2013-April/352081.html

Best,
Michael


Re: Detecting Causal Relation in a Scatterplot

josef.pktd
On Mon, Apr 22, 2013 at 11:03 AM, R. Michael Weylandt
<[hidden email]> wrote:
> Cross-posted to R-help:
> https://stat.ethz.ch/pipermail/r-help/2013-April/352081.html

I would guess that stats.stackexchange might be the best candidate


To me this sounds like a test for endogeneity, but you might need more
structure on the noise, such as additivity.

A quick Google search turned up
econ.msu.edu/faculty/wooldridge/docs/qmle_endog_r3.pdf, which
seems to apply to the non-linear case. (I haven't looked at it.)

I have never looked at this literature; maybe White's sanity check can be used.

Josef
(I used the word endogeneity.)



Re: Detecting Causal Relation in a Scatterplot

jkhilmer@chemistry.montana.edu

Lorenzo,

You definitely need time if possible.  Reference Sugihara and Munch in Science, vol 338, 2012: "Detecting Causality in Complex Ecosystems".

Jonathan


Re: Detecting Causal Relation in a Scatterplot

Sergio Rojas
In reply to this post by R. Michael Weylandt
 

> Lorenzo,
>
> You definitely need time if possible.  Reference Sugihara and Munch in
> Science, vol 338, 2012: "Detecting Causality in Complex Ecosystems".
>
> Jonathan


Ordinarily, causality needs time to enter somewhere. If removing the noise
from the data could help you determine what you want, it may be worth
exploring what FastICA can do:

 http://www.endolith.com/wordpress/2009/11/22/a-simple-fastica-example/

FastICA is available as a function in the MDP module:

  http://mdp-toolkit.sourceforge.net/

Sergio

P.S. I am not sure whether these tools already work on Python 3.

 

 



Re: Detecting Causal Relation in a Scatterplot

Gael Varoquaux
In reply to this post by ms-22
On Mon, Apr 22, 2013 at 04:55:53PM +0200, massimo sandal wrote:
> However no statistical test will *ever* tell you if something causes something
> else. *Correlation does not mean causation* is a fundamental tenet of
> statistics -and of science in general. No matter how beautiful your plot is, it
> will never imply a causal relationship.

No. Under certain models, one can test for causality. Some models do rely
on temporality (Granger causality), but others don't. For instance, there
is a recent article by Aapo Hyvärinen in JMLR using the fact that, with
high probability, high-entropy signals cause low-entropy signals. There
is related work by Bernhard Schölkopf looking at non-Gaussianities.

Anyhow, this is very much a difficult research question, and the original
poster (Lorenzo) should approach it with care and do a fair amount of
reading. All approaches come with their caveats and have their failure
modes.
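
To give a flavour of how a model-based test without time information
can look, here is a minimal sketch under strong assumptions: the noise
is additive and independent of the cause, and the relation is smooth
enough for a polynomial fit. It regresses each variable on the other
and measures how strongly the residuals depend on the regressor, using
a simple HSIC (kernel independence) estimate. It is not the method of
either article mentioned above; the function names and the synthetic
data are hypothetical and chosen only for illustration.

```python
import numpy as np


def standardize(z):
    """Rescale to zero mean and unit variance."""
    z = np.asarray(z, dtype=float)
    return (z - z.mean()) / z.std()


def gaussian_gram(z):
    """Gaussian-kernel Gram matrix with a median-heuristic bandwidth."""
    sq_dists = (z[:, None] - z[None, :]) ** 2
    bandwidth = np.median(sq_dists[sq_dists > 0])
    return np.exp(-sq_dists / bandwidth)


def hsic(u, v):
    """Biased HSIC estimate; close to zero when u and v are independent."""
    n = len(u)
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    K = centering @ gaussian_gram(u) @ centering
    L = gaussian_gram(v)
    return np.sum(K * L) / n**2


def residual_dependence(cause, effect, degree=5):
    """Fit effect = poly(cause) + residual and return HSIC(cause, residual)."""
    cause, effect = standardize(cause), standardize(effect)
    coeffs = np.polyfit(cause, effect, degree)
    residual = effect - np.polyval(coeffs, cause)
    return hsic(cause, standardize(residual))


# Synthetic data (an assumption): B causes A through a cubic plus noise.
rng = np.random.default_rng(0)
B = rng.uniform(-2.0, 2.0, size=400)
A = B**3 + rng.uniform(-1.0, 1.0, size=400)

score_B_causes_A = residual_dependence(B, A)  # residuals of A given B
score_A_causes_B = residual_dependence(A, B)  # residuals of B given A

# Prefer the direction whose residuals look more independent of the input.
guess = "B -> A" if score_B_causes_A < score_A_causes_B else "A -> B"
print(score_B_causes_A, score_A_causes_B, guess)
```

A lower score only says that an additive-noise model fits better in
that direction. With linear relations and Gaussian noise both
directions fit equally well, and hidden common causes are not handled
at all, so the caveats above apply in full.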

Gaël