This notebook shows a model of a multinomial Simpson's paradox.

In [1]:

import matplotlib.pyplot as plt
import random

import pandas as pd
import numpy as np

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb

import pyAgrum.causal as csl
import pyAgrum.causal.notebook as cslnb


## Building the models

In [2]:

# building a model including a Simpson's paradox
import scipy.stats as stats

bn=gum.fastBN(f"A[0,99]->B[0:40:200]<-C[0,5]->A")

bn.cpt("C").fillFromDistribution(stats.uniform,loc=0,scale=5)
bn.cpt("A").fillFromDistribution(stats.uniform,loc="C*12",scale=30)
bn.cpt("B").fillFromDistribution(stats.norm,loc="5+C*4-int(A/8)",scale=2);

In [3]:

# generating a CSV, taking this model as the causal one,
# then loading it back so that it can be plotted.
gum.generateSample(bn,400,"out/sample.csv",with_labels=False)
df=pd.read_csv("out/sample.csv")
df.plot.scatter(x='A', y='B', c='C',colormap="tab20");

In [4]:

cm=csl.CausalModel(bn)
_,p,_=csl.causalImpact(cm,on="B",doing="A")

In [5]:

# building a Markov-equivalent model, generating a CSV, taking this model as the causal one.
bn2=gum.BayesNet(bn)
bn2.reverseArc("C","A")

gum.generateSample(bn2,400,"out/sample2.csv",with_labels=False)
df2=pd.read_csv("out/sample2.csv")

cm2=csl.CausalModel(bn2)
_,p2,_=csl.causalImpact(cm2,on="B",doing="A")
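The second model only differs from the first by the direction of the arc between $A$ and $C$. A minimal sketch of why reversing that arc leaves the joint distribution unchanged, using Bayes' rule on a toy pair of binary variables. The probability tables below are hypothetical illustration values, not the ones of the notebook's model, and the sketch does not use pyAgrum.

```python
from itertools import product

# Hypothetical toy tables (NOT the notebook's CPTs): binary C and A, with C -> A.
p_c = {0: 0.5, 1: 0.5}
p_a_given_c = {0: {0: 0.8, 1: 0.2},   # p_a_given_c[c][a]
               1: {0: 0.3, 1: 0.7}}

values = (0, 1)

# Joint distribution for the orientation C -> A: P(A, C) = P(C) * P(A | C).
joint = {(a, c): p_c[c] * p_a_given_c[c][a] for a, c in product(values, values)}

# Reverse the arc with Bayes' rule: P(A) by marginalisation, then P(C | A).
p_a = {a: sum(joint[(a, c)] for c in values) for a in values}
p_c_given_a = {a: {c: joint[(a, c)] / p_a[a] for c in values} for a in values}

# Joint distribution for the orientation A -> C: P(A, C) = P(A) * P(C | A).
joint_reversed = {(a, c): p_a[a] * p_c_given_a[a][c]
                  for a, c in product(values, values)}

max_gap = max(abs(joint[k] - joint_reversed[k]) for k in joint)
print(f"largest difference between the two joints: {max_gap:.2e}")
```

Both orientations encode the same observational distribution, so no amount of observational data can tell them apart; only the causal reading differs.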


## The observational model and its paradoxical structure (exactly the same with the second, Markov-equivalent model)

In [6]:

gnb.flow.row(gnb.getBN(bn),
             df.plot.scatter(x='A', y='B'),
             df.plot.scatter(x='A', y='B', c='C',colormap="tab20"),
             captions=["the observational model","the trend is increasing","the trend is decreasing for every value of C!"])
gnb.flow.row(gnb.getBN(bn2),
             df2.plot.scatter(x='A', y='B'),
             df2.plot.scatter(x='A', y='B', c='C',colormap="tab20"),
             captions=["the Markov-equivalent model","the trend is increasing","the trend is decreasing for every value of C!"])


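[Output, row 1: the observational model | scatter of B against A (the trend is increasing) | scatter of B against A coloured by C (the trend is decreasing for every value of C)]

[Output, row 2: the Markov-equivalent model | scatter of B against A (the trend is increasing) | scatter of B against A coloured by C (the trend is decreasing for every value of C)]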
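The reversal of the trend can be checked numerically without plotting. The following is a rough sketch that redraws samples from a simplified version of the generative process (standard library only, no pyAgrum) and compares the least-squares slope of $B$ on $A$ over the pooled data with the slope inside each group of $C$. It assumes that $C$ takes the integer values 0 to 5 with equal frequency and it ignores the discretization and the bounds of $A$ and $B$; the helper names `ols_slope` and `sample_points` are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def sample_points(n_per_group, seed=0):
    """Draw (a, b, c) triples from a simplified version of the model."""
    rng = random.Random(seed)
    points = []
    for c in range(6):  # assumption: C takes the integer values 0..5
        for _ in range(n_per_group):
            a = rng.uniform(12 * c, 12 * c + 30)       # A ~ U(12C, 12C + 30)
            b = rng.gauss(5 + 4 * c - int(a / 8), 2)   # B ~ N(5 + 4C - int(A/8), 2)
            points.append((a, b, c))
    return points


points = sample_points(n_per_group=1000)

# Slope over all the points, ignoring C.
pooled_slope = ols_slope([a for a, _, _ in points], [b for _, b, _ in points])

# Slope inside each group sharing the same value of C.
group_slopes = {
    c: ols_slope([a for a, _, k in points if k == c],
                 [b for _, b, k in points if k == c])
    for c in range(6)
}

print(f"pooled slope: {pooled_slope:+.3f}")
for c, slope in group_slopes.items():
    print(f"slope within C={c}: {slope:+.3f}")
```

The pooled slope comes out positive while every within-group slope is negative, which is the numerical counterpart of the two scatter plots.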

## The paradox is revealed in the trend of the inferred means: the means increase with the value of $$A$$, except once any value of $$C$$ is fixed …

In [7]:

gum.config["notebook","histogram_epsilon"]=0.001
gum.config['notebook', 'histogram_discretized_scale']=0.4

In [8]:

# first row: posterior of B given A only
for a in [10,20,30]:
    gnb.flow.add_html(gnb.getPosterior(bn,target="B",evs={"A":a}),f"$P(B|A={a})$")
gnb.flow.new_line()
# following rows: posterior of B given A and a fixed value of C
for a in [10,20,30]:
    gnb.flow.add_html(gnb.getPosterior(bn,target="B",evs={"A":a,"C":0}),f"$P(B|A={a},C=0)$")
gnb.flow.new_line()
for a in [10,20,30]:
    gnb.flow.add_html(gnb.getPosterior(bn,target="B",evs={"A":a,"C":2}),f"$P(B|A={a},C=2)$")
gnb.flow.new_line()
for a in [10,20,30]:
    gnb.flow.add_html(gnb.getPosterior(bn,target="B",evs={"A":a,"C":4}),f"$P(B|A={a},C=4)$")
gnb.flow.display()


[Output: a 4×3 grid of posterior histograms of B. Rows: $P(B|A=a)$, $P(B|A=a,C=0)$, $P(B|A=a,C=2)$, $P(B|A=a,C=4)$; columns: $a=10$, $a=20$, $a=30$.]
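The opposite trends of the means can also be worked out by hand on an idealized version of the model. The sketch below assumes a continuous $A$, uniform on $[12C, 12C+30)$, a $C$ that is uniform on the integers 0 to 5, and no truncation of $B$, so its numbers are only indicative of what the discretized network infers. Under these assumptions every value of $C$ compatible with an observed $A$ is equally likely a posteriori. The function names are hypothetical.

```python
C_VALUES = range(6)  # assumption: C is uniform on the integers 0..5


def compatible_c(a):
    """Values of C whose interval [12C, 12C + 30) contains a."""
    return [c for c in C_VALUES if 12 * c <= a < 12 * c + 30]


def mean_b_given_a_c(a, c):
    """Location of B's conditional distribution: 5 + 4C - int(A/8)."""
    return 5 + 4 * c - int(a / 8)


def mean_b_given_a(a):
    """E[B | A=a], averaging over the (uniform) posterior of C given A=a.

    The prior on C is uniform and all the uniform densities of A have the
    same width, so the compatible values of C are equally likely.
    """
    cs = compatible_c(a)
    return sum(mean_b_given_a_c(a, c) for c in cs) / len(cs)


observing_a = [mean_b_given_a(a) for a in (10, 20, 30)]
observing_a_and_c0 = [mean_b_given_a_c(a, 0) for a in (10, 20, 30)]

print(observing_a)         # [4.0, 5.0, 8.0]: increasing with A
print(observing_a_and_c0)  # [4, 3, 2]: decreasing with A once C=0 is fixed
```

A larger observed $A$ makes larger values of $C$ more plausible, and $C$ pushes $B$ up faster than $A$ pulls it down; fixing $C$ removes that indirect path and leaves only the decreasing term.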

## Now that the paradoxical structure is understood and the paradox is revealed, should we observe $$C$$ (or not) before deciding to increase or decrease $$A$$ (with the goal of maximizing $$B$$)?

Of course, it depends on the causal structure of the problem!

In [9]:

gnb.flow.add_html(cslnb.getCausalModel(cm),"the first causal model")
# distribution of B under the intervention do(A=v) in the first causal model
for v in [10,20,30]:
    gnb.flow.add_html(gnb.getProba(p.extract({'A':v})),f"Doing $A={v}$")
gnb.flow.display()


[Output: the first causal model, followed by the distribution of B for Doing $A=10$, Doing $A=20$ and Doing $A=30$.]

## If $$C$$ is a cause of $$A$$, observing $$C$$ really gives new information about $$B$$.

In [10]:

gnb.flow.add_html(cslnb.getCausalModel(cm2),"the second causal model")
# distribution of B under the intervention do(A=v) in the second causal model
for v in [10,20,30]:
    gnb.flow.add_html(gnb.getProba(p2.extract({'A':v})),f"Doing $A={v}$")
gnb.flow.display()


[Output: the second causal model, followed by the distribution of B for Doing $A=10$, Doing $A=20$ and Doing $A=30$.]

## If $$A$$ is a cause of $$C$$, observing $$C$$ may lead to misinterpretations about the causal role of $$A$$.
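The difference between the two causal readings can be sketched on the same idealized model (continuous $A$ on $[12C, 12C+30)$, $C$ uniform on the integers 0 to 5, no truncation of $B$; all of these are simplifying assumptions, and the function names are hypothetical). When $C$ causes $A$, $C$ is a confounder, and one standard way to obtain the effect of an intervention is the back-door adjustment, which averages $E[B \mid A=a, C=c]$ over the prior of $C$. When $A$ causes $C$, $A$ has no parent, so intervening on $A$ coincides with observing it. This is an illustration of those two formulas, not a description of what `csl.causalImpact` computes internally.

```python
C_VALUES = range(6)  # assumption: C is uniform on the integers 0..5


def mean_b(a, c):
    """Location of B's conditional distribution: 5 + 4C - int(A/8)."""
    return 5 + 4 * c - int(a / 8)


def mean_b_observing(a):
    """E[B | A=a]: average over the values of C compatible with A=a."""
    cs = [c for c in C_VALUES if 12 * c <= a < 12 * c + 30]
    return sum(mean_b(a, c) for c in cs) / len(cs)


def mean_b_do_when_c_causes_a(a):
    """E[B | do(A=a)] when C -> A: back-door adjustment over the prior of C."""
    return sum(mean_b(a, c) for c in C_VALUES) / len(C_VALUES)


def mean_b_do_when_a_causes_c(a):
    """E[B | do(A=a)] when A -> C: A is a root, so do(A=a) equals observing A=a."""
    return mean_b_observing(a)


first_model = [mean_b_do_when_c_causes_a(a) for a in (10, 20, 30)]
second_model = [mean_b_do_when_a_causes_c(a) for a in (10, 20, 30)]

print(first_model)   # [14.0, 13.0, 12.0]: increasing A lowers B
print(second_model)  # [4.0, 5.0, 8.0]: increasing A raises B
```

With the goal of maximizing $B$, the first reading recommends decreasing $A$ and the second recommends increasing it, even though both models fit the observations equally well.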
