Counterfactual: the Effect of Education and Experience on Salary

This notebook follows the example from “The Book Of Why” (Pearl, 2018), chapter 8, page 251.

Counterfactuals

In [1]:

from IPython.display import display, Math, Latex,HTML

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
import pyAgrum.causal as csl
import pyAgrum.causal.notebook as cslnb
import os
import math
import numpy as np
import scipy.stats

In this example we are interested in the effect of experience and education on the salary of an employee. We have the following data:

Employee	EX(u)	ED(u)	$S_{0}(u)$	$S_{1}(u)$	$S_{2}(u)$
Alice	8	0	86,000	?	?
Bert	9	1	?	92,500	?
Caroline	9	2	?	?	97,000
David	8	1	?	91,000	?
Ernest	12	1	?	100,000	?
Frances	13	0	97,000	?	?
etc

$EX(u)$: years of experience of employee $u$, in $[0, 20]$.
$ED(u)$: level of education of employee $u$ (0: high school degree (low), 1: college degree (medium), 2: graduate degree (high)), in $[0, 2]$.
$S_{i}(u)$, in $[65k, 150k]$:
- the salary (observable) of employee $u$ if $i = ED(u)$,
- the potential outcome (unobservable) if $i \neq ED(u)$: the salary employee $u$ would have had with education level $i$.

Given this data, we want to answer the counterfactual question: what would Alice’s salary be if she had attended college (i.e. what is $S_{1}(Alice)$)?
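One way to picture the table, as a minimal sketch: each employee has exactly one observed potential outcome, and the other two are missing. The `records` dictionary and the `potential_outcome` helper are hypothetical names used for illustration only; they are not part of pyAgrum.

```python
# Observed data from the table above (salaries in dollars).
# Each record stores experience (years), education level (0, 1 or 2)
# and the single salary that was actually observed.
records = {
    "Alice":    {"EX": 8,  "ED": 0, "salary": 86_000},
    "Bert":     {"EX": 9,  "ED": 1, "salary": 92_500},
    "Caroline": {"EX": 9,  "ED": 2, "salary": 97_000},
    "David":    {"EX": 8,  "ED": 1, "salary": 91_000},
    "Ernest":   {"EX": 12, "ED": 1, "salary": 100_000},
    "Frances":  {"EX": 13, "ED": 0, "salary": 97_000},
}

def potential_outcome(name, level):
    """Return S_level(name) if it was observed, else None (a counterfactual)."""
    rec = records[name]
    return rec["salary"] if rec["ED"] == level else None

print(potential_outcome("Alice", 0))  # the observed salary
print(potential_outcome("Alice", 1))  # None: the quantity we want to estimate
```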

We create the causal diagram

In this model, an employee’s salary is assumed to be determined by their level of education and their experience. Years of experience are also affected by the level of education: a higher level of education means more time spent studying, hence less experience.

In [2]:

edex = gum.fastBN("Ux[-2,10]->experience[0,20]<-education{low|medium|high}->salary[65,150];"
                  "experience->salary<-Us[0,25]")
edex

Out[2]:

Counterfactual queries are specific to one data point (in our case Alice), so the model includes two additional variables to allow for individual variation:

* Us: unobserved variables that affect salary, in [0, 25k].
* Ux: unobserved variables that affect experience, in [-2, 10].

In [3]:

# no prior information about the individual (datapoint)
edex.cpt("Us").fillWith(1).normalize()
edex.cpt("Ux").fillWith(1).normalize()
# prior over education levels (assumed)
edex.cpt("education")[:] = [0.4, 0.4, 0.2]

In [4]:

# To have probabilistic results, we add a perturbation. (Gaussian around the exact values)
# we calculate a gaussian distribution
x_min = 0.0
x_max = 4.0

mean = 2.0
std = 0.65

x = np.linspace(x_min, x_max, 5)

y = scipy.stats.norm.pdf(x,mean,std)
print("We'll use the following distribution \n",y)

We'll use the following distribution
 [0.00539715 0.18794845 0.61375735 0.18794845 0.00539715]

Experience listens to Education and Ux :

\[Ex = 10 -4 \times Ed + Ux\]

In [5]:

edex.cpt("experience").fillWithFunction("10-4*education+Ux",noise=list(y))
edex.cpt("experience")

Out[5]:

		experience
education	Ux	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20
low	-2	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	-1	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	0	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	1	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	2	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	3	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000
	4	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000
	5	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000
	6	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000
	7	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000
	8	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054
	9	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1889	0.6168	0.1889
	10	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0067	0.2329	0.7604
medium	-2	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	-1	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	0	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	1	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	2	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	3	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	4	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	5	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	6	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	7	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000
	8	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000
	9	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000
	10	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000
high	-2	0.7604	0.2329	0.0067	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	-1	0.1889	0.6168	0.1889	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	0	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	1	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	2	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	3	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	4	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	5	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	6	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	7	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	8	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	9	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
	10	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.1879	0.6135	0.1879	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
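The rows of this table are consistent with the following reading of the `noise` argument: the five weights are centred on the value of the formula, any weight falling outside the domain $[0, 20]$ is dropped, and the row is renormalised. This is an interpretation inferred from the printed numbers, not a description of pyAgrum's internals, and `noisy_row` is a hypothetical helper written with the standard library only.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a normal distribution, written out with the standard library."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Same five weights as in cell 4: pdf at 0, 1, 2, 3, 4 with mean 2 and std 0.65.
weights = [normal_pdf(x, 2.0, 0.65) for x in range(5)]

def noisy_row(center, lo=0, hi=20):
    """Spread the weights over center-2 .. center+2, clip to [lo, hi], renormalise."""
    row = [0.0] * (hi - lo + 1)
    for offset, w in zip(range(-2, 3), weights):
        value = center + offset
        if lo <= value <= hi:
            row[value - lo] += w
    total = sum(row)
    return [p / total for p in row]

# education = low (0), Ux = -2  ->  10 - 4*0 + (-2) = 8
row_low = noisy_row(10 - 4 * 0 - 2)
# education = low (0), Ux = 10  ->  20, so part of the noise is clipped
row_edge = noisy_row(10 - 4 * 0 + 10)

print([round(p, 4) for p in row_low[6:11]])    # experience 6 .. 10
print([round(p, 4) for p in row_edge[18:21]])  # experience 18 .. 20
```

Under this reading, the clipped rows at the edges of the domain (for example low with Ux = 10, or high with Ux = -2) put more mass on the boundary value than interior rows do, which is what the table shows.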

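Salary listens to Education, Experience and Us (note that the code cell uses a coefficient of 2.51 for experience rather than the 2.5 written here):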

\[S = 65 + 2.5 \times Ex + 5 \times Ed + Us\]

In [6]:

edex.cpt("salary").fillWithFunction("round(65+2.51*experience+5*education+Us)",noise=list(y))
gnb.showInference(edex)

../_images/notebooks_65-Causality_Counterfactual_17_0.svg

To answer this counterfactual question we will follow the three-step algorithm from “The Book Of Why” (Pearl 2018), chapter 8, page 253:

Step 1 : Abduction

Use the data to retrieve all the information that characterizes Alice

From the data we can retrieve Alice’s profile:

* $Ed(Alice)$: 0
* $Ex(Alice)$: 8
* $S_{0}(Alice)$: 86k

We will use Alice’s profile to get $U_s$ and $U_x$, which tell Alice apart from the rest of the data.
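Ignoring the Gaussian noise, abduction amounts to inverting the two structural equations for Alice. The sketch below does this with plain arithmetic, using the coefficients of the displayed equations (2.5 for experience). The function name `abduct` is hypothetical, and salaries are in thousands of dollars.

```python
def abduct(experience, education, salary):
    """Invert Ex = 10 - 4*Ed + Ux and S = 65 + 2.5*Ex + 5*Ed + Us."""
    ux = experience - (10 - 4 * education)
    us = salary - (65 + 2.5 * experience + 5 * education)
    return ux, us

# Alice: 8 years of experience, education level 0, salary 86k
ux_alice, us_alice = abduct(experience=8, education=0, salary=86)
print(ux_alice, us_alice)
```

With noise in the model, abduction yields distributions rather than single values, but one would expect them to concentrate around these point estimates.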

In [7]:

ie=gum.LazyPropagation(edex)
ie.setEvidence({'experience':8, 'education': 'low', 'salary' : "86"})
ie.makeInference()
newUs = ie.posterior("Us")
newUs

Out[7]:

Us
0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25
0.1889	0.6168	0.1889	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000

In [8]:

ie=gum.LazyPropagation(edex)
ie.setEvidence({'experience':8, 'education': 'low', 'salary' : "86"})
ie.makeInference()
newUx = ie.posterior("Ux")
newUx

Out[8]:

Ux
-2	-1	0	1	2	3	4	5	6	7	8	9	10
0.7604	0.2329	0.0067	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
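The posterior over $U_x$ can be checked by hand with Bayes' rule. With a uniform prior, and because salary carries no further information about $U_x$ once experience and education are observed in this graph, the posterior is proportional to the likelihood $P(experience = 8 \mid education = low, U_x)$. The sketch below assumes the noise model read off the experience table (five Gaussian weights centred on $10 - 4 \times Ed + U_x$, clipped to the domain and renormalised); it does not call pyAgrum, and the helper `likelihood` is hypothetical.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a normal distribution (standard library only)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

weights = [normal_pdf(x, 2.0, 0.65) for x in range(5)]  # noise from cell 4

def likelihood(observed, center, lo=0, hi=20):
    """P(experience = observed) when the noise is centred on `center`."""
    kept = {center + k - 2: w
            for k, w in enumerate(weights)
            if lo <= center + k - 2 <= hi}
    return kept.get(observed, 0.0) / sum(kept.values())

ux_domain = range(-2, 11)   # Ux in [-2, 10], uniform prior
education = 0               # Alice: low
unnormalised = [likelihood(8, 10 - 4 * education + ux) for ux in ux_domain]
total = sum(unnormalised)
posterior = [p / total for p in unnormalised]

print([round(p, 4) for p in posterior[:4]])  # Ux = -2, -1, 0, 1
```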

In [9]:

gnb.showInference(edex,evs={'experience':8, 'education': "low", 'salary' : "86"},targets={'Ux','Us'})

../_images/notebooks_65-Causality_Counterfactual_23_0.svg

Step 2 & 3 : Action And Prediction

Change the model to match the hypothesis implied by the query (if she had attended college), and then use the data that characterizes Alice to calculate her salary.
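On the noise-free equations, these two steps reduce to a few lines: keep Alice's $U_x$ and $U_s$, overwrite education, and recompute experience and salary. The values $U_x = -2$ and $U_s = 1$ used here are the point values obtained by inverting the displayed equations for Alice's profile (an assumption of this sketch; the notebook itself keeps full posterior distributions), and `predict_salary` is a hypothetical helper.

```python
def predict_salary(education, ux, us):
    """Recompute experience, then salary (in k$), for a chosen education level."""
    experience = 10 - 4 * education + ux
    salary = 65 + 2.5 * experience + 5 * education + us
    return experience, salary

# Alice's idiosyncratic factors from the noise-free inversion: Ux = -2, Us = 1
factual = predict_salary(education=0, ux=-2, us=1)
counterfactual = predict_salary(education=1, ux=-2, us=1)
print(factual)         # experience and salary in the actual world
print(counterfactual)  # experience and salary had she attended college
```

The factual call reproduces Alice's observed profile, which is a quick check that the chosen $U_x$ and $U_s$ are consistent with the data.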

We create a counterfactual world with Alice’s idiosyncratic factors, and we perform the intervention:

In [10]:

# the counterfactual world
edexCounterfactual = gum.BayesNet(edex)

In [11]:

# we replace the prior probabilities of the idiosyncratic factors with the posteriors calculated earlier
edexCounterfactual.cpt("Ux").fillWith(newUx)
edexCounterfactual.cpt("Us").fillWith(newUs)
gnb.showInference(edexCounterfactual,size="10")
print("counterfactual world created")

../_images/notebooks_65-Causality_Counterfactual_27_0.svg

counterfactual world created

In [12]:

# We perform the intervention
edexModele = csl.CausalModel(edexCounterfactual)
cslnb.showCausalImpact(edexModele,"salary",doing="education",values={"education":"medium"})

Causal Model

$$\begin{equation*}P( salary \mid \hookrightarrow\mkern-6.5mueducation) = \sum_{Us,Ux,experience}{P\left(Us\right) \cdot P\left(salary\mid Us,education,experience\right) \cdot P\left(experience\mid Ux,education\right) \cdot P\left(Ux\right)}\end{equation*}$$
Explanation : Do-calculus computations

salary
65	66	67	68	69	70	71	72	73	74	75	76	77	78	79	80	81	82	83	84	85	86	87	88	89	90	91	92	93	94	95	96	97	98	99	100	101	102	103	104	105	106	107	108	109	110	111	112	113	114	115	116	117	118	119	120	121	122	123	124	125	126	127	128	129	130	131	132	133	134	135	136	137	138	139	140	141	142	143	144	145	146	147	148	149	150
0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0002	0.0010	0.0020	0.0066	0.0342	0.0846	0.1525	0.2357	0.1307	0.0884	0.1320	0.0792	0.0354	0.0128	0.0028	0.0012	0.0006	0.0001	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000

Impact

Since education has no parents in our model (there is no graph surgery to perform and no causes to emancipate it from), an intervention is equivalent to an observation. The only thing we need to do is set the value of education:

In [13]:

gnb.showInference(edexCounterfactual,targets={"salary",'experience'},evs={'education':"medium"},size="10")

../_images/notebooks_65-Causality_Counterfactual_30_0.svg

The result (her salary if she had attended college) is given by the formula:

\[\sum_{salary^*} salary^* \times P(salary^* \mid RealSalary = 86k, education = 0, experience = 8, education^*=1)\]

where variables marked with an asterisk are unobservable.

In [14]:

formula, adj, exp = csl.causalImpact(edexModele,"salary",doing="education",values={"education":"medium"})
gnb.showProba(adj)

../_images/notebooks_65-Causality_Counterfactual_32_0.svg

In [15]:

i = gum.Instantiation(adj)
i.setFirst()
mean = 0
while (not i.end()):
    v = i.val(0)
    mean = mean + (v+65)*adj.get(i)
    i.inc()
print(mean)

81.84325639929719
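As a sanity check, the same expectation can be recomputed from the rounded probabilities printed in the causal-impact table above (only the non-zero entries, for salaries 74 to 91). Because the inputs are rounded to four decimals, the result can only be expected to agree with the exact value to about two decimals.

```python
# Non-zero entries of P(salary | do(education = medium)) as printed above,
# starting at salary = 74 (k$).
first_salary = 74
probs = [0.0002, 0.0010, 0.0020, 0.0066, 0.0342, 0.0846, 0.1525, 0.2357,
         0.1307, 0.0884, 0.1320, 0.0792, 0.0354, 0.0128, 0.0028, 0.0012,
         0.0006, 0.0001]

# Expectation of a discrete variable: sum of value * probability.
expected_salary = sum((first_salary + k) * p for k, p in enumerate(probs))
print(round(expected_salary, 2))
```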

\[S_1(Alice) \approx 81.8k\]

Alice’s salary would be about $\$81{,}843$ if she had attended college!

`pyAgrum.causal.counterfactual`

We can now use a function that answers counterfactual queries using the previous algorithm.

In [16]:

help(csl.counterfactual)

Help on function counterfactual in module pyAgrum.causal._causalImpact:

counterfactual(cm: pyAgrum.causal._CausalModel.CausalModel, profile: Optional[Dict[str, int]], on: Union[str, Set[str]], whatif: Union[str, Set[str]], values: Optional[Dict[str, int]] = None) -> 'pyAgrum.Potential'
    Determines the estimation of a counterfactual query following the the three steps algorithm from "The Book Of Why"
    (Pearl 2018) chapter 8 page 253.

    Determines the estimation of the counterfactual query: Given the "profile" (dictionary <variable name>:<value>),what
    would variables in "on" (single or list of variables) be if variables in "whatif" (single or list of variables) had
    been as specified in "values" (dictionary <variable name>:<value>)(optional).

    This is done according to the following algorithm:
        -Step 1-2: compute the twin causal model
        -Step 3 : determine the causal impact of the interventions specified in  "whatif" on the single or list of
        variables "on" in the causal model.

    This function returns the potential calculated in step 3, representing the probability distribution of  "on" given
    the interventions  "whatif", if it had been as specified in "values" (if "values" is omitted, every possible value of
    "whatif")

    Parameters
    ----------
    cm: CausalModel
    profile: Dict[str,int] default=None
      evidence
    on: variable name or variable names set
     the variable(s) of interest
    whatif: str|Set[str]
      idiosyncratic nodes
    values: Dict[str,int]
      values for certain variables in whatif.

    Returns
    -------
    pyAgrum.Potential
      the computed counterfactual impact

Let’s try with the previous query :

In [17]:

cm_edex= csl.CausalModel(edex)
pot=csl.counterfactual(cm =cm_edex,
                       profile = {'experience':8, 'education': "low", 'salary' : "86"},
                       whatif={"education"},
                       on={"salary"},
                       values = {"education" : "medium"})

In [18]:

gnb.showProba(pot)

../_images/notebooks_65-Causality_Counterfactual_39_0.svg

We get the same result !

If we omit values:

We get every potential outcome, one distribution of salary per education level:

In [19]:

pot=csl.counterfactual(cm =cm_edex,
                       profile = {'experience':8, 'education': 'low', 'salary' : '86'},
                       whatif={"education"},
                       on={"salary"})

In [20]:

gnb.showPotential(pot)

	salary
education	65	66	67	68	69	70	71	72	73	74	75	76	77	78	79	80	81	82	83	84	85	86	87	88	89	90	91	92	93	94	95	96	97	98	99	100	101	102	103	104	105	106	107	108	109	110	111	112	113	114	115	116	117	118	119	120	121	122	123	124	125	126	127	128	129	130	131	132	133	134	135	136	137	138	139	140	141	142	143	144	145	146	147	148	149	150
low	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0002	0.0010	0.0020	0.0066	0.0342	0.0846	0.1525	0.2357	0.1307	0.0884	0.1320	0.0792	0.0354	0.0128	0.0028	0.0012	0.0006	0.0001	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
medium	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0002	0.0010	0.0020	0.0066	0.0342	0.0846	0.1525	0.2357	0.1307	0.0884	0.1320	0.0792	0.0354	0.0128	0.0028	0.0012	0.0006	0.0001	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
high	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0006	0.0242	0.1449	0.2800	0.1580	0.1012	0.1480	0.0877	0.0375	0.0132	0.0028	0.0012	0.0006	0.0001	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000

What would Alice’s salary be if she had attended college and had 8 years of experience ?

In [21]:

pot=csl.counterfactual(cm =cm_edex,
                       profile = {'experience':8, 'education': 'low', 'salary' : '86'},
                       whatif={"education", "experience"},
                       on={"salary"},
                       values = {"education" : 'medium', "experience" : 8})

In [22]:

gnb.showProba(pot)

../_images/notebooks_65-Causality_Counterfactual_47_0.svg

If she had attended college and had 8 years of experience, Alice’s salary would be 91k!

In the previous query, Alice’s salary if she had attended college was lower than her actual salary. That is because, in the counterfactual world where she attended college, she had less time to work, hence the lower salary.

In this query, Alice’s counterfactual salary is higher than her actual salary (+5k, corresponding to one level of education). That is because, in this counterfactual world, Alice attended college and still had time to work for 8 years, so her salary went up.

If she had more experience:

Of course, her salary goes up.

In [23]:

pot=csl.counterfactual(cm =cm_edex,
                       profile = {'experience':8, 'education': 'low', 'salary' : '86'},
                       whatif={"education", "experience"},
                       on={"salary"},
                       values = {"education" : 'medium', "experience" : 10})
gnb.showProba(pot)

../_images/notebooks_65-Causality_Counterfactual_50_0.svg
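The three counterfactual queries so far can be lined up on the noise-free equations. In the sketch below, an intervention on experience simply overrides the experience equation; $U_x = -2$ and $U_s = 1$ are the point values implied by Alice's profile under the displayed equations. The helper `salary_under` is hypothetical and only mirrors the arithmetic, not pyAgrum's inference.

```python
def salary_under(education, ux=-2, us=1, experience=None):
    """Salary (k$) for Alice's Ux/Us; `experience` overrides its equation if given."""
    if experience is None:
        experience = 10 - 4 * education + ux  # experience responds to education
    return 65 + 2.5 * experience + 5 * education + us

only_education = salary_under(education=1)                  # do(education=medium)
experience_8 = salary_under(education=1, experience=8)      # experience held at 8
experience_10 = salary_under(education=1, experience=10)    # experience set to 10
print(only_education, experience_8, experience_10)
```

The first value falls below the actual 86k because experience drops to 4 years when only education is changed; holding experience at 8 adds exactly one education step (+5k), and each extra year of experience adds 2.5k.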

In [24]:

twin=csl.counterfactualModel(cm = csl.CausalModel(edex),
                       profile = {'experience':8, 'education': 'low', 'salary' : '86'},
                       whatif={"experience"})
gnb.showInference(twin.observationalBN(),size="10")

../_images/notebooks_65-Causality_Counterfactual_51_0.svg

In [25]:

edexModeleWithout = csl.CausalModel(edex)  # causal model built from edex with no latent variable declared
edexModeleWithout

Out[25]:

Let’s try with the previous queries :

In [26]:

pot = csl.counterfactual(cm = edexModeleWithout,
                         profile = {'experience':8, 'education': "low", 'salary' : "86"},
                         whatif={"education"},
                         on={"salary"},
                         values = {"education" : "medium"})
gnb.showProba(pot)

../_images/notebooks_65-Causality_Counterfactual_54_0.svg

In [27]:

pot=csl.counterfactual(cm = edexModeleWithout,
                       profile = {'experience':8, 'education': 'low', 'salary' : '86'},
                       whatif={"education", "experience"},
                       on={"salary"},
                       values = {"education" : 'medium', "experience" : 8})
gnb.showProba(pot)

../_images/notebooks_65-Causality_Counterfactual_55_0.svg

We get the same results.

Latent variable between $U_x$ and $experience$ :

In [28]:

edexModeleWithOne = csl.CausalModel(edex,[("u1", ["Ux","experience"])],False)  # latent variables are given as (<latent variable name>, <list of affected variables>)
edexModeleWithOne

Out[28]:

In [29]:

pot = csl.counterfactual(cm = edexModeleWithOne,
                         profile = {'experience':8, 'education': "low", 'salary' : "86"},
                         whatif={"education"},
                         on={"salary"},
                         values = {"education" : "medium"})
gnb.showProba(pot)

../_images/notebooks_65-Causality_Counterfactual_59_0.svg

With one latent variable between $U_x$ and $experience$, we get \$96k, corresponding to one education level (we no longer need to worry about experience).

In [30]:

pot = csl.counterfactual(cm = edexModeleWithOne,
                         profile = {'experience':8, 'education': "low", 'salary' : "86"},
                         whatif={"education"},
                         on={"salary"},
                         values = {"education" : "high"})
gnb.showProba(pot)

../_images/notebooks_65-Causality_Counterfactual_61_0.svg
