Causal Effect Estimation

This notebook serves as a demonstration of the CausalEffectEstimation module for estimating causal effects on both generated and real datasets.


In [1]:
import pyAgrum as gum
import pyAgrum.lib.discretizer as disc
import pyAgrum.lib.notebook as gnb
import pyAgrum.lib.explain as gexpl

import pyAgrum.causal as csl
import pyAgrum.causal.notebook as cslnb

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

1. ACE estimations from generated RCT data

To estimate the Average Causal Effects (ACE) from data generated under conditions resembling a Randomized Controlled Trial (RCT), we define two generative models.

1.1 Dataset

Consider two generative models:

  • A linear generative model described by the equation:

    \[Y = 3X_1 + 2X_2 -2X_3 -0.8X_4 + T(2X_1 + 5X_3 +3X_4)\]
  • And a non-linear generative model described by the equation:

    \[Y = 3X_1 + 2X_2^2 -2X_3 -0.8X_4 +10T\]

Where \((X_1,X_2,X_3,X_4) \sim \mathcal{N}_4((1,1,1,1), I_4)\), \(T \sim \mathcal{Ber}(1/2)\) and \((X_1,X_2,X_3,X_4,T)\) are jointly independent in both models.

Data from the models can be obtained using the functions given below.

In [2]:
def linear_simulation(n : int = 100000, sigma : float = 1) -> pd.DataFrame:
  """
  Returns n observations from the linear model with normally distributed
  noise with expected value 0 and standard deviation sigma.
  """

  X1 = np.random.normal(1, 1, n)
  X2 = np.random.normal(1, 1, n)
  X3 = np.random.normal(1, 1, n)
  X4 = np.random.normal(1, 1, n)
  epsilon = np.random.normal(0, sigma, n)
  T = np.random.binomial(1, 0.5, n)
  Y = 3*X1 + 2*X2 - 2*X3 - 0.8*X4 + T*(2*X1 + 5*X3 + 3*X4) + epsilon
  d = np.array([T, X1, X2, X3, X4, Y])
  df_data = pd.DataFrame(data=d.T, columns=['T', 'X1', 'X2', 'X3', 'X4', 'Y'])
  df_data["T"] = df_data["T"].astype(int)

  return df_data

def non_linear_simulation(n : int = 100000, sigma : float = 1) -> pd.DataFrame:
  """
  Returns n observations from the non-linear model with normally distributed
  noise with expected value 0 and standard deviation sigma.
  """

  X1 = np.random.normal(1, 1, n)
  X2 = np.random.normal(1, 1, n)
  X3 = np.random.normal(1, 1, n)
  X4 = np.random.normal(1, 1, n)
  epsilon = np.random.normal(0, sigma, n)
  T = np.random.binomial(1, 0.5, n)
  Y = 3*X1 + 2*X2**2 - 2*X3 - 0.8*X4 + 10*T + epsilon
  d = np.array([T, X1, X2, X3, X4, Y])
  df_data = pd.DataFrame(data=d.T, columns=['T', 'X1', 'X2', 'X3', 'X4', 'Y'])
  df_data["T"] = df_data["T"].astype(int)

  return df_data

Furthermore, the expected values of \(Y(0)\) and \(Y(1)\) can be calculated explicitly, giving a theoretical ACE of \(\tau = 10\) which will serve as a point of reference for the estimations.
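For instance, since the treatment enters the linear model only through the term \(T(2X_1 + 5X_3 + 3X_4)\), the non-linear model only through \(10T\), and each \(X_i\) has mean 1:

\[\begin{split} \begin{align*} \mathbb{E}[Y(1) - Y(0)] &= \mathbb{E}[2X_1 + 5X_3 + 3X_4] = 2 + 5 + 3 = 10 && \text{(linear model)} \\ \mathbb{E}[Y(1) - Y(0)] &= \mathbb{E}[10] = 10 && \text{(non-linear model)} \end{align*}\end{split}\]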

We will explore how the CausalEffectEstimation module can estimate the causal effect of \(T\) on \(Y\) in both of the generated datasets.

1.2 Setup

First, generate the data using the previously defined functions and specify the causal graph of the variables. A single graph applies to both datasets, as they share the same variables and causal structure.

In [3]:
linear_data = linear_simulation()
non_linear_data = non_linear_simulation()

bn = gum.fastBN("Y<-T; Y<-X1; Y<-X2; Y<-X3; Y<-X4")
causal_model = csl.CausalModel(bn)

cslnb.showCausalModel(causal_model, size="10")
[Causal graph: arcs T→Y, X1→Y, X2→Y, X3→Y, X4→Y]

If the CausalBNEstimator estimation method is not employed, the model generated using gum.fastBN will suffice, as only the graph structure of the Causal Bayesian Network is used for causal identification.

However, if the Causal Bayesian Network estimator is used, a csl.CausalModel object with an appropriate discretization must be provided, since the Conditional Probability Tables of the model are used for estimation. Here we use the discretizer module to perform this task; the arcs are added manually.

Selecting an optimal discretization is crucial: a coarse discretization may lead to poor estimation due to its inability to capture fine variations in the distribution, while an overly fine discretization may result in too many parameters, making it difficult for the parameter learning algorithm to accurately estimate the distribution. Therefore, the discretization should strike a balance, being neither too coarse nor too fine.
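For instance, the granularity can be tuned per variable with setDiscretizationParameters (an illustrative sketch; the bin counts here are examples, not the values used in the cell below):

# Illustrative only: override the default binning for a single variable
# (the same call is used on the STAR data in section 2.2).
tuned = disc.Discretizer(defaultDiscretizationMethod="uniform", defaultNumberOfBins=20)
tuned.setDiscretizationParameters("Y", "uniform", 40)  # finer grid on the outcome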

In [4]:
discretizer = disc.Discretizer(
    defaultDiscretizationMethod="uniform",
    defaultNumberOfBins=20
)
disc_bn = discretizer.discretizedTemplate(linear_data)

disc_bn.beginTopologyTransformation()
for node in {"T", "X1", "X2", "X3", "X4"}:
    disc_bn.addArc(node, "Y")
disc_bn.endTopologyTransformation()

disc_causal_model = csl.CausalModel(disc_bn)

for id, _ in disc_bn:
    print(disc_bn.variable(id))
T:Range([0,1])
X1:Discretized(<(-3.14718;-2.72853[,[-2.72853;-2.30989[,[-2.30989;-1.89124[,[-1.89124;-1.4726[,[-1.4726;-1.05395[,[-1.05395;-0.635307[,[-0.635307;-0.216662[,[-0.216662;0.201983[,[0.201983;0.620628[,[0.620628;1.03927[,[1.03927;1.45792[,[1.45792;1.87656[,[1.87656;2.29521[,[2.29521;2.71385[,[2.71385;3.1325[,[3.1325;3.55114[,[3.55114;3.96979[,[3.96979;4.38843[,[4.38843;4.80708[,[4.80708;5.22572)>)
X2:Discretized(<(-3.81577;-3.37564[,[-3.37564;-2.9355[,[-2.9355;-2.49537[,[-2.49537;-2.05523[,[-2.05523;-1.6151[,[-1.6151;-1.17496[,[-1.17496;-0.734826[,[-0.734826;-0.294691[,[-0.294691;0.145445[,[0.145445;0.58558[,[0.58558;1.02572[,[1.02572;1.46585[,[1.46585;1.90599[,[1.90599;2.34612[,[2.34612;2.78626[,[2.78626;3.22639[,[3.22639;3.66653[,[3.66653;4.10666[,[4.10666;4.5468[,[4.5468;4.98693)>)
X3:Discretized(<(-3.60739;-3.14829[,[-3.14829;-2.68918[,[-2.68918;-2.23008[,[-2.23008;-1.77098[,[-1.77098;-1.31187[,[-1.31187;-0.852771[,[-0.852771;-0.393668[,[-0.393668;0.0654355[,[0.0654355;0.524539[,[0.524539;0.983642[,[0.983642;1.44275[,[1.44275;1.90185[,[1.90185;2.36095[,[2.36095;2.82006[,[2.82006;3.27916[,[3.27916;3.73826[,[3.73826;4.19737[,[4.19737;4.65647[,[4.65647;5.11557[,[5.11557;5.57468)>)
X4:Discretized(<(-3.05233;-2.62422[,[-2.62422;-2.1961[,[-2.1961;-1.76799[,[-1.76799;-1.33987[,[-1.33987;-0.91176[,[-0.91176;-0.483645[,[-0.483645;-0.0555308[,[-0.0555308;0.372584[,[0.372584;0.800698[,[0.800698;1.22881[,[1.22881;1.65693[,[1.65693;2.08504[,[2.08504;2.51316[,[2.51316;2.94127[,[2.94127;3.36939[,[3.36939;3.7975[,[3.7975;4.22562[,[4.22562;4.65373[,[4.65373;5.08184[,[5.08184;5.50996)>)
Y:Discretized(<(-15.324;-12.5206[,[-12.5206;-9.71726[,[-9.71726;-6.91389[,[-6.91389;-4.11051[,[-4.11051;-1.30714[,[-1.30714;1.49624[,[1.49624;4.29961[,[4.29961;7.10299[,[7.10299;9.90637[,[9.90637;12.7097[,[12.7097;15.5131[,[15.5131;18.3165[,[18.3165;21.1199[,[21.1199;23.9232[,[23.9232;26.7266[,[26.7266;29.53[,[29.53;32.3334[,[32.3334;35.1367[,[35.1367;37.9401[,[37.9401;40.7435)>)

We are now prepared to instantiate the CausalEffectEstimation object for both datasets.

In [5]:
linear_cee = csl.CausalEffectEstimation(linear_data, disc_causal_model)
print(linear_cee)
<pyAgrum.causal.causalEffectEstimation._CausalEffectEstimation.CausalEffectEstimation object at 0x1a91b6a8a50>

 Dataframe      : <pandas.core.frame.DataFrame object at 0x1a9191eab10>
        - shape         : (100000, 6)
        - columns       : Index(['T', 'X1', 'X2', 'X3', 'X4', 'Y'], dtype='object')
        - memory usage  : 4.800132 MB
 Causal Model   : <pyAgrum.causal._CausalModel.CausalModel object at 0x000001A91B694090>
        - names         : {0: 'T', 1: 'X1', 2: 'X2', 3: 'X3', 4: 'X4', 5: 'Y'}
        - causal BN     : BN{nodes: 6, arcs: 5, domainSize: 10^6.80618, dim: 6080077, mem: 48Mo 848Ko 656o}
        - observ. BN    : BN{nodes: 6, arcs: 5, domainSize: 10^6.80618, dim: 6080077, mem: 48Mo 848Ko 656o}
In [6]:
non_linear_cee = csl.CausalEffectEstimation(non_linear_data, disc_causal_model)
print(non_linear_cee)
<pyAgrum.causal.causalEffectEstimation._CausalEffectEstimation.CausalEffectEstimation object at 0x1a91b6cc250>

 Dataframe      : <pandas.core.frame.DataFrame object at 0x1a918810590>
        - shape         : (100000, 6)
        - columns       : Index(['T', 'X1', 'X2', 'X3', 'X4', 'Y'], dtype='object')
        - memory usage  : 4.800132 MB
 Causal Model   : <pyAgrum.causal._CausalModel.CausalModel object at 0x000001A91B694090>
        - names         : {0: 'T', 1: 'X1', 2: 'X2', 3: 'X3', 4: 'X4', 5: 'Y'}
        - causal BN     : BN{nodes: 6, arcs: 5, domainSize: 10^6.80618, dim: 6080077, mem: 48Mo 848Ko 656o}
        - observ. BN    : BN{nodes: 6, arcs: 5, domainSize: 10^6.80618, dim: 6080077, mem: 48Mo 848Ko 656o}

1.3 Causal Identification

The subsequent step involves identifying the causal criterion to be used for estimation. This is crucial, as most estimators rely on strong assumptions regarding the underlying causal structure of the data-generating process. Incorrect specification of the adjustment may compromise the guarantee of asymptotic normality.

In [7]:
linear_cee.identifyAdjustmentSet(intervention="T", outcome="Y")
non_linear_cee.identifyAdjustmentSet(intervention="T", outcome="Y", verbose=False)
Randomized Controlled Trial adjustment found.

Supported estimators include:
- CausalModelEstimator
- DM
If the outcome variable is a cause of other covariates in the causal graph,
Backdoor estimators may also be used.
Out[7]:
'Randomized Controlled Trial'

Consistent with the data-generating process, the adjustment identified is the Randomized Controlled Trial adjustment. This also yields the list of estimators supported by this adjustment.

1.4 Causal Effect Estimation

Once the adjustment is identified, we can proceed to estimation using the supported estimators. First, the estimator must be fitted to the data.

In [8]:
linear_cee.fitDM()
non_linear_cee.fitDM()

The estimation is obtained by calling the estimateCausalEffect method.

In [9]:
linear_tau_hat = linear_cee.estimateCausalEffect()
non_linear_tau_hat = non_linear_cee.estimateCausalEffect()

print(f"ACE linear = {linear_tau_hat}, MAPE = {abs((linear_tau_hat-10)*10)} %")
print(f"ACE non linear = {non_linear_tau_hat}, MAPE = {abs((non_linear_tau_hat-10)*10)} %")
ACE linear = 10.022196692897303, MAPE = 0.22196692897303194 %
ACE non linear = 10.047761608442126, MAPE = 0.477616084421264 %

The difference-in-means estimator, which is the simplest estimator for the ACE, yields mostly accurate results. This is expected in an RCT environment where the treatment is independent of confounders, making intervention equivalent to observation.
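As a sanity check, the same quantity can be computed directly with pandas. This is only a minimal sketch of what a difference-in-means estimate is, not the module's internal implementation:

# Difference in means computed by hand on the linear dataset.
dm_by_hand = (linear_data.loc[linear_data["T"] == 1, "Y"].mean()
              - linear_data.loc[linear_data["T"] == 0, "Y"].mean())
print(f"Difference in means by hand: {dm_by_hand}")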

1.5 User-specified adjustment

If the user wishes to apply an alternative adjustment, they may specify their own set of variables for each component of the adjustment. Note, however, that such custom adjustments guarantee neither unbiased estimation nor an error-free estimation process.

In [10]:
linear_cee.useBackdoorAdjustment(intervention="T", outcome="Y", confounders={"X1", "X2", "X3", "X4"})
non_linear_cee.useBackdoorAdjustment(intervention="T", outcome="Y", confounders={"X1", "X2", "X3", "X4"})
In [11]:
linear_cee.fitSLearner()
non_linear_cee.fitSLearner()
In [12]:
linear_tau_hat = linear_cee.estimateCausalEffect()
non_linear_tau_hat = non_linear_cee.estimateCausalEffect()

print(f"ACE linear = {linear_tau_hat}, MAPE = {abs((linear_tau_hat-10)*10)} %")
print(f"ACE non linear = {non_linear_tau_hat}, MAPE = {abs((non_linear_tau_hat-10)*10)} %")
ACE linear = 9.993396025658598, MAPE = 0.06603974341402363 %
ACE non linear = 10.010557707228799, MAPE = 0.10557707228798563 %

In this case, the RCT setting also supports backdoor adjustment, so we again obtain accurate estimates. Let us see how the estimation behaves if we specify an incompatible adjustment.

1.6 Incorrectly specified adjustment

We will use the frontdoor adjustment to illustrate the behaviour of an incorrect specification.

In [13]:
linear_cee.useFrontdoorAdjustment(intervention="T", outcome="Y", mediators={"X1", "X2", "X3", "X4"})
non_linear_cee.useFrontdoorAdjustment(intervention="T", outcome="Y", mediators={"X1", "X2", "X3", "X4"})
In [14]:
linear_cee.fitSimplePlugIn()
non_linear_cee.fitSimplePlugIn()
In [15]:
linear_tau_hat = linear_cee.estimateCausalEffect()
non_linear_tau_hat = non_linear_cee.estimateCausalEffect()

print(f"ACE linear = {linear_tau_hat}, MAPE = {abs((linear_tau_hat-10)*10)} %")
print(f"ACE non linear = {non_linear_tau_hat}, MAPE = {abs((non_linear_tau_hat-10)*10)} %")
ACE linear = 0.027356079062413947, MAPE = 99.72643920937585 %
ACE non linear = 0.03652571470181629, MAPE = 99.63474285298183 %

As anticipated, using an incorrect adjustment set results in a heavily biased estimation. In this case, the ACE is close to zero, indicating that the estimator incorrectly predicts no causal effect of \(T\) on \(Y\).
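The near-zero estimate is easy to understand here: the \(X_i\) are not mediators at all, since their distribution is identical in both treatment arms, so no effect can propagate through them in a frontdoor plug-in. A quick check (assuming linear_data is still in scope):

# The would-be "mediators" have the same distribution in both arms,
# which is why the frontdoor plug-in estimate collapses to roughly zero.
print(linear_data.groupby("T")[["X1", "X2", "X3", "X4"]].mean())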

1.7 Causal Bayesian Network Estimation

To fully utilize the causal model within the estimation process, we will now use the Conditional Probability Tables of the Causal Bayesian Network via the CausalBNEstimator, rather than relying solely on the underlying causal graph. The procedure will follow the same methodology as previously applied.
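The rough idea behind this estimator can be sketched as follows (an assumption about its internals for illustration, not the module's actual code): learn the CPTs of the discretized causal BN from the data, then compare the interventional distributions of \(Y\) under \(do(T=1)\) and \(do(T=0)\).

# Sketch only: parameter learning on the discretized structure, then do-queries.
learner = gum.BNLearner(linear_data, disc_bn)
learner.useSmoothingPrior()
fitted_bn = learner.learnParameters(disc_bn.dag())

fitted_cm = csl.CausalModel(fitted_bn)
_, p_do1, _ = csl.causalImpact(fitted_cm, on="Y", doing="T", values={"T": 1})
_, p_do0, _ = csl.causalImpact(fitted_cm, on="Y", doing="T", values={"T": 0})
# The ACE is then the difference of the expectations of Y under p_do1 and p_do0,
# taking the mid-point of each discretization bin as its numeric value.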

In [16]:
linear_cee.identifyAdjustmentSet(intervention="T", outcome="Y", verbose=False)
non_linear_cee.identifyAdjustmentSet(intervention="T", outcome="Y", verbose=False)
Out[16]:
'Randomized Controlled Trial'
In [17]:
linear_cee.fitCausalBNEstimator()
non_linear_cee.fitCausalBNEstimator()
In [18]:
linear_tau_hat = linear_cee.estimateCausalEffect()
non_linear_tau_hat = non_linear_cee.estimateCausalEffect()

print(f"ACE linear = {linear_tau_hat}, MAPE = {abs((linear_tau_hat-10)*10)} %")
print(f"ACE non linear = {non_linear_tau_hat}, MAPE = {abs((non_linear_tau_hat-10)*10)} %")
ACE linear = 9.04886401302389, MAPE = 9.511359869761105 %
ACE non linear = 9.023563116890791, MAPE = 9.764368831092085 %

2. ACE estimations from real RCT data

In this section, we will estimate the ACE under real-world RCT conditions.

2.1 Dataset

The data used in this notebook come from the Tennessee Student/Teacher Achievement Ratio (STAR) trial. This randomized controlled trial was designed to assess the effect of smaller class sizes in primary schools (T) on students’ academic performance (Y).

The covariates in this study include:

  • gender

  • age

  • g1freelunch, the number of lunches provided to the child per day

  • g1surban, the location of the school (inner city, suburban, rural, or urban)

  • ethnicity

In [19]:
# Preprocessing

# Load data - read everything as a string and then cast
star_df = pd.read_csv("./res/STAR_data.csv", sep=",", dtype=str)
star_df = star_df.rename(columns={"race": "ethnicity"})

# Fill na
star_df = star_df.fillna({"g1freelunch": 0, "g1surban": 0})
drop_star_l = ["g1tlistss", "g1treadss", "g1tmathss", "g1classtype",
"birthyear", "birthmonth", "birthday", "gender",
"ethnicity", "g1freelunch", "g1surban"]
star_df = star_df.dropna(subset=drop_star_l, how='any')

# Cast value types before processing
star_df["gender"] = star_df["gender"].astype(int)
star_df["ethnicity"] = star_df["ethnicity"].astype(int)

star_df["g1freelunch"] = star_df["g1freelunch"].astype(int)
star_df["g1surban"] = star_df["g1surban"].astype(int)
star_df["g1classtype"] = star_df["g1classtype"].astype(int)

# Keep only class types 1 and 2 (in the initial trial,
# 3 class types were assigned; the third one was a regular-size class
# with a teaching assistant)
star_df = star_df[~(star_df["g1classtype"] == 3)].reset_index(drop=True)

# Compute the outcome
star_df["Y"] = (star_df["g1tlistss"].astype(int) +
                star_df["g1treadss"].astype(int) +
                star_df["g1tmathss"].astype(int)) / 3

# Compute the treatment
star_df["T"] = star_df["g1classtype"].apply(lambda x: 0 if x == 2 \
                                                        else 1)

# Transform date to obtain age (Notice: if na --> date is NaT)
star_df["date"] = pd.to_datetime(star_df["birthyear"] + "/"
+ star_df["birthmonth"] + "/"
+ star_df["birthday"], yearfirst=True, errors="coerce")
star_df["age"] = (np.datetime64("1985-01-01") - star_df["date"])
star_df["age"] = star_df["age"].dt.days / 365.25

# Keep only covariates we consider predictive of the outcome
star_covariates_l = ["gender", "ethnicity", "age",
                     "g1freelunch", "g1surban"]
star_df = star_df[["Y", "T"] + star_covariates_l]

# Map numerical to categorical
star_df["gender"] = star_df["gender"].apply(lambda x: "Girl" if x == 2 \
                                            else "Boy").astype("category")
star_df["ethnicity"] = star_df["ethnicity"].map( \
    {1:"White", 2:"Black", 3:"Asian",
     4:"Hispanic",5:"Nat_American", 6:"Other"}).astype("category")
star_df["g1surban"] = star_df["g1surban"].map( \
    {1:"Inner_city", 2:"Suburban",
     3:"Rural", 4:"Urban"}).astype("category")

star_df.describe()
Out[19]:
Y T age g1freelunch
count 4215.000000 4215.000000 4215.000000 4215.000000
mean 540.095848 0.428233 4.879872 1.471886
std 39.267221 0.494881 0.465104 0.534171
min 439.333333 0.000000 3.129363 0.000000
25% 511.333333 0.000000 4.525667 1.000000
50% 537.333333 0.000000 4.818617 1.000000
75% 566.000000 1.000000 5.111567 2.000000
max 670.666667 1.000000 7.225188 2.000000

There are more units in the control group; however, the control and treatment groups appear similar in distribution, indicating that the ignorability assumption is likely satisfied.
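A quick balance check between the two arms can be done directly on the dataframe (illustrative only, not part of the module):

# Covariate balance between treatment arms.
print(star_df.groupby("T")["age"].describe())
print(pd.crosstab(star_df["T"], star_df["gender"], normalize="index"))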

We will explore how the CausalEffectEstimation module can estimate the causal effect of \(T\) on \(Y\) in this dataset.

2.2 Structure Learning and Setup

In the absence of a predefined causal structure, structure learning is utilized to uncover the underlying relationships between the variables in the dataset. To facilitate this process, a slice order will be imposed on the variables. This approach will serve as the foundation for deriving the necessary causal structure for subsequent analysis.

To enable the application of structure learning algorithms, the variables will first be discretized using the discretizer module. Following this, the causal structure will be derived using gum.BNLearner.

In [20]:
discretizer = disc.Discretizer(defaultDiscretizationMethod='uniform')
discretizer.setDiscretizationParameters("age", 'uniform', 24)
discretizer.setDiscretizationParameters("Y", 'uniform', 30)

template = discretizer.discretizedTemplate(star_df)

learner = gum.BNLearner(star_df, template)
learner.useNMLCorrection()
learner.useSmoothingPrior(1e-6)
learner.setSliceOrder([["T", "ethnicity", "gender", "age"],
                       ["g1surban", "g1freelunch", ], ["Y"]])
bn = learner.learnBN()

print(learner)

gnb.sideBySide(gexpl.getInformation(bn, size="50"),
               gnb.getInference(bn, size="50"))
Filename               : C:\Users\phw\AppData\Local\Temp\tmpu9hb6hu9.csv
Size                   : (4215,7)
Variables              : Y[30], T[2], gender[2], ethnicity[6], age[24], g1freelunch[3], g1surban[4]
Induced types          : False
Missing values         : False
Algorithm              : MIIC
Correction             : NML
Prior                  : Smoothing
Prior weight           : 0.000001
Constraint Slice Order : {ethnicity:0, T:0, g1surban:1, age:0, gender:0, g1freelunch:1, Y:2}

[Figure: learned BN with arcs ethnicity→g1surban, ethnicity→g1freelunch, g1surban→g1freelunch, g1freelunch→Y and T→Y (gender and age are isolated), shown side by side with the inferred marginals]

This initial approach appears promising, as the inferred causal relationships are somewhat consistent with what might be expected from a non-expert perspective.

Now, given the causal structure, we are set to instantiate the CausalEffectEstimation class to perform estimation.

In [21]:
causal_model = csl.CausalModel(bn)
cee = csl.CausalEffectEstimation(star_df, causal_model)

2.3 Causal Identification

The next step involves formal causal identification. As expected, we identify the RCT adjustment, consistent with the experimental design.

In [22]:
cee.identifyAdjustmentSet(intervention="T", outcome="Y")
Randomized Controlled Trial adjustment found.

Supported estimators include:
- CausalModelEstimator
- DM
If the outcome variable is a cause of other covariates in the causal graph,
Backdoor estimators may also be used.
Out[22]:
'Randomized Controlled Trial'

2.4 Causal Effect Estimation

Once the adjustment is identified, we can use the appropriate estimators for estimation.

In [23]:
cee.fitDM()
tau_hat = cee.estimateCausalEffect()

print(f"ACE = {tau_hat}")
ACE = 12.814738911047016
In [24]:
cee.fitCausalBNEstimator()
tau_hat = cee.estimateCausalEffect()

print(f"ACE = {tau_hat}")
ACE = 11.515235748777876

Let’s evaluate how the backdoor adjustment estimators compare to the previously obtained estimates. In this analysis, we control for the g1freelunch variable.

In [25]:
cee.useBackdoorAdjustment(intervention="T", outcome="Y", confounders={"g1freelunch"})
In [26]:
cee.fitSLearner()
tau_hat = cee.estimateCausalEffect()

print(f"ACE = {tau_hat}")
ACE = 11.616979201549725
In [27]:
cee.fitTLearner()
tau_hat = cee.estimateCausalEffect()

print(f"ACE = {tau_hat}")
ACE = 11.616705516924473
In [28]:
cee.fitIPW()
tau_hat = cee.estimateCausalEffect()

print(f"ACE = {tau_hat}")
ACE = 11.77382443916551
In [29]:
cee.fitPStratification()
tau_hat = cee.estimateCausalEffect()

print(f"ACE = {tau_hat}")
ACE = 10.920977442188093

3. ACE estimations from generated observational data

Next, we will estimate the ACE using generated data that do not adhere to RCT conditions. The generative model follows the specification of Lunceford & Davidian (2004).

3.1 Dataset

Consider the following generative model:

  • The outcome variable \(Y\) is generated according to the following equation:

    \[\begin{split} \begin{align*} Y & = - X_1 + X_2 - X_3 +2 T -V_1 + V_2 + V_3 \\ & = \langle \nu , \boldsymbol{Z} \rangle + \langle \xi, \boldsymbol{V} \rangle \end{align*}\end{split}\]

Where \(\nu = (0, -1, 1, -1, 2)^\intercal\), \(\boldsymbol{Z} = (1, X_1, X_2, X_3, T)^\intercal\), \(\xi = (-1,1,1)^\intercal\) and \(\boldsymbol{V} = (V_1, V_2, V_3)^\intercal\).

  • The covariate \(X_3\) is distributed as \(X_3 \sim \text{Bernoulli}\left(0.2\right)\). Conditionally on \(X_3\), the distribution of the other variables is defined as:

If \(X_{3} = 0\), \(V_3 \sim \text{Bernoulli}\left(0.25\right)\) and \({\left(X_{1}, V_{1}, X_{2}, V_{2}\right)}^{\intercal} \sim \mathcal{N}_4({\tau}_{0}, \Sigma)\)

If \(X_{3} = 1\), \(V_3 \sim \text{Bernoulli}\left(0.75\right)\) and \({\left(X_{1}, V_{1}, X_{2}, V_{2}\right)}^{\intercal} \sim \mathcal{N}_4({\tau}_{1}, \Sigma)\) with

\[\begin{split} {\tau}_{1} = \left(\begin{array}{c} 1 \\ 1 \\ -1\\ -1 \end{array}\right), {\tau}_{0} = \left(\begin{array}{c} -1 \\ -1 \\ 1\\ 1 \end{array}\right) \ \text{and} \ \Sigma = \left(\begin{array}{cccc} 1 & 0.5 & -0.5 & -0.5\\ 0.5 & 1 & -0.5 & -0.5 \\ -0.5 & -0.5 & 1 & 0.5 \\ -0.5 & -0.5 & 0.5 & 1 \end{array} \right)\end{split}\]
  • The treatment \(T\) is generated as a Bernoulli of the propensity score:

    \[\begin{split} \begin{align*} \mathbb{P}[T=1|X] &= e\left(X, \beta\right) \\ &= (1+\exp (-0.6 X_{1} +0.6 X_{2} - 0.6 X_{3}))^{-1} \\ &= \frac{1}{1+e^{-\langle \beta , \boldsymbol{X} \rangle}} \\ \mathbb{P}[T=0|X] &= 1-\mathbb{P}[T=1|X] \end{align*}\end{split}\]

    With \(\beta = {\left(0, 0.6, -0.6, 0.6\right)}^{\intercal}\) and \(\boldsymbol{X} = (1, X_1, X_2, X_3)^{\intercal}\).

In [30]:
# Model parameters
XI = np.array([-1, 1, 1])
NU = np.array([0, -1, 1, -1, 2])
BETA = np.array([0, 0.6, -0.6, 0.6])
TAU_0 = np.array([-1, -1, 1, 1])
TAU_1 = TAU_0 * -1
SIGMA = np.array([[1, 0.5, -0.5, -0.5],
                  [0.5, 1, -0.5, -0.5],
                  [-0.5, -0.5, 1, 0.5],
                  [-0.5, -0.5, 0.5, 1]], dtype=float)

def generate_lunceford(n=100000):
    # Generate data
    x3 = np.random.binomial(1, 0.2, n)
    v3 = np.random.binomial(1, (0.75 * x3 + (0.25 * (1 - x3))), n)

    # If x3=0 you have a model, if x3=1 you have another one
    x1v1x2v2_x3_0_matrix = np.random.multivariate_normal(TAU_0, SIGMA, size=n, check_valid='warn', tol=1e-8)
    x1v1x2v2_x3_1_matrix = np.random.multivariate_normal(TAU_1, SIGMA, size=n, check_valid='warn', tol=1e-8)
    x1v1x2v2_x3 = np.where(np.repeat(x3[:, np.newaxis], 4, axis=1) == 0, x1v1x2v2_x3_0_matrix, x1v1x2v2_x3_1_matrix)

    # Concatenate values
    xv = np.concatenate([x1v1x2v2_x3, np.expand_dims(x3, axis=1), np.expand_dims(v3, axis=1)], axis=1)

    # Compute e, a, and y
    x = xv[:, [0,2,4]]
    v = xv[:, [1,3,5]]
    e = np.power(1 + np.exp(- BETA[0] - x.dot(BETA[1:])), -1)
    a = np.random.binomial(1, e, n)
    # Note: an extra Bernoulli(e) term is added on top of the documented model;
    # it does not depend on T, so the theoretical ACE of 2 is unchanged.
    y = x.dot(NU[1:-1]) + v.dot(XI) + a*NU[-1] + np.random.binomial(1, e, n) + np.random.normal(0, 1, n)

    # Create the final df
    synthetic_data_df = pd.DataFrame(np.concatenate([x, np.expand_dims(a, axis=1), v, np.expand_dims(y, axis=1)], axis=1), columns=["X1", "X2", "X3", "T", "V1", "V2", "V3", "Y"])
    synthetic_data_df["X3"] = synthetic_data_df["X3"].astype(int)
    synthetic_data_df["V3"] = synthetic_data_df["V3"].astype(int)
    synthetic_data_df["T"] = synthetic_data_df["T"].astype(int)

    return synthetic_data_df

df = generate_lunceford()

df.head()
Out[30]:
X1 X2 X3 T V1 V2 V3 Y
0 -1.049773 1.647873 0 0 -1.848568 2.006223 0 6.131881
1 -1.651112 1.396177 0 0 -1.143154 0.800736 0 5.947488
2 -1.838529 1.080599 0 1 -0.937024 0.384680 1 7.247444
3 -1.092876 0.609056 0 0 -0.608222 -0.221501 0 3.418656
4 0.241816 -0.473207 0 0 0.990614 0.297986 0 -0.809088

Here, the exact ACE can be calculated explicitly from the previously defined assumptions.

\[\begin{split} \begin{align*} \mathbb{E}[Y(1) - Y(0)] &= \mathbb{E}_{X,V}[\mathbb{E}[Y \mid T=1, X, V] - \mathbb{E}[Y \mid T=0, X, V]]\\ &= \mathbb{E} [ (- X_1 + X_2 - X_3 + 2 \times 1 -V_1 + V_2 + V_3)] - \mathbb{E}[(- X_1 + X_2 - X_3 + 2 \times 0 -V_1 + V_2 + V_3 ) ] \\ &= 2 \end{align*}\end{split}\]

3.2 Setup

Given that we know the data-generating mechanism, we can directly specify the causal structure by constructing a Causal Bayesian Network. To utilize the CausalBNEstimator, we will first employ the discretizer module to determine the domains of the variables.

In [31]:
discretizer = disc.Discretizer(
    defaultDiscretizationMethod='uniform',
    defaultNumberOfBins=10
)
discretizer.setDiscretizationParameters("Y", 'uniform', 60)

bn = discretizer.discretizedTemplate(df)

bn.beginTopologyTransformation()
for _, name in bn:
    if name != "Y":
        bn.addArc(name, "Y")
for X in ["X1", "X2", "X3"]:
    bn.addArc(X, "T")
for XV in ["X1", "V1", "X2", "V2"]:
    bn.addArc("X3", XV)
bn.addArc("X3", "V3")
bn.endTopologyTransformation()

causal_model = csl.CausalModel(bn)

cslnb.showCausalModel(causal_model, size="10")
[Causal graph of the generative model: X1, X2, X3 → T; X3 → X1, V1, X2, V2, V3; and X1, X2, X3, V1, V2, V3, T → Y]
In [32]:
cee = csl.CausalEffectEstimation(df, causal_model)

3.3 Causal Identification

In [33]:
cee.identifyAdjustmentSet(intervention="T", outcome="Y")
Backdoor adjustment found.

Supported estimators include:
- CausalModelEstimator
- SLearner
- TLearner
- XLearner
- PStratification
- IPW
Out[33]:
'Backdoor'

3.4 Causal Effect Estimation

In [34]:
cee.fitCausalBNEstimator()
tau_hat = cee.estimateCausalEffect()

print(f"ACE linear = {tau_hat}, MAPE = {abs((tau_hat-2)/2*100)} %")
ACE linear = 1.4049944870061684, MAPE = 29.75027564969158 %
In [35]:
cee.fitSLearner()
tau_hat = cee.estimateCausalEffect()

print(f"ACE linear = {tau_hat}, MAPE = {abs((tau_hat-2)/2)*100} %")
ACE linear = 2.003169942031878, MAPE = 0.15849710159390185 %
In [36]:
cee.fitTLearner()
tau_hat = cee.estimateCausalEffect()

print(f"ACE linear = {tau_hat}, MAPE = {abs((tau_hat-2)/2)*100} %")
ACE linear = 1.9978358695427927, MAPE = 0.1082065228603657 %
In [37]:
cee.fitIPW()
tau_hat = cee.estimateCausalEffect()

print(f"ACE linear = {tau_hat}, MAPE = {abs((tau_hat-2)/2)*100} %")
ACE linear = 2.0917446591374964, MAPE = 4.5872329568748205 %

3.5 Structure Learning with imposed slice order

We will assess the performance of the BNLearner module to uncover the causal structure of the dataset. A slice order will be imposed on the variables to guide the learning process. Following the identification of the structure via the Structure Learning algorithm, we will proceed to estimate the causal effect based on the inferred structure.

In [38]:
template = discretizer.discretizedTemplate(df)

structure_learner = gum.BNLearner(df, template)
structure_learner.useNMLCorrection()
structure_learner.useSmoothingPrior(1e-9)
structure_learner.setSliceOrder([["X3","X1","X2","V1","V2","V3"], ["T"], ["Y"]])

learned_bn = structure_learner.learnBN()
learned_causal_model = csl.CausalModel(learned_bn)

print(structure_learner)
Filename               : C:\Users\phw\AppData\Local\Temp\tmpsjhxalx8.csv
Size                   : (100000,8)
Variables              : X1[10], X2[10], X3[2], T[2], V1[10], V2[10], V3[2], Y[60]
Induced types          : False
Missing values         : False
Algorithm              : MIIC
Correction             : NML
Prior                  : Smoothing
Prior weight           : 0.000000
Constraint Slice Order : {T:1, X2:0, V3:0, V1:0, Y:2, X3:0, X1:0, V2:0}

In [39]:
gnb.sideBySide(
    gexpl.getInformation(learned_bn, size="50"),
    gnb.getInference(learned_bn, size="50")
)
[Figure: BN learned with the slice order — arcs X3→X1, X3→V3, X3→T, X3→Y, X2→X3, X2→T, X2→Y, X1→T, X1→Y, V1→Y, V2→Y, T→Y — shown side by side with the inferred marginals]

Subsequently, we will reuse the previously established pipeline to estimate the causal effect based on the identified structure.

In [40]:
cee = csl.CausalEffectEstimation(df, learned_causal_model)
In [41]:
cee.identifyAdjustmentSet(intervention="T", outcome="Y")
Backdoor adjustment found.

Supported estimators include:
- CausalModelEstimator
- SLearner
- TLearner
- XLearner
- PStratification
- IPW
Out[41]:
'Backdoor'
In [42]:
cee.fitCausalBNEstimator()
tau_hat = cee.estimateCausalEffect()

print(f"ACE linear = {tau_hat}, MAPE = {abs((tau_hat-2)/2*100)} %")
ACE linear = 1.2682577474049224, MAPE = 36.58711262975388 %
In [43]:
cee.fitSLearner()
tau_hat = cee.estimateCausalEffect()

print(f"ACE linear = {tau_hat}, MAPE = {abs((tau_hat-2)/2)*100} %")
ACE linear = 2.003169942031878, MAPE = 0.15849710159390185 %
In [44]:
cee.fitTLearner()
tau_hat = cee.estimateCausalEffect()

print(f"ACE linear = {tau_hat}, MAPE = {abs((tau_hat-2)/2)*100} %")
ACE linear = 1.9978358695427927, MAPE = 0.1082065228603657 %
In [45]:
cee.fitIPW()
tau_hat = cee.estimateCausalEffect()

print(f"ACE linear = {tau_hat}, MAPE = {abs((tau_hat-2)/2)*100} %")
ACE linear = 2.0917446591374964, MAPE = 4.5872329568748205 %

We observe that the results remain largely consistent when imposing a slice order on the variables. However, the CausalBNEstimator appears to be the most sensitive to structural changes, as it relies on the entire graph structure for its estimations.

3.6 Structure Learning without slice order

Next, we will evaluate the estimations produced when allowing the algorithm to perform structure learning autonomously, without imposing any slice order.

In [46]:
template = discretizer.discretizedTemplate(df)

structure_learner = gum.BNLearner(df, template)
structure_learner.useNMLCorrection()
structure_learner.useSmoothingPrior(1e-9)

learned_bn = structure_learner.learnBN()
learned_causal_model = csl.CausalModel(learned_bn)

print(structure_learner)
Filename       : C:\Users\phw\AppData\Local\Temp\tmp87r_07ne.csv
Size           : (100000,8)
Variables      : X1[10], X2[10], X3[2], T[2], V1[10], V2[10], V3[2], Y[60]
Induced types  : False
Missing values : False
Algorithm      : MIIC
Correction     : NML
Prior          : Smoothing
Prior weight   : 0.000000

In [47]:
gnb.sideBySide(
    gexpl.getInformation(learned_bn, size="50"),
    gnb.getInference(learned_bn, size="50")
)
[Figure: BN learned without slice order — arcs T→X1, T→X2, T→X3, T→Y, Y→X1, Y→X2, Y→X3, X3→X1, X3→X2, V3→X3, V1→Y, V2→Y — shown side by side with the inferred marginals]

We encounter challenges at the outset: numerous causal relationships are misspecified, with the outcome incorrectly identified as a cause of several covariates. The intervention variable is nonetheless correctly identified as a cause of the outcome. With limited expectations, let us now examine the estimation results.
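One way to see the misspecification concretely is to compare the learned arcs with the hand-built structure from section 3.2 (a small sketch; it assumes the variable bn from that section is still in scope, and the helper named_arcs is defined here only for illustration):

# Compare MIIC's arcs (no slice order) with the hand-specified causal BN of 3.2.
def named_arcs(b):
    return {(b.variable(t).name(), b.variable(h).name()) for t, h in b.arcs()}

true_arcs = named_arcs(bn)          # hand-built structure from section 3.2
miic_arcs = named_arcs(learned_bn)  # structure learned without slice order
print("missing arcs :", sorted(true_arcs - miic_arcs))
print("spurious arcs:", sorted(miic_arcs - true_arcs))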

In [48]:
cee = csl.CausalEffectEstimation(df, learned_causal_model)
In [49]:
cee.identifyAdjustmentSet(intervention="T", outcome="Y")
Randomized Controlled Trial adjustment found.

Supported estimators include:
- CausalModelEstimator
- DM
If the outcome variable is a cause of other covariates in the causal graph,
Backdoor estimators may also be used.
Out[49]:
'Randomized Controlled Trial'
In [50]:
cee.fitCausalBNEstimator()
tau_hat = cee.estimateCausalEffect()

print(f"ACE linear = {tau_hat}, MAPE = {abs((tau_hat-2)/2*100)} %")
ACE linear = 0.8778137094311345, MAPE = 56.10931452844328 %
In [51]:
cee.fitDM()
tau_hat = cee.estimateCausalEffect()

print(f"ACE linear = {tau_hat}, MAPE = {abs((tau_hat-2)/2)*100} %")
ACE linear = -2.8266585405983644, MAPE = 241.3329270299182 %

The results are exceptionally poor. In an attempt to address this issue, we will investigate whether controlling for covariates that are identified as causes of the outcome in the causal graph improves the estimation.

In [52]:
cee.useBackdoorAdjustment(intervention="T", outcome="Y", confounders={"V1", "V2"})
In [53]:
cee.fitSLearner()
tau_hat = cee.estimateCausalEffect()

print(f"ACE linear = {tau_hat}, MAPE = {abs((tau_hat-2)/2)*100} %")
ACE linear = 1.0175205331546058, MAPE = 49.123973342269714 %
In [54]:
cee.fitTLearner()
tau_hat = cee.estimateCausalEffect()

print(f"ACE linear = {tau_hat}, MAPE = {abs((tau_hat-2)/2)*100} %")
ACE linear = 1.0490841845358503, MAPE = 47.545790773207486 %

One might assume that the estimation methods are not sufficiently sophisticated, given that the default meta-learners utilize Linear Regression as the base model. To address this, we will assess whether employing XGBRegressor as the base learner within the XLearner framework improves the estimation results.

In [55]:
cee.fitXLearner(learner="XGBRegressor")
tau_hat = cee.estimateCausalEffect()

print(f"ACE linear = {tau_hat}, MAPE = {abs((tau_hat-2)/2)*100} %")
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
File ~\scoop\apps\python\current\Lib\site-packages\pyAgrum\causal\causalEffectEstimation\_learners.py:85, in learnerFromString(learner_string)
     84 case "XGBRegressor":
---> 85     return xgboost.XGBRegressor()
     86 case "XGBClassifier":

NameError: name 'xgboost' is not defined

During handling of the above exception, another exception occurred:

NameError                                 Traceback (most recent call last)
Cell In[55], line 1
----> 1 cee.fitXLearner(learner="XGBRegressor")
      2 tau_hat = cee.estimateCausalEffect()
      4 print(f"ACE linear = {tau_hat}, MAPE = {abs((tau_hat-2)/2)*100} %")

File ~\scoop\apps\python\current\Lib\site-packages\pyAgrum\causal\causalEffectEstimation\_CausalEffectEstimation.py:683, in CausalEffectEstimation.fitXLearner(self, **estimator_params)
    680 if self._X is None or len(self._X) == 0:
    681     raise RCTError("XLearner")
--> 683 self._estimator = XLearner(**estimator_params)
    684 self._fitEstimator()

File ~\scoop\apps\python\current\Lib\site-packages\pyAgrum\causal\causalEffectEstimation\_backdoorEstimators.py:344, in XLearner.__init__(self, learner, control_outcome_learner, treatment_outcome_learner, control_effect_learner, treatment_effect_learner, propensity_score_learner)
    341     self.treatment_effect_learner = learnerFromString(
    342         "LinearRegression")
    343 elif isinstance(learner, str):
--> 344     self.control_outcome_learner = learnerFromString(learner)
    345     self.treatment_outcome_learner = learnerFromString(learner)
    346     self.control_effect_learner = learnerFromString(learner)

File ~\scoop\apps\python\current\Lib\site-packages\pyAgrum\causal\causalEffectEstimation\_learners.py:95, in learnerFromString(learner_string)
     93     raise sklearn_import_error
     94 if error.name == "xgboost" and not found_xgboost:
---> 95     raise xgboost_import_error

NameError: name 'xgboost_import_error' is not defined

In this run, the XLearner could not be fitted because the optional xgboost dependency is not installed. When it is available, the results show only marginal improvement, insufficient to draw reliable conclusions about the true causal effect.

This underscores the critical role of domain knowledge in accurately specifying the underlying causal structure. Even the most sophisticated estimation methods can falter when the causal structure is incorrectly specified.

3.7 Remarks

Overall, we observe that Bayesian Network-based estimators consistently underestimate the ACE. This underestimation arises due to the discretization and learning processes, which tend to homogenize the data by averaging it out. Since the ACE is computed as the difference between two expectations, this induced homogeneity reduces the variance between the groups, leading to a smaller difference and consequently an underestimation of the ACE.
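A natural mitigation to experiment with (results not shown here) is a finer discretization of the outcome before re-fitting the CausalBNEstimator, keeping in mind the parameter-count trade-off discussed in section 1.2. The bin counts below are illustrative assumptions, not recommended values:

# Illustrative only: refine the outcome's discretization and rebuild the template.
finer = disc.Discretizer(defaultDiscretizationMethod="uniform", defaultNumberOfBins=10)
finer.setDiscretizationParameters("Y", "uniform", 120)  # finer grid on Y
finer_template = finer.discretizedTemplate(df)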