Smoking (chapter 5)

Creative Commons License

Authors: Aymen Merrouche and Pierre-Henri Wuillemin.

This notebook follows the example from “The Book of Why” (Pearl, 2018), chapter 5.

In [1]:
from IPython.display import display, Math, Latex, HTML

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
import pyAgrum.causal as csl
import pyAgrum.causal.notebook as cslnb
import os

In the 1950s, the strong association between smoking and lung cancer sparked a debate: does smoking cause lung cancer?

Corresponding causal diagram:

The corresponding causal diagram is the following:

In [2]:
sc = gum.fastBN("Smoking->Lung Cancer")
sc
Out[2]:
[Causal diagram: Smoking → Lung Cancer]

Constitutional Hypothesis:

The tobacco industry and some skeptical statisticians advanced the theory that smokers are genetically different from nonsmokers: a “smoking gene” could be a confounder that would explain the observed association.

In [3]:
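# Latent "Smoking Gene" confounds Smoking and Lung Cancer; with the default
# keepArcs=False, the direct arc Smoking->Lung Cancer is dropped.
msc = csl.CausalModel(sc, [("Smoking Gene", ["Smoking", "Lung Cancer"])])
cslnb.showCausalImpact(msc, "Lung Cancer", doing="Smoking", values={})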
Causal Model: [diagram: latent Smoking Gene → Smoking, Smoking Gene → Lung Cancer; no arc Smoking → Lung Cancer]
$$\begin{equation*}P( Lung Cancer \mid \text{do}(Smoking)) = P\left(Lung Cancer\right)\end{equation*}$$
Explanation : No causal effect of X on Y, because they are d-separated (conditioning on the observed variables if any).
Impact:
  Lung Cancer = 0 | Lung Cancer = 1
  0.2802          | 0.7198

This constitutional hypothesis was untestable at the time, since the human genome could not yet be sequenced. It was nevertheless implausible, because the observed association was far too strong to be explained by a confounder alone.
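To make the constitutional hypothesis concrete, here is a minimal sketch with made-up probabilities (not data from the book) of a pure-confounding model: a hypothetical gene `G` drives both smoking `S` and lung cancer `L`, and `S` has no effect on `L`. Exact enumeration shows a strong observed association between `S` and `L`, yet intervening on smoking changes nothing, \(P(L=1 \mid do(S)) = P(L=1)\), which is the formula pyAgrum returns above.

```python
# Hypothetical numbers for a pure-confounding model G -> S, G -> L (no S -> L arc).
P_G = {0: 0.7, 1: 0.3}             # P(G=g): prevalence of the hypothetical gene
P_S1_given_G = {0: 0.2, 1: 0.8}    # P(S=1 | G=g)
P_L1_given_G = {0: 0.05, 1: 0.40}  # P(L=1 | G=g); S is not a parent of L


def p_s(s, g):
    """P(S=s | G=g)."""
    return P_S1_given_G[g] if s == 1 else 1 - P_S1_given_G[g]


def observational(s):
    """P(L=1 | S=s): condition the joint P(G, S, L) on S=s."""
    num = sum(P_G[g] * p_s(s, g) * P_L1_given_G[g] for g in (0, 1))
    den = sum(P_G[g] * p_s(s, g) for g in (0, 1))
    return num / den


def interventional(s):
    """P(L=1 | do(S=s)): G keeps its prior and L ignores S, so this is P(L=1) for any s."""
    return sum(P_G[g] * P_L1_given_G[g] for g in (0, 1))


for s in (0, 1):
    print(f"S={s}: P(L=1|S)={observational(s):.3f}  P(L=1|do(S))={interventional(s):.3f}")
```

With these numbers smokers appear about three times as likely to develop cancer, although forcing anyone to smoke or not leaves the risk unchanged.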

Another explanation

Another hypothesis was that a smoking gene could be a confounder while smoking still had a direct causal effect on lung cancer:

In [4]:
# keepArcs=True keeps the direct arc Smoking->Lung Cancer next to the latent confounder
msc = csl.CausalModel(sc, [("Smoking Gene", ["Smoking", "Lung Cancer"])], True)
cslnb.showCausalImpact(msc, "Lung Cancer", doing="Smoking", values={})
Causal Model: [diagram: Smoking → Lung Cancer, plus latent Smoking Gene → Smoking and Smoking Gene → Lung Cancer]
Hedge Error: G={'Smoking', 'Lung Cancer'}, G[S]={'Lung Cancer'}
Impossible
No result
Impact
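The hedge error means that \(P(LungCancer \mid do(Smoking))\) is not identifiable in this diagram: no formula over the observed variables can recover it. A minimal way to see why, using two hypothetical deterministic models that are not from the book: in model A the latent gene `u` alone sets both smoking and cancer, and smoking has no effect; in model B the gene is irrelevant and smoking alone causes cancer (`e` is smoking's own exogenous noise). Both fit the diagram above, produce exactly the same observational distribution over (Smoking, Lung Cancer), and still disagree on the effect of intervening.

```python
from itertools import product


# Hypothetical structural models: u is the latent gene, e an exogenous coin for S.
# Each returns the observed pair (s, l); do_s, when given, forces S to that value.
def model_a(u, e, do_s=None):
    s = u if do_s is None else do_s  # gene -> smoking
    l = u                            # gene -> cancer; smoking has no effect
    return s, l


def model_b(u, e, do_s=None):
    s = e if do_s is None else do_s  # smoking does not depend on the gene
    l = s                            # smoking -> cancer; gene has no effect
    return s, l


def distribution(model, do_s=None):
    """Exact joint P(S, L) when u and e are independent fair coins."""
    dist = {}
    for u, e in product((0, 1), repeat=2):
        key = model(u, e, do_s)
        dist[key] = dist.get(key, 0.0) + 0.25
    return dist


def p_l1(dist):
    """P(L=1) under the given joint distribution."""
    return sum(p for (s, l), p in dist.items() if l == 1)


print(distribution(model_a) == distribution(model_b))  # same observational data
print(p_l1(distribution(model_a, do_s=1)), p_l1(distribution(model_b, do_s=1)))
```

Since identical data are compatible with an effect of 0.5 and of 1.0, any estimator that sees only Smoking and Lung Cancer must fail on at least one of these models.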

Front-door criterion:

Now suppose that smoking causes cancer only through tar deposits, which are due entirely to the physical action of cigarettes (so the smoking gene has no direct influence on Tar). The causal diagram becomes:

In [5]:
sct = gum.fastBN("Smoking->Tar->Lung Cancer")
sct
Out[5]:
[Causal diagram: Smoking → Tar → Lung Cancer]
In [6]:
msct = csl.CausalModel(sct, [("Smoking Gene", ["Smoking","Lung Cancer"])], True)
gnb.show(msct)
[Causal model: Smoking → Tar → Lung Cancer, with latent Smoking Gene → Smoking and Smoking Gene → Lung Cancer]
In [7]:
cslnb.showCausalImpact(msct, "Lung Cancer", doing="Smoking",values={})
Causal Model: [diagram: Smoking → Tar → Lung Cancer, with latent Smoking Gene → Smoking and Smoking Gene → Lung Cancer]
$$\begin{equation*}P( Lung Cancer \mid \text{do}(Smoking)) = \sum_{Tar}{P\left(Tar\mid Smoking\right) \cdot \left(\sum_{Smoking'}{P\left(Lung Cancer\mid Smoking',Tar\right) \cdot P\left(Smoking'\right)}\right)}\end{equation*}$$
Explanation : frontdoor ['Tar'] found.
Impact:
  Smoking | Lung Cancer = 0 | Lung Cancer = 1
  0       | 0.4817          | 0.5183
  1       | 0.4835          | 0.5165

Even though the smoking gene is unobservable, we can still estimate the causal effect of Smoking on Lung Cancer using the front-door criterion. Here, the front-door path is:

\[Smoking \rightarrow \color{red}{Tar} \rightarrow LungCancer\]

It consists of variables that we have observed:

  • We can measure the causal effect of \(Smoking\) on \(Tar\) directly, because there is no open back-door path between them: the only back-door path, \(Smoking \leftarrow SmokingGene \rightarrow LungCancer \leftarrow Tar\), is blocked by the collider \(LungCancer\).

    \[P(Tar \mid do(Smoking)) = P (Tar \mid Smoking)\]
In [8]:
formula, adj, exp = csl.causalImpact(msct, on="Tar", doing="Smoking", values={})
display(Math(formula.toLatex()))
$\displaystyle P( Tar \mid \text{do}(Smoking)) = P\left(Tar\mid Smoking\right)$
  • We can also measure the causal effect of \(Tar\) on \(LungCancer\); we just need to adjust for \(Smoking\) to block the back-door path \(Tar \leftarrow Smoking \leftarrow SmokingGene \rightarrow LungCancer\).

    \[P(LungCancer \mid do(Tar)) = \sum_{Smoking}{P(LungCancer \mid Tar, Smoking) \times P(Smoking)}\]
In [9]:
formula, adj, exp = csl.causalImpact(msct, on="Lung Cancer", doing="Tar", values={})
display(Math(formula.toLatex()))
$\displaystyle P( Lung Cancer \mid \text{do}(Tar)) = \sum_{Smoking}{P\left(Lung Cancer\mid Smoking,Tar\right) \cdot P\left(Smoking\right)}$
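The adjustment formula can be checked numerically. The sketch below uses a hypothetical parametrisation of the front-door diagram with invented probabilities (`U` stands for the unobserved smoking gene, `S` for Smoking, `T` for Tar, `L` for Lung Cancer). `adjusted` computes \(\sum_{s} P(L=1 \mid T, s) P(s)\) from the observed joint only, while `truth` reads \(P(L=1 \mid do(T))\) straight off the model, which is possible only because we wrote the model ourselves. The two agree; naive conditioning \(P(L=1 \mid T)\) does not.

```python
from itertools import product

# Hypothetical CPTs (invented for illustration): U -> S, S -> T, {T, U} -> L.
P_U1 = 0.3
P_S1 = {0: 0.2, 1: 0.8}  # P(S=1 | U=u)
P_T1 = {0: 0.1, 1: 0.9}  # P(T=1 | S=s)
P_L1 = {(0, 0): 0.05, (0, 1): 0.30, (1, 0): 0.20, (1, 1): 0.60}  # P(L=1 | T=t, U=u)


def bern(p1, x):
    """Probability of binary outcome x when P(x=1) = p1."""
    return p1 if x == 1 else 1 - p1


# Observed joint P(S, T, L): the latent U is summed out.
joint = {}
for u, s, t, l in product((0, 1), repeat=4):
    w = bern(P_U1, u) * bern(P_S1[u], s) * bern(P_T1[s], t) * bern(P_L1[(t, u)], l)
    joint[(s, t, l)] = joint.get((s, t, l), 0.0) + w


def p(s=None, t=None, l=None):
    """Observed marginal probability; an argument left as None is summed out."""
    return sum(q for (s_, t_, l_), q in joint.items()
               if s in (None, s_) and t in (None, t_) and l in (None, l_))


def adjusted(t):
    """Back-door adjustment: sum_s P(L=1 | T=t, S=s) P(S=s), observed data only."""
    return sum(p(s=s, t=t, l=1) / p(s=s, t=t) * p(s=s) for s in (0, 1))


def truth(t):
    """P(L=1 | do(T=t)) from the model: forcing T leaves U at its prior."""
    return sum(bern(P_U1, u) * P_L1[(t, u)] for u in (0, 1))


for t in (0, 1):
    naive = p(t=t, l=1) / p(t=t)
    print(f"T={t}: adjusted={adjusted(t):.4f}  truth={truth(t):.4f}  naive={naive:.4f}")
```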

Combining these two results gives the causal effect of \(Smoking\) on \(LungCancer\), reducing \(P(LungCancer \mid do(Smoking))\) to quantities we have observed:

\[P(LungCancer \mid do(Smoking)) = \sum_{Tar}{(P(Tar \mid Smoking) \times \sum_{Smoking^{'}}{P(LungCancer \mid Tar, Smoking^{'}) \times P(Smoking^{'})})}\]
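The front-door formula can be checked the same way. In this sketch (all probabilities invented; `U` is the unobserved smoking gene, `S` Smoking, `T` Tar, `L` Lung Cancer), `front_door` evaluates the formula above using only the observed joint \(P(S, T, L)\), while `truth` computes \(P(L=1 \mid do(S))\) from the hypothetical model itself. Note that `U` does not point to `T`, as the front-door criterion requires. The two agree, whereas naive conditioning \(P(L=1 \mid S)\) overstates the effect because it also carries the gene's influence.

```python
from itertools import product

# Hypothetical CPTs for the front-door diagram: U -> S, S -> T, {T, U} -> L (U latent).
P_U1 = 0.3
P_S1 = {0: 0.2, 1: 0.8}  # P(S=1 | U=u)
P_T1 = {0: 0.1, 1: 0.9}  # P(T=1 | S=s)
P_L1 = {(0, 0): 0.05, (0, 1): 0.30, (1, 0): 0.20, (1, 1): 0.60}  # P(L=1 | T=t, U=u)


def bern(p1, x):
    """Probability of binary outcome x when P(x=1) = p1."""
    return p1 if x == 1 else 1 - p1


# Observed joint P(S, T, L): all the front-door formula is allowed to use.
joint = {}
for u, s, t, l in product((0, 1), repeat=4):
    w = bern(P_U1, u) * bern(P_S1[u], s) * bern(P_T1[s], t) * bern(P_L1[(t, u)], l)
    joint[(s, t, l)] = joint.get((s, t, l), 0.0) + w


def p(s=None, t=None, l=None):
    """Observed marginal probability; an argument left as None is summed out."""
    return sum(q for (s_, t_, l_), q in joint.items()
               if s in (None, s_) and t in (None, t_) and l in (None, l_))


def front_door(s):
    """sum_t P(t | s) * sum_s' P(L=1 | s', t) P(s')."""
    total = 0.0
    for t in (0, 1):
        inner = sum(p(s=s2, t=t, l=1) / p(s=s2, t=t) * p(s=s2) for s2 in (0, 1))
        total += p(s=s, t=t) / p(s=s) * inner
    return total


def truth(s):
    """P(L=1 | do(S=s)) from the model: U keeps its prior, T follows the forced S."""
    return sum(bern(P_U1, u) * bern(P_T1[s], t) * P_L1[(t, u)]
               for u, t in product((0, 1), repeat=2))


for s in (0, 1):
    naive = p(s=s, l=1) / p(s=s)
    print(f"S={s}: front-door={front_door(s):.4f}  truth={truth(s):.4f}  naive={naive:.4f}")
```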

Birth-weight paradox:

Studies have shown that babies of smoking mothers tend to weigh less than average. Other studies have shown that low-birth-weight babies have a higher mortality rate than normal-birth-weight babies. The corresponding causal diagram is the following:

In [10]:
bwp = gum.fastBN("Smoking->Low Birth Weight->Mortality")
bwp
Out[10]:
[Causal diagram: Smoking → Low Birth Weight → Mortality]
In [11]:
# Causal effect of Smoking on neo-natal mortality
bwpModele = csl.CausalModel(bwp)
cslnb.showCausalImpact(bwpModele, "Mortality", doing="Smoking",values={})
Causal Model: [diagram: Smoking → Low Birth Weight → Mortality]
$$\begin{equation*}P( Mortality \mid \text{do}(Smoking)) = \sum_{Low Birth Weight}{P\left(Low Birth Weight\mid Smoking\right) \cdot \left(\sum_{Smoking'}{P\left(Mortality\mid Low Birth Weight\right) \cdot P\left(Smoking'\right)}\right)}\end{equation*}$$
Explanation : frontdoor ['Low Birth Weight'] found.
Impact:
  Smoking | Mortality = 0 | Mortality = 1
  0       | 0.5004        | 0.4996
  1       | 0.3131        | 0.6869
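In this chain there is no confounder, so the front-door expression found by pyAgrum should reduce to plain conditioning. The inner term \(P(Mortality \mid LowBirthWeight)\) does not depend on \(Smoking'\), so the inner sum is just \(\sum_{Smoking'} P(Smoking') = 1\); and in the chain, \(Mortality\) is independent of \(Smoking\) given \(LowBirthWeight\):

\[
\begin{aligned}
P(Mortality \mid do(Smoking)) &= \sum_{LowBirthWeight} P(LowBirthWeight \mid Smoking) \sum_{Smoking'} P(Mortality \mid LowBirthWeight) \cdot P(Smoking') \\
&= \sum_{LowBirthWeight} P(LowBirthWeight \mid Smoking) \cdot P(Mortality \mid LowBirthWeight) \\
&= \sum_{LowBirthWeight} P(LowBirthWeight \mid Smoking) \cdot P(Mortality \mid LowBirthWeight, Smoking) \\
&= P(Mortality \mid Smoking)
\end{aligned}
\]

This is expected: Smoking has no parents here, so there is no back-door path and intervening coincides with conditioning.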

However, the data also showed that low-birth-weight babies of smoking mothers had a lower mortality rate than low-birth-weight babies of nonsmoking mothers. One explanation for this paradox is that low birth weight is due either to a smoking mother or to some other birth defect that is much more threatening to the baby’s health. The causal diagram becomes:

In [12]:
bwpe = gum.fastBN("Smoking->Low Birth Weight->Mortality<-Smoking;Birth defect->Low Birth Weight;Mortality<-Birth defect")
bwpe
Out[12]:
[Causal diagram: Smoking → Low Birth Weight, Smoking → Mortality, Low Birth Weight → Mortality, Birth defect → Low Birth Weight, Birth defect → Mortality]

This causal diagram makes it easy to pinpoint the source of the paradox: “collider bias”. “Low Birth Weight” is a collider (\(Smoking \rightarrow LowBirthWeight \leftarrow BirthDefect\)). Since the data only concerned low-birth-weight babies, it is as if we were conditioning on “Low Birth Weight”. Among these babies, knowing that the mother does not smoke increases our belief that a birth defect caused the low birth weight, and a birth defect is far more threatening to the baby’s health. Conditioning on the collider therefore opens the formerly blocked path \(Smoking \rightarrow LowBirthWeight \leftarrow BirthDefect \rightarrow Mortality\) and lets non-causal information flow from Smoking to Mortality, introducing a bias.
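Collider bias can be reproduced with a few lines of exact enumeration. The sketch below uses invented probabilities (not the original studies' data) for the diagram above, with `S` for Smoking, `D` for Birth defect, `B` for Low Birth Weight and `M` for Mortality; the functions `p_b1` and `p_m1` are hypothetical CPTs. Smoking raises mortality both directly and through low birth weight, yet among low-birth-weight babies the infants of smoking mothers come out with lower mortality.

```python
from itertools import product

# Hypothetical numbers: S and D are independent causes of B; all three affect M.
P_S1 = 0.3   # P(S=1)
P_D1 = 0.05  # P(D=1)


def p_b1(s, d):
    """P(B=1 | S=s, D=d): a birth defect makes low birth weight very likely."""
    return 0.8 if d == 1 else (0.3 if s == 1 else 0.05)


def p_m1(s, b, d):
    """P(M=1 | S=s, B=b, D=d): smoking and low weight add a little risk, a defect a lot."""
    return 0.01 + 0.01 * s + 0.02 * b + 0.30 * d


def bern(p1, x):
    """Probability of binary outcome x when P(x=1) = p1."""
    return p1 if x == 1 else 1 - p1


def joint(s, d, b, m):
    return bern(P_S1, s) * bern(P_D1, d) * bern(p_b1(s, d), b) * bern(p_m1(s, b, d), m)


def mortality_given(s, b=None):
    """P(M=1 | S=s), or P(M=1 | S=s, B=b) when b is given."""
    bs = (0, 1) if b is None else (b,)
    num = sum(joint(s, d, b_, 1) for d, b_ in product((0, 1), bs))
    den = sum(joint(s, d, b_, m) for d, b_, m in product((0, 1), bs, (0, 1)))
    return num / den


# Smoking has no parents here, so P(M=1 | do(S=s)) = P(M=1 | S=s).
print("P(M=1 | do(S=s)), s=0,1:", [round(mortality_given(s), 4) for s in (0, 1)])
print("P(M=1 | S=s, B=1), s=0,1:", [round(mortality_given(s, b=1), 4) for s in (0, 1)])
```

The reversal comes entirely from conditioning on `B=1`: given low birth weight, `S=0` makes `D=1` the likelier explanation, and `D` dominates mortality.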

In [13]:
bwpeModele = csl.CausalModel(bwpe)
cslnb.showCausalImpact(bwpeModele, "Mortality", doing="Smoking",values={})
Causal Model: [diagram: Smoking → Low Birth Weight, Smoking → Mortality, Low Birth Weight → Mortality, Birth defect → Low Birth Weight, Birth defect → Mortality]
$$\begin{equation*}P( Mortality \mid \text{do}(Smoking)) = \sum_{Birth defect,Low Birth Weight}{P\left(Birth defect\right) \cdot P\left(Mortality\mid Birth defect,Low Birth Weight,Smoking\right) \cdot P\left(Low Birth Weight\mid Birth defect,Smoking\right)}\end{equation*}$$
Explanation : Do-calculus computations
Impact:
  Smoking | Mortality = 0 | Mortality = 1
  0       | 0.7456        | 0.2544
  1       | 0.3448        | 0.6552