# Explaining a model

```
In [1]:
```

```
import time
from pyAgrum.lib.bn2graph import BN2dot
import numpy as np
import pandas as pd
import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
import pyAgrum.lib.explain as expl
import matplotlib.pyplot as plt
```

## Building the model

We build a simple graph for the example.

```
In [2]:
```

```
template=gum.fastBN("X1->X2->Y;X3->Z->Y;X0->Z;X1->Z;X2->R[5];Z->R;X1->Y")
data_path = "res/shap/Data_6var_direct_indirect.csv"
#gum.generateSample(template,1000,data_path)
learner = gum.BNLearner(data_path,template)
bn = learner.learnParameters(template.dag())
bn
```

```
Out[2]:
```

## 1-independence list (w.r.t. the class Y)

Given a model, it can be useful to investigate the conditional independences that the model implies, in particular those involving the class Y.

```
In [3]:
```

```
# This function explores all the conditional independences (CI) between pairs of variables and computes their p-values w.r.t. a CSV file.
expl.independenceListForPairs(bn,data_path)
```

```
Out[3]:
```

```
{('R', 'X0', ('X1', 'Z')): 0.7083382647903902,
('R', 'X1', ('X2', 'Z')): 0.46938486254099493,
('R', 'X3', ('X1', 'Z')): 0.4128522974536623,
('R', 'Y', ('X2', 'Z')): 0.8684231094674686,
('X0', 'X1', ()): 0.723302358657366,
('X0', 'X2', ()): 0.9801394906304377,
('X0', 'X3', ()): 0.7676868597218647,
('X0', 'Y', ('X1', 'Z')): 0.5816487109659612,
('X1', 'X3', ()): 0.5216508257424717,
('X2', 'X3', ()): 0.9837021981131505,
('X2', 'Z', ('X1',)): 0.6638491605436834,
('X3', 'Y', ('X1', 'Z')): 0.8774081450472304}
```

The same list can be restricted to a specific target.

```
In [4]:
```

```
expl.independenceListForPairs(bn,data_path,target="Y")
```

```
Out[4]:
```

```
{('Y', 'R', ('X2', 'Z')): 0.8684231094674686,
('Y', 'X0', ('X1', 'Z')): 0.5816487109659612,
('Y', 'X3', ('X1', 'Z')): 0.8774081450472304}
```
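
The independences listed above are those implied by the graph structure through d-separation. As a minimal sketch that does not depend on pyAgrum, the following pure-Python check re-encodes the arcs of the template by hand and tests d-separation with the moralized ancestral graph criterion. The `ARCS` list and the helper names are illustrative assumptions, not library API.

```python
from itertools import combinations

# Arcs of the example graph, copied by hand from the fastBN template.
ARCS = [("X1", "X2"), ("X2", "Y"), ("X3", "Z"), ("Z", "Y"), ("X0", "Z"),
        ("X1", "Z"), ("X2", "R"), ("Z", "R"), ("X1", "Y")]


def parents(node, arcs):
    """Direct parents of `node`."""
    return {a for a, b in arcs if b == node}


def ancestral_set(nodes, arcs):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents(current, arcs))
    return seen


def d_separated(x, y, given, arcs=ARCS):
    """True if x and y are d-separated by the set `given`."""
    keep = ancestral_set({x, y, *given}, arcs)
    # Undirected version of the ancestral subgraph.
    edges = {frozenset((a, b)) for a, b in arcs if a in keep and b in keep}
    # Moralization: connect every pair of parents sharing a child.
    for child in keep:
        for p, q in combinations(sorted(parents(child, arcs)), 2):
            edges.add(frozenset((p, q)))
    # Search for a path from x to y that avoids the conditioning nodes.
    blocked, seen, stack = set(given), set(), [x]
    while stack:
        current = stack.pop()
        if current == y:
            return False
        if current in seen or current in blocked:
            continue
        seen.add(current)
        for edge in edges:
            if current in edge:
                (other,) = edge - {current}
                stack.append(other)
    return True


print(d_separated("Y", "R", ("X2", "Z")))
print(d_separated("Y", "X0", ("X1", "Z")))
print(d_separated("X0", "X1", ("Z",)))
```

The first two calls correspond to entries of the target-specific list; the third conditions on the common child `Z`, which connects `X0` and `X1` instead of separating them.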

## 2-ShapValues

```
In [5]:
```

```
print(expl.ShapValues.__doc__)
```

```
The ShapValue class implements the calculation of Shap values in Bayesian networks.
The main implementation is based on Conditional Shap values [3]_, but the Interventional calculation method proposed in [2]_ is also present. In addition, a new causal method, based on [1]_, is implemented which is well suited for Bayesian networks.
.. [1] Heskes, T., Sijben, E., Bucur, I., & Claassen, T. (2020). Causal Shapley Values: Exploiting Causal Knowledge. 34th Conference on Neural Information Processing Systems. Vancouver, Canada.
.. [2] Janzing, D., Minorics, L., & Blöbaum, P. (2019). Feature relevance quantification in explainable AI: A causality problem. arXiv: Machine Learning. Retrieved 6 24, 2021, from https://arxiv.org/abs/1910.13413
.. [3] Lundberg, S. M., & Su-In, L. (2017). A Unified Approach to Interpreting Model. 31st Conference on Neural Information Processing Systems. Long Beach, CA, USA.
```

The `ShapValues` class implements the calculation of Shap values in Bayesian networks. It requires a target and a Bayesian network whose parameters are known; these parameters are then used by the different calculation methods.

```
In [6]:
```

```
gumshap = expl.ShapValues(bn, 'Y')
```
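
To make the notion of a Shap value concrete, here is a minimal sketch of the generic Shapley formula, computed exactly by enumerating coalitions. It is not pyAgrum's implementation: the feature names and the value function `toy_value` are hypothetical, chosen only so that the result can be checked by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values for a value function defined on frozensets of features."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of `feature` when it joins the coalition.
                total += weight * (value(coalition | {feature}) - value(coalition))
        phi[feature] = total
    return phi


def toy_value(coalition):
    """Hypothetical payoff: A and B contribute alone and interact, C is irrelevant."""
    payoff = 0.0
    if "A" in coalition:
        payoff += 2.0
    if "B" in coalition:
        payoff += 1.0
    if "A" in coalition and "B" in coalition:
        payoff += 2.0
    return payoff


phi = shapley_values(["A", "B", "C"], toy_value)
print(phi)
```

The contributions sum to the payoff of the full coalition minus that of the empty one, and the irrelevant feature `C` receives no credit. The conditional, causal and marginal methods below differ in how the value of a coalition is defined, not in this averaging scheme.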

### Computing conditional Shap values in a Bayesian network

A dataset (as a `pandas.DataFrame`) must be provided so that the Bayesian network can learn its parameters and then predict.

The method `conditional` computes the conditional Shap values using the Bayesian network. It produces two graphs and a dictionary. The first graph shows the distribution of the Shap values for each variable; the second ranks the variables by importance.

```
In [7]:
```

```
train = pd.read_csv(data_path).sample(frac=1.)
```

```
In [8]:
```

```
t_start = time.time()
resultat = gumshap.conditional(train, plot=True,plot_importance=True,percentage=False)
print(f'Run Time : {time.time()-t_start} sec')
```

```
Run Time : 6.830995082855225 sec
```

```
In [9]:
```

```
t_start = time.time()
resultat = gumshap.conditional(train, plot=False,plot_importance=True,percentage=False)
print(f'Run Time : {time.time()-t_start} sec')
```

```
Run Time : 6.594994783401489 sec
```

```
In [10]:
```

```
resultat = gumshap.conditional(train, plot=True,plot_importance=False,percentage=False)
```

The result is returned as a dictionary: the keys are the names of the features, and each associated value is the absolute value of the average of the Shap values computed for that feature.

```
In [11]:
```

```
t_start = time.time()
resultat = gumshap.conditional(train, plot=False,plot_importance=False,percentage=False)
print(f'Run Time : {time.time()-t_start} sec')
resultat
```

```
Run Time : 6.998995780944824 sec
```

```
Out[11]:
```

```
{'Z': 0.5464180054433385,
'X2': 0.3271606443752007,
'X1': 0.2533375405370652,
'X0': 0.061767122000001715,
'X3': 0.10465402104047901,
'R': 0.054456334441524014}
```
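
The dictionary is not necessarily ordered by importance (here `X0` appears before `X3`). A minimal sketch of how it can be ranked with plain Python, reusing the values printed above; expressing each value as a share of the total is an illustrative choice made here and is not claimed to match what the `percentage` argument does.

```python
# Values copied from the output above.
importance = {
    "Z": 0.5464180054433385,
    "X2": 0.3271606443752007,
    "X1": 0.2533375405370652,
    "X0": 0.061767122000001715,
    "X3": 0.10465402104047901,
    "R": 0.054456334441524014,
}

# Sort the features from most to least important.
ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)

# Share of each feature in the total importance (illustrative normalization).
total = sum(importance.values())
shares = {name: value / total for name, value in ranked}

for name, value in ranked:
    print(f"{name:>2}: {value:.3f} ({shares[name]:.1%})")
```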

### Causal Shap Values

This method is similar to the previous one, except for the formula used in the computation. It computes the causal Shap values as described in the paper by Heskes et al., *Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models*.

```
In [12]:
```

```
t_start = time.time()
causal = gumshap.causal(train, plot=True, plot_importance=True, percentage=False)
print(f'Run Time : {time.time()-t_start} sec')
```

```
Run Time : 7.8449952602386475 sec
```

As you can see, since \(R\) is not among the ‘causes’ of \(Y\), its causal importance is zero.
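
Reading the ‘causes’ of \(Y\) as its ancestors in the DAG (an assumption made for this sketch), they can be listed without pyAgrum by walking the arcs backwards. The `ARCS` list re-encodes the template by hand and `ancestors` is a hypothetical helper, not library API.

```python
# Arcs of the example graph, copied by hand from the fastBN template.
ARCS = [("X1", "X2"), ("X2", "Y"), ("X3", "Z"), ("Z", "Y"), ("X0", "Z"),
        ("X1", "Z"), ("X2", "R"), ("Z", "R"), ("X1", "Y")]


def ancestors(node, arcs):
    """All nodes from which a directed path leads to `node`."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in arcs:
            if child == current and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


causes_of_y = ancestors("Y", ARCS)
print(sorted(causes_of_y))
print("R" in causes_of_y)
```

`R` is a child of `X2` and `Z` and has no directed path to `Y`, so it is absent from the set.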

### Marginal Shap Values

Similarly, one can also compute marginal Shap values.

```
In [13]:
```

```
t_start = time.time()
marginal = gumshap.marginal(train, sample_size=10, plot=True,plot_importance=True,percentage=False)
print(f'Run Time : {time.time()-t_start} sec')
print(marginal)
```

```
Run Time : 42.12499499320984 sec
{'Z': 0.7937690075288257, 'X2': 0.36899838317867956, 'X1': 0.3507606787561853, 'X0': 0.0, 'X3': 0.0, 'R': 0.0}
```

As you can see, since \(R\), \(X0\) and \(X3\) are not in the Markov blanket of \(Y\), their marginal importances are zero.
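
The Markov blanket of a node consists of its parents, its children and the other parents of its children. A minimal pure-Python sketch on the arcs of the example graph, re-encoded by hand; `markov_blanket` is an illustrative helper, not pyAgrum API.

```python
# Arcs of the example graph, copied by hand from the fastBN template.
ARCS = [("X1", "X2"), ("X2", "Y"), ("X3", "Z"), ("Z", "Y"), ("X0", "Z"),
        ("X1", "Z"), ("X2", "R"), ("Z", "R"), ("X1", "Y")]


def markov_blanket(node, arcs):
    """Parents, children and co-parents of `node`."""
    parents = {a for a, b in arcs if b == node}
    children = {b for a, b in arcs if a == node}
    # Other parents of the children of `node`.
    spouses = {a for a, b in arcs if b in children and a != node}
    return parents | children | spouses


blanket_of_y = markov_blanket("Y", ARCS)
print(sorted(blanket_of_y))
```

Since `Y` has no children in this graph, its Markov blanket reduces to its parents `X1`, `X2` and `Z`, which are exactly the variables with a non-zero marginal importance above.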

### Visualizing Shap values directly on a BN

This method returns a coloured graph that makes it easier to understand which variable is important and where it is located in the graph.

```
In [14]:
```

```
import pyAgrum.lib.notebook as gnb
g = gumshap.showShapValues(causal)
gnb.showGraph(g)
```

## Visualizing information

```
In [15]:
```

```
expl.showInformation(bn)
```
