Using pyAgrum


In [1]:
%matplotlib inline
from pylab import *
import matplotlib.pyplot as plt

import os

Initialisation

  • importing pyAgrum

  • importing pyAgrum.lib tools

  • loading a BN

In [2]:
import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
gnb.configuration()
Library     Version
----------  -------------------------------------------------------------------
OS          posix [darwin]
Python      3.12.2 (main, Feb 6 2024, 20:19:44) [Clang 15.0.0 (clang-1500.1.0.2.5)]
IPython     8.22.2
Matplotlib  3.8.3
Numpy       1.26.4
pyDot       2.0.0
pyAgrum     1.12.1.9

Wed Mar 20 15:17:46 2024 CET
In [3]:
bn=gum.loadBN("res/alarm.dsl")
gnb.showBN(bn,size='9')
[figure: the graph of the alarm network]

Visualisation and inspection

In [4]:
print(bn['SHUNT'])
SHUNT:Labelized({NORMAL|HIGH})
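
Beyond printing, the variable and the graph can be inspected programmatically. A minimal sketch, using only accessors already seen in this tutorial plus bn.parents:

In [ ]:
v=bn['SHUNT']                                        # the discrete variable behind the node
print([v.label(i) for i in range(v.domainSize())])   # its labels
# names of the parents of SHUNT in the DAG
print([bn.variable(p).name() for p in bn.parents(bn.idFromName('SHUNT'))])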
In [5]:
print(bn.cpt(bn.idFromName('SHUNT')))

             ||  SHUNT            |
PULMEM|INTUBA||NORMAL   |HIGH     |
------|------||---------|---------|
TRUE  |NORMAL|| 0.1000  | 0.9000  |
FALSE |NORMAL|| 0.9500  | 0.0500  |
TRUE  |ESOPHA|| 0.1000  | 0.9000  |
FALSE |ESOPHA|| 0.9500  | 0.0500  |
TRUE  |ONESID|| 0.0100  | 0.9900  |
FALSE |ONESID|| 0.0500  | 0.9500  |
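
A single line of the CPT can also be read directly by indexing the potential with a (partial) instantiation given as a dictionary; this sketch assumes the dictionary-indexing syntax of recent pyAgrum versions:

In [ ]:
p=bn.cpt(bn.idFromName('SHUNT'))
# probabilities of SHUNT for one configuration of its parents
print(p[{'PULMEMBOLUS':'TRUE','INTUBATION':'NORMAL'}])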

In [6]:
gnb.showPotential(bn.cpt(bn.idFromName('SHUNT')),digits=3)
                        ||  SHUNT            |
INTUBATION |PULMEMBOLUS ||  NORMAL |  HIGH   |
-----------|------------||---------|---------|
NORMAL     |TRUE        ||  0.100  |  0.900  |
NORMAL     |FALSE       ||  0.950  |  0.050  |
ESOPHAGEAL |TRUE        ||  0.100  |  0.900  |
ESOPHAGEAL |FALSE       ||  0.950  |  0.050  |
ONESIDED   |TRUE        ||  0.010  |  0.990  |
ONESIDED   |FALSE       ||  0.050  |  0.950  |

Results of inference

It is easy to look at the results of inference.

In [7]:
gnb.showPosterior(bn,{'SHUNT':'HIGH'},'PRESS')
[figure: posterior of PRESS given SHUNT=HIGH]
In [8]:
gnb.showPosterior(bn,{'MINVOLSET':'NORMAL'},'VENTALV')
[figure: posterior of VENTALV given MINVOLSET=NORMAL]

Overall results

In [9]:
gnb.showInference(bn,size="10")
[figure: inference results displayed on the whole network]

What is the impact of observed variables (SHUNT and VENTALV, for instance) on another one (PRESS)?

In [10]:
ie=gum.LazyPropagation(bn)
ie.evidenceImpact('PRESS',['SHUNT','VENTALV'])
Out[10]:
                 ||  PRESS                                    |
SHUNT  |VENTALV ||  ZERO    |  LOW     |  NORMAL  |  HIGH    |
-------|--------||----------|----------|----------|----------|
NORMAL |ZERO    ||  0.0569  |  0.2669  |  0.2005  |  0.4757  |
NORMAL |LOW     ||  0.0208  |  0.2515  |  0.0553  |  0.6724  |
NORMAL |NORMAL  ||  0.0769  |  0.3267  |  0.1772  |  0.4192  |
NORMAL |HIGH    ||  0.0501  |  0.1633  |  0.2796  |  0.5071  |
HIGH   |ZERO    ||  0.0589  |  0.2726  |  0.1997  |  0.4688  |
HIGH   |LOW     ||  0.0318  |  0.2237  |  0.0521  |  0.6924  |
HIGH   |NORMAL  ||  0.1735  |  0.5839  |  0.1402  |  0.1024  |
HIGH   |HIGH    ||  0.0711  |  0.2347  |  0.2533  |  0.4410  |
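
The same LazyPropagation object can of course answer classical queries for one fixed set of evidence; a minimal sketch using its standard API:

In [ ]:
ie.setEvidence({'SHUNT':'HIGH','VENTALV':'LOW'})
ie.makeInference()
print(ie.posterior('PRESS'))   # P(PRESS | SHUNT=HIGH, VENTALV=LOW)
ie.eraseAllEvidence()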

Using inference as a function

It is also easy to use inference as a routine in more complex procedures.

In [11]:
import time
r=range(0,100)
xs=[x/100.0 for x in r]

tf=time.time()
# posterior of VENTALV for the soft evidence [0,x,0.5] on MINVOLSET, x in [0,1)
ys=[gum.getPosterior(bn,evs={'MINVOLSET':[0,x/100.0,0.5]},target='VENTALV').tolist()
        for x in r]
delta=time.time()-tf

p=plot(xs,ys)
legend(p,[bn['VENTALV'].label(i)
          for i in range(bn['VENTALV'].domainSize())],loc=7);
title('VENTALV (100 inferences in %d ms)'%(1000*delta));
ylabel('posterior probability');
xlabel('Evidence on MINVOLSET : [0,x,0.5]')
plt.show()
[figure: posteriors of VENTALV as a function of the evidence value x]

Another example: Python gives access to a large set of tools. Here, the evidence value for which two posterior probabilities become equal is easily computed.

In [12]:
x=[p/100.0 for p in range(0,100)]

tf=time.time()
y=[gum.getPosterior(bn,evs={'HRBP':[1.0-p/100.0,1.0-p/100.0,p/100.0]},target='TPR').tolist()
   for p in range(0,100)]
delta=time.time()-tf

p=plot(x,y)
title('HRBP (100 inferences in %d ms)'%(1000*delta));
v=bn['TPR']
legend([v.label(i) for i in range(v.domainSize())],loc='best');
# first evidence value for which the two posterior curves cross
np1=(transpose(y)[0]>transpose(y)[2]).argmin()
text(x[np1]-0.05,y[np1][0]+0.005,str(x[np1]),bbox=dict(facecolor='red', alpha=0.1))
plt.show()
[figure: posteriors of TPR as a function of the evidence on HRBP, with the crossing point highlighted]

BN as a classifier

Generation of databases

Using the CSV format for the database:

In [13]:
print(f"The log2-likelihood of the generated base : {gum.generateSample(bn,1000,'out/test.csv',with_labels=True):.2f}")
The log2-likelihood of the generated base : -15310.52
In [14]:
with open("out/test.csv","r") as src:
    for _ in range(10):
        print(src.readline(),end="")
HR,CATECHOL,FIO2,HYPOVOLEMIA,VENTMACH,DISCONNECT,ARTCO2,VENTALV,PVSAT,EXPCO2,HRSAT,MINVOL,INTUBATION,STROKEVOLUME,LVFAILURE,VENTLUNG,TPR,PULMEMBOLUS,ANAPHYLAXIS,SAO2,VENTTUBE,PRESS,HISTORY,KINKEDTUBE,HRBP,MINVOLSET,PCWP,CO,PAP,LVEDVOLUME,BP,ERRLOWOUTPUT,CVP,SHUNT,INSUFFANESTH,HREKG,ERRCAUTER
HIGH,HIGH,NORMAL,FALSE,NORMAL,FALSE,HIGH,ZERO,LOW,LOW,HIGH,ZERO,NORMAL,NORMAL,FALSE,ZERO,LOW,FALSE,FALSE,LOW,LOW,HIGH,FALSE,FALSE,HIGH,NORMAL,NORMAL,HIGH,NORMAL,NORMAL,LOW,FALSE,NORMAL,NORMAL,FALSE,HIGH,FALSE
NORMAL,NORMAL,NORMAL,FALSE,NORMAL,FALSE,HIGH,LOW,NORMAL,LOW,LOW,ZERO,NORMAL,NORMAL,FALSE,ZERO,NORMAL,FALSE,FALSE,HIGH,ZERO,ZERO,FALSE,FALSE,LOW,NORMAL,NORMAL,NORMAL,NORMAL,NORMAL,NORMAL,FALSE,NORMAL,HIGH,FALSE,NORMAL,FALSE
HIGH,HIGH,NORMAL,FALSE,NORMAL,FALSE,HIGH,ZERO,LOW,LOW,HIGH,ZERO,NORMAL,NORMAL,FALSE,ZERO,NORMAL,FALSE,FALSE,LOW,LOW,HIGH,FALSE,FALSE,NORMAL,NORMAL,NORMAL,HIGH,LOW,NORMAL,HIGH,TRUE,NORMAL,NORMAL,FALSE,HIGH,FALSE
HIGH,HIGH,NORMAL,FALSE,NORMAL,FALSE,HIGH,ZERO,LOW,LOW,HIGH,NORMAL,NORMAL,LOW,FALSE,ZERO,HIGH,FALSE,FALSE,HIGH,LOW,HIGH,FALSE,FALSE,HIGH,NORMAL,NORMAL,LOW,NORMAL,NORMAL,LOW,FALSE,NORMAL,NORMAL,TRUE,HIGH,FALSE
HIGH,HIGH,NORMAL,FALSE,NORMAL,FALSE,HIGH,ZERO,LOW,LOW,HIGH,ZERO,NORMAL,NORMAL,FALSE,ZERO,HIGH,FALSE,FALSE,LOW,LOW,HIGH,FALSE,FALSE,HIGH,NORMAL,NORMAL,HIGH,NORMAL,NORMAL,HIGH,FALSE,NORMAL,NORMAL,FALSE,HIGH,FALSE
HIGH,HIGH,NORMAL,FALSE,NORMAL,FALSE,HIGH,ZERO,LOW,LOW,HIGH,ZERO,NORMAL,NORMAL,FALSE,ZERO,LOW,FALSE,FALSE,LOW,LOW,HIGH,FALSE,FALSE,HIGH,NORMAL,NORMAL,HIGH,NORMAL,NORMAL,NORMAL,FALSE,NORMAL,NORMAL,TRUE,HIGH,FALSE
HIGH,HIGH,NORMAL,FALSE,LOW,FALSE,HIGH,ZERO,LOW,NORMAL,NORMAL,ZERO,NORMAL,NORMAL,FALSE,ZERO,HIGH,FALSE,FALSE,LOW,ZERO,LOW,FALSE,FALSE,HIGH,LOW,NORMAL,HIGH,NORMAL,NORMAL,HIGH,FALSE,NORMAL,NORMAL,TRUE,LOW,TRUE
HIGH,HIGH,NORMAL,FALSE,NORMAL,TRUE,HIGH,ZERO,LOW,LOW,HIGH,ZERO,NORMAL,NORMAL,FALSE,ZERO,HIGH,FALSE,FALSE,LOW,ZERO,NORMAL,FALSE,FALSE,HIGH,NORMAL,NORMAL,HIGH,NORMAL,NORMAL,HIGH,FALSE,NORMAL,NORMAL,FALSE,HIGH,FALSE
HIGH,HIGH,NORMAL,FALSE,NORMAL,FALSE,HIGH,ZERO,LOW,LOW,HIGH,ZERO,NORMAL,NORMAL,FALSE,ZERO,LOW,FALSE,FALSE,LOW,LOW,NORMAL,FALSE,FALSE,HIGH,NORMAL,NORMAL,HIGH,NORMAL,NORMAL,NORMAL,FALSE,NORMAL,NORMAL,FALSE,HIGH,FALSE
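
As a quick sanity check, the empirical distribution of a column of the sampled CSV should be close to the exact marginal in the BN. A sketch using only the standard library:

In [ ]:
import csv
from collections import Counter

with open("out/test.csv") as src:
    rows=list(csv.DictReader(src))
counts=Counter(row['CATECHOL'] for row in rows)
print({lab:n/len(rows) for lab,n in counts.items()})   # empirical frequencies
print(gum.getPosterior(bn,evs={},target='CATECHOL'))   # exact marginal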

Probabilistic classifier using BN

(since the CSV file was generated from the BN itself, quite good ROC curves are expected)

In [15]:
from pyAgrum.lib.bn2roc import showROC_PR

showROC_PR(bn,"out/test.csv",
        target='CATECHOL',label='HIGH',  # class and label
        show_progress=True,show_fig=True,with_labels=True)
out/test.csv: 100%|██████████|
[figure: ROC and precision-recall curves for CATECHOL=HIGH]
Out[15]:
(0.959215863001352, 0.9643336302500001, 0.9978234596032383, 0.11514254295)

Using another class variable

In [16]:
showROC_PR(bn,"out/test.csv",'SAO2','HIGH',show_progress=True)
out/test.csv: 100%|██████████|
[figure: ROC and precision-recall curves for SAO2=HIGH]
Out[16]:
(0.9525255102040817, 0.0052263681999999995, 0.657214331912374, 0.1112440184)

Fast prototyping for BNs

In [17]:
bn1=gum.fastBN("a->b;a->c;b->c;c->d",3)

gnb.sideBySide(*[gnb.getInference(bn1,evs={'c':val},targets={'a','c','d'}) for val in range(3)],
              captions=[f"Inference given that $c={val}$" for val in range(3)])
[three side-by-side inference displays of bn1, captioned: "Inference given that $c=0$", "Inference given that $c=1$", "Inference given that $c=2$"]
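The fast syntax also allows choosing a domain size or explicit labels per variable; the bracket and brace notations below are assumptions to be checked against the documentation of your pyAgrum version:

In [ ]:
bn2=gum.fastBN("a[4]->b{yes|no}<-c",3)   # a: 4 values, b: two labels, c: default size 3
print(bn2['a'])
print(bn2['b'])
print(bn2['c'])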
In [18]:
print(gum.getPosterior(bn1,evs={'c':0},target='c'))
print(gum.getPosterior(bn1,evs={'c':0},target='d'))

# using pyagrum.lib.notebook's helpers
gnb.flow.row(gum.getPosterior(bn1,evs={'c':0},target='c'),gum.getPosterior(bn1,evs={'c':0},target='d'))

  c                          |
0        |1        |2        |
---------|---------|---------|
 1.0000  | 0.0000  | 0.0000  |


  d                          |
0        |1        |2        |
---------|---------|---------|
 0.0832  | 0.4212  | 0.4956  |

[gnb.flow.row renders the same two potentials, for c and for d, side by side as HTML tables]

Joint posterior, impact of multiple evidence

In [19]:
bn=gum.fastBN("a->b->c->d;b->e->d->f;g->c")
gnb.sideBySide(bn,gnb.getInference(bn))
[side by side: the DAG of bn (a->b, b->c, b->e, c->d, e->d, d->f, g->c) and the full inference display]
In [20]:
ie=gum.LazyPropagation(bn)
ie.addJointTarget({"e","f","g"})
ie.makeInference()
gnb.sideBySide(ie.jointPosterior({"e","f","g"}),ie.jointPosterior({"e","g"}),
               captions=["Joint posterior $P(e,f,g)$","Joint posterior $P(e,g)$"])
        ||  e                  |
g  |f   ||  0       |  1       |
---|----||----------|----------|
0  |0   ||  0.0074  |  0.2873  |
0  |1   ||  0.0052  |  0.2032  |
1  |0   ||  0.0081  |  0.3138  |
1  |1   ||  0.0044  |  0.1708  |

Joint posterior $P(e,f,g)$

    ||  e                  |
g   ||  0       |  1       |
----||----------|----------|
0   ||  0.0154  |  0.6010  |
1   ||  0.0096  |  0.3739  |

Joint posterior $P(e,g)$
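
As a consistency check, summing f out of the joint posterior $P(e,f,g)$ should recover $P(e,g)$; a sketch using Potential's margSumOut:

In [ ]:
pefg=ie.jointPosterior({"e","f","g"})
peg=ie.jointPosterior({"e","g"})
print((pefg.margSumOut(["f"])-peg).abs().max())   # should be (close to) 0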
In [21]:
gnb.sideBySide(ie.evidenceImpact("a",["e","f"]),ie.evidenceImpact("a",["d","e","f"]),
              captions=["$\\forall e,f, P(a|e,f)$",
                        "$\\forall d,e,f, P(a|d,e,f)=P(a|d,e)$ using d-separation"]
                        )
        ||  a                  |
f  |e   ||  0       |  1       |
---|----||----------|----------|
0  |0   ||  0.7068  |  0.2932  |
0  |1   ||  0.6511  |  0.3489  |
1  |0   ||  0.7068  |  0.2932  |
1  |1   ||  0.6502  |  0.3498  |

$\forall e,f, P(a|e,f)$

        ||  a                  |
e  |d   ||  0       |  1       |
---|----||----------|----------|
0  |0   ||  0.7067  |  0.2933  |
0  |1   ||  0.7068  |  0.2932  |
1  |0   ||  0.6519  |  0.3481  |
1  |1   ||  0.6461  |  0.3539  |

$\forall d,e,f, P(a|d,e,f)=P(a|d,e)$ using d-separation
In [22]:
gnb.sideBySide(ie.evidenceJointImpact(["a","b"],["e","f"]),ie.evidenceJointImpact(["a","b"],["d","e","f"]),
              captions=["$\\forall e,f, P(a,b|e,f)$",
                        "$\\forall d,e,f, P(a,b|d,e,f)=P(a,b|d,e)$ using d-separation"]
                        )
            ||  b                  |
f  |e  |a   ||  0       |  1       |
---|---|----||----------|----------|
0  |0  |0   ||  0.1427  |  0.5641  |
0  |0  |1   ||  0.2324  |  0.0608  |
0  |1  |0   ||  0.1832  |  0.4680  |
0  |1  |1   ||  0.2984  |  0.0504  |
1  |0  |0   ||  0.1426  |  0.5641  |
1  |0  |1   ||  0.2324  |  0.0608  |
1  |1  |0   ||  0.1838  |  0.4664  |
1  |1  |1   ||  0.2995  |  0.0503  |

$\forall e,f, P(a,b|e,f)$

            ||  b                  |
e  |d  |a   ||  0       |  1       |
---|---|----||----------|----------|
0  |0  |0   ||  0.1427  |  0.5640  |
0  |0  |1   ||  0.2325  |  0.0608  |
0  |1  |0   ||  0.1426  |  0.5642  |
0  |1  |1   ||  0.2324  |  0.0608  |
1  |0  |0   ||  0.1826  |  0.4693  |
1  |0  |1   ||  0.2975  |  0.0506  |
1  |1  |0   ||  0.1868  |  0.4594  |
1  |1  |1   ||  0.3043  |  0.0495  |

$\forall d,e,f, P(a,b|d,e,f)=P(a,b|d,e)$ using d-separation
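
The d-separation claim can also be checked numerically: since $P(a,b|d,e,f)=P(a,b|d,e)$, the slices of the table for f=0 and f=1 should coincide up to rounding. A sketch, assuming Potential.extract accepts indices as well as labels:

In [ ]:
pab=ie.evidenceJointImpact(["a","b"],["d","e","f"])
# maximal difference between the f=0 and f=1 slices: ~0, since f is d-separated from (a,b) given d,e
print((pab.extract({'f':0})-pab.extract({'f':1})).abs().max())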

Most Probable Explanation

The Most Probable Explanation (MPE) is a concept commonly used in probabilistic reasoning and Bayesian statistics. It refers to the assignment of values to all the variables of a probabilistic model that is most consistent with (i.e. that maximizes the likelihood of) the observed evidence. Essentially, it represents the most likely scenario or explanation given the available evidence and the underlying probabilistic model.

In [23]:
ie=gum.LazyPropagation(bn)
print(ie.mpe())
<d:0|e:0|c:0|b:1|a:0|g:1|f:0>
In [24]:
evs={"e":0,"g":0}
ie.setEvidence(evs)
vals=ie.mpeLog2Posterior()
print(f"The most probable explanation for observation {evs} is the configuration {vals.first} for a log probability of {vals.second:.6f}")
The most probable explanation for observation {'e': 0, 'g': 0} is the configuration <g:0|e:0|d:0|f:0|c:1|b:1|a:0> for a log probability of -2.774139