Using pyAgrum
In [1]:
%matplotlib inline
from pylab import *
import matplotlib.pyplot as plt
import os
Initialisation
  * importing pyAgrum
  * importing pyAgrum.lib tools
  * loading a BN
In [2]:
import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
gnb.configuration()
Library | Version |
---|---|
OS | posix [darwin] |
Python | 3.12.4 (main, Jun 6 2024, 18:26:44) [Clang 15.0.0 (clang-1500.3.9.4)] |
IPython | 8.26.0 |
Matplotlib | 3.9.1 |
Numpy | 2.0.1 |
pyDot | 3.0.1 |
pyAgrum | 1.15.0.9 |
In [3]:
bn=gum.loadBN("res/alarm.dsl")
gnb.showBN(bn,size='9')
Visualisation and inspection
In [4]:
print(bn['SHUNT'])
SHUNT:Labelized({NORMAL|HIGH})
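The variable and its context can also be inspected programmatically. A minimal sketch (using the same id-based accessors as the rest of this notebook):

v=bn['SHUNT']
print(v.labels())                # the labels of the variable
print([bn.variable(i).name()     # the names of its parents in the DAG
       for i in bn.parents(bn.idFromName('SHUNT'))])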
In [5]:
print(bn.cpt(bn.idFromName('SHUNT')))
      ||  SHUNT            |
PULMEM|INTUBA||NORMAL   |HIGH     |
------|------||---------|---------|
TRUE  |NORMAL|| 0.1000  | 0.9000  |
FALSE |NORMAL|| 0.9500  | 0.0500  |
TRUE  |ESOPHA|| 0.1000  | 0.9000  |
FALSE |ESOPHA|| 0.9500  | 0.0500  |
TRUE  |ONESID|| 0.0100  | 0.9900  |
FALSE |ONESID|| 0.0500  | 0.9500  |
In [6]:
gnb.showPotential(bn.cpt(bn.idFromName('SHUNT')),digits=3)
PULMEMBOLUS | INTUBATION | SHUNT=NORMAL | SHUNT=HIGH |
---|---|---|---|
TRUE | NORMAL | 0.100 | 0.900 |
FALSE | NORMAL | 0.950 | 0.050 |
TRUE | ESOPHAGEAL | 0.100 | 0.900 |
FALSE | ESOPHAGEAL | 0.950 | 0.050 |
TRUE | ONESIDED | 0.010 | 0.990 |
FALSE | ONESIDED | 0.050 | 0.950 |
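Individual CPT entries can be read programmatically as well: a Potential supports dictionary indexing. A sketch, with the labels taken from the table above:

p=bn.cpt(bn.idFromName('SHUNT'))
# the distribution of SHUNT given PULMEMBOLUS=TRUE and INTUBATION=NORMAL
print(p[{'PULMEMBOLUS':'TRUE','INTUBATION':'NORMAL'}])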
Results of inference
It is easy to look at the results of inference.
In [7]:
gnb.showPosterior(bn,{'SHUNT':'HIGH'},'PRESS')
In [8]:
gnb.showPosterior(bn,{'MINVOLSET':'NORMAL'},'VENTALV')
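showPosterior draws the plots; the same numbers can be obtained outside the notebook helpers with an explicit inference engine. A minimal sketch:

ie=gum.LazyPropagation(bn)
ie.setEvidence({'SHUNT':'HIGH'})
ie.makeInference()
print(ie.posterior('PRESS'))   # the distribution plotted above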
Overall results
In [9]:
gnb.showInference(bn,size="10")
What is the impact of observed variables (SHUNT and VENTALV, for instance) on another one (PRESS)?
In [10]:
ie=gum.LazyPropagation(bn)
ie.evidenceImpact('PRESS',['SHUNT','VENTALV'])
Out[10]:
SHUNT | VENTALV | PRESS=ZERO | PRESS=LOW | PRESS=NORMAL | PRESS=HIGH |
---|---|---|---|---|---|
NORMAL | ZERO | 0.0569 | 0.2669 | 0.2005 | 0.4757 |
NORMAL | LOW | 0.0208 | 0.2515 | 0.0553 | 0.6724 |
NORMAL | NORMAL | 0.0769 | 0.3267 | 0.1772 | 0.4192 |
NORMAL | HIGH | 0.0501 | 0.1633 | 0.2796 | 0.5071 |
HIGH | ZERO | 0.0589 | 0.2726 | 0.1997 | 0.4688 |
HIGH | LOW | 0.0318 | 0.2237 | 0.0521 | 0.6924 |
HIGH | NORMAL | 0.1735 | 0.5839 | 0.1402 | 0.1024 |
HIGH | HIGH | 0.0711 | 0.2347 | 0.2533 | 0.4410 |
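The result is a Potential over PRESS, SHUNT and VENTALV; a single conditional distribution can be sliced out with extract(). A sketch, with labels assumed from the alarm BN:

imp=ie.evidenceImpact('PRESS',['SHUNT','VENTALV'])
# P(PRESS | SHUNT=NORMAL, VENTALV=LOW)
print(imp.extract({'SHUNT':'NORMAL','VENTALV':'LOW'}))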
Using inference as a function
It is also easy to use inference as a routine in more complex procedures.
In [11]:
import time

r=range(0,100)
xs=[x/100.0 for x in r]

tf=time.time()
ys=[gum.getPosterior(bn,evs={'MINVOLSET':[0,x/100.0,0.5]},target='VENTALV').tolist()
    for x in r]
delta=time.time()-tf

p=plot(xs,ys)
legend(p,[bn['VENTALV'].label(i)
          for i in range(bn['VENTALV'].domainSize())],loc=7)
title('VENTALV (100 inferences in %d ms)'%(1000*delta))
ylabel('posterior probability')
xlabel('Evidence on MINVOLSET : [0,x,0.5]')
plt.show()
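Note that evs={'MINVOLSET':[0,x,0.5]} is soft evidence: a likelihood over the three labels of MINVOLSET rather than an observed value. Each iteration of the loop above amounts to the following (a sketch, for x=0.25):

ie=gum.LazyPropagation(bn)
ie.setEvidence({'MINVOLSET':[0,0.25,0.5]})  # likelihoods, not an observed label
ie.makeInference()
print(ie.posterior('VENTALV'))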
Another example: Python gives access to a large set of tools. Here the evidence value at which two posterior probabilities become equal is easily computed.
In [12]:
x=[p/100.0 for p in range(0,100)]

tf=time.time()
y=[gum.getPosterior(bn,evs={'HRBP':[1.0-p/100.0,1.0-p/100.0,p/100.0]},target='TPR').tolist()
   for p in range(0,100)]
delta=time.time()-tf

p=plot(x,y)
title('HRBP (100 inferences in %d ms)'%(1000*delta))
v=bn['TPR']
legend([v.label(i) for i in range(v.domainSize())],loc='best')

# first index where P(TPR=label 0) no longer dominates P(TPR=label 2): the crossing point
np1=(transpose(y)[0]>transpose(y)[2]).argmin()
text(x[np1]-0.05,y[np1][0]+0.005,str(x[np1]),bbox=dict(facecolor='red',alpha=0.1))
plt.show()
BN as a classifier
Generation of databases
Using the CSV format for the database:
In [13]:
print(f"The log2-likelihood of the generated base : {gum.generateSample(bn,1000,'out/test.csv',with_labels=True):.2f}")
The log2-likelihood of the generated base : -14997.15
In [14]:
with open("out/test.csv","r") as src:
for _ in range(10):
print(src.readline(),end="")
HR,DISCONNECT,HISTORY,MINVOL,VENTTUBE,ANAPHYLAXIS,CVP,EXPCO2,HREKG,SHUNT,STROKEVOLUME,PCWP,PULMEMBOLUS,HYPOVOLEMIA,PRESS,VENTALV,VENTLUNG,VENTMACH,CO,ERRLOWOUTPUT,PAP,HRBP,ARTCO2,PVSAT,INSUFFANESTH,HRSAT,INTUBATION,KINKEDTUBE,BP,CATECHOL,SAO2,FIO2,MINVOLSET,ERRCAUTER,LVEDVOLUME,LVFAILURE,TPR
HIGH,FALSE,FALSE,ZERO,LOW,FALSE,NORMAL,LOW,HIGH,NORMAL,NORMAL,NORMAL,FALSE,FALSE,LOW,ZERO,ZERO,NORMAL,HIGH,FALSE,NORMAL,HIGH,HIGH,LOW,FALSE,HIGH,NORMAL,FALSE,NORMAL,HIGH,LOW,NORMAL,NORMAL,FALSE,NORMAL,FALSE,NORMAL
HIGH,FALSE,FALSE,LOW,LOW,FALSE,NORMAL,HIGH,HIGH,NORMAL,NORMAL,NORMAL,FALSE,FALSE,HIGH,LOW,HIGH,NORMAL,HIGH,FALSE,NORMAL,HIGH,HIGH,NORMAL,FALSE,HIGH,NORMAL,TRUE,NORMAL,HIGH,LOW,NORMAL,NORMAL,FALSE,NORMAL,FALSE,NORMAL
HIGH,FALSE,FALSE,ZERO,LOW,FALSE,NORMAL,LOW,HIGH,NORMAL,NORMAL,NORMAL,FALSE,FALSE,HIGH,ZERO,ZERO,NORMAL,HIGH,FALSE,NORMAL,HIGH,HIGH,LOW,FALSE,HIGH,NORMAL,FALSE,NORMAL,HIGH,LOW,NORMAL,NORMAL,FALSE,NORMAL,FALSE,LOW
HIGH,FALSE,FALSE,ZERO,LOW,FALSE,NORMAL,LOW,HIGH,NORMAL,NORMAL,NORMAL,FALSE,FALSE,HIGH,ZERO,ZERO,NORMAL,HIGH,TRUE,NORMAL,NORMAL,HIGH,LOW,FALSE,HIGH,NORMAL,FALSE,HIGH,HIGH,LOW,NORMAL,NORMAL,FALSE,NORMAL,FALSE,HIGH
HIGH,FALSE,FALSE,LOW,HIGH,FALSE,NORMAL,NORMAL,HIGH,NORMAL,NORMAL,NORMAL,FALSE,FALSE,HIGH,LOW,LOW,HIGH,HIGH,FALSE,NORMAL,HIGH,HIGH,NORMAL,FALSE,HIGH,NORMAL,FALSE,LOW,HIGH,LOW,NORMAL,HIGH,FALSE,NORMAL,FALSE,NORMAL
HIGH,FALSE,FALSE,ZERO,LOW,FALSE,NORMAL,LOW,HIGH,NORMAL,NORMAL,NORMAL,FALSE,FALSE,HIGH,ZERO,ZERO,NORMAL,HIGH,FALSE,NORMAL,HIGH,HIGH,LOW,FALSE,HIGH,NORMAL,FALSE,HIGH,HIGH,LOW,NORMAL,NORMAL,FALSE,NORMAL,FALSE,HIGH
HIGH,FALSE,FALSE,ZERO,LOW,FALSE,HIGH,LOW,HIGH,NORMAL,LOW,HIGH,FALSE,TRUE,HIGH,ZERO,ZERO,NORMAL,LOW,FALSE,NORMAL,HIGH,HIGH,LOW,FALSE,HIGH,NORMAL,FALSE,LOW,HIGH,LOW,NORMAL,NORMAL,FALSE,HIGH,FALSE,HIGH
HIGH,FALSE,FALSE,ZERO,LOW,FALSE,LOW,LOW,HIGH,NORMAL,NORMAL,LOW,FALSE,FALSE,HIGH,ZERO,ZERO,NORMAL,HIGH,FALSE,NORMAL,HIGH,HIGH,LOW,FALSE,HIGH,NORMAL,FALSE,HIGH,HIGH,LOW,NORMAL,NORMAL,FALSE,LOW,FALSE,LOW
HIGH,FALSE,FALSE,ZERO,LOW,FALSE,NORMAL,HIGH,HIGH,NORMAL,NORMAL,NORMAL,FALSE,FALSE,HIGH,ZERO,ZERO,NORMAL,HIGH,FALSE,NORMAL,HIGH,HIGH,NORMAL,FALSE,HIGH,NORMAL,FALSE,HIGH,HIGH,LOW,NORMAL,NORMAL,FALSE,NORMAL,FALSE,HIGH
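A quick sanity check (a sketch): the empirical frequencies in the generated sample should be close to the exact marginals of the BN.

import csv
from collections import Counter

with open("out/test.csv") as f:
    counts=Counter(row['CATECHOL'] for row in csv.DictReader(f))
total=sum(counts.values())
print({label:counts[label]/total for label in counts})  # empirical frequencies
print(gum.getPosterior(bn,evs={},target='CATECHOL'))    # exact marginal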
Probabilistic classifier using BN
(since the CSV file was generated from the BN itself, quite good ROC curves are expected)
In [15]:
from pyAgrum.lib.bn2roc import showROC_PR
showROC_PR(bn,"out/test.csv",
target='CATECHOL',label='HIGH', # class and label
show_progress=True,show_fig=True,with_labels=True)
out/test.csv: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████|
Out[15]:
(0.9753106289834546,
np.float64(0.8384562795999999),
0.9987184288821391,
np.float64(0.56442831965))
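The four returned values appear to be the ROC AUC, the optimal ROC threshold, the PR AUC and the optimal PR threshold, in that order. Under the hood, the score of each record is a posterior computed from all the other columns; one row of that computation looks like this (a sketch):

import csv

ie=gum.LazyPropagation(bn)
with open("out/test.csv") as f:
    row=next(csv.DictReader(f))
# all columns except the class are used as evidence
ie.setEvidence({name:label for name,label in row.items() if name!='CATECHOL'})
ie.makeInference()
print(ie.posterior('CATECHOL'))  # the record's score is P(CATECHOL=HIGH | record)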
Using another class variable
In [16]:
showROC_PR(bn,"out/test.csv",'SAO2','HIGH',show_progress=True)
out/test.csv: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████|
Out[16]:
(0.9483556584041654,
np.float64(0.0040643702),
0.6539321818166576,
np.float64(0.4871612786))
Fast prototyping for BNs
In [17]:
bn1=gum.fastBN("a->b;a->c;b->c;c->d",3)
gnb.sideBySide(*[gnb.getInference(bn1,evs={'c':val},targets={'a','c','d'}) for val in range(3)],
captions=[f"Inference given that $c={val}$" for val in range(3)])
In [18]:
print(gum.getPosterior(bn1,evs={'c':0},target='c'))
print(gum.getPosterior(bn1,evs={'c':0},target='d'))
# using pyAgrum.lib.notebook's helpers
gnb.flow.row(gum.getPosterior(bn1,evs={'c':0},target='c'),gum.getPosterior(bn1,evs={'c':0},target='d'))
c        |
0        |1        |2        |
---------|---------|---------|
 1.0000  | 0.0000  | 0.0000  |

d        |
0        |1        |2        |
---------|---------|---------|
 0.3149  | 0.2314  | 0.4536  |

(gnb.flow.row then renders the same two posteriors side by side as HTML tables)
Joint posterior, impact of multiple evidence
In [19]:
bn=gum.fastBN("a->b->c->d;b->e->d->f;g->c")
gnb.sideBySide(bn,gnb.getInference(bn))
In [20]:
ie=gum.LazyPropagation(bn)
ie.addJointTarget({"e","f","g"})
ie.makeInference()
gnb.sideBySide(ie.jointPosterior({"e","f","g"}),ie.jointPosterior({"e","g"}),
captions=["Joint posterior $P(e,f,g)$","Joint posterior $P(e,f)$"])
In [21]:
gnb.sideBySide(ie.evidenceImpact("a",["e","f"]),ie.evidenceImpact("a",["d","e","f"]),
captions=["$\\forall e,f, P(a|e,f)$",
"$\\forall d,e,f, P(a|d,e,f)=P(a|d,e)$ using d-separation"]
)
In [22]:
gnb.sideBySide(ie.evidenceJointImpact(["a","b"],["e","f"]),ie.evidenceJointImpact(["a","b"],["d","e","f"]),
captions=["$\\forall e,f, P(a,b|e,f)$",
"$\\forall d,e,f, P(a,b|d,e,f)=P(a,b|d,e)$ using d-separation"]
)
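The d-separation claim in the caption can be checked on the returned Potential itself: if pyAgrum prunes d-separated variables here, as the caption suggests, f should not even appear among its dimensions. A sketch:

imp=ie.evidenceImpact("a",["d","e","f"])
print(imp.var_names)  # expected: no 'f' among the dimensions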
Most Probable Explanation
The Most Probable Explanation (MPE) is a concept commonly used in probabilistic reasoning and Bayesian statistics. It is the assignment of values to all the variables of a probabilistic model that is most consistent with (i.e. that maximizes the likelihood of) the observed evidence. Essentially, it represents the most likely scenario or explanation given the available evidence and the underlying probabilistic model.
In [23]:
ie=gum.LazyPropagation(bn)
print(ie.mpe())
<d:1|e:1|c:0|b:1|a:1|g:0|f:1>
In [24]:
evs={"e":0,"g":0}
ie.setEvidence(evs)
vals=ie.mpeLog2Posterior()
print(f"The most probable explanation for observation {evs} is the configuration {vals.first} for a log probability of {vals.second:.6f}")
The most probable explanation for observation {'e': 0, 'g': 0} is the configuration <g:0|e:0|d:0|f:0|c:0|b:0|a:0> for a log probability of -3.025731
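The MPE changes with the evidence. A small sketch, assuming mpe() takes the currently set evidence into account (as mpeLog2Posterior() does):

for evs in ({},{"e":0},{"e":0,"g":0}):
    ie.setEvidence(evs)
    print(evs,"->",ie.mpe())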