Comparing BNs

Creative Commons License

aGrUM

interactive online version

In [1]:
def dict2html(di1,di2=None):
    res= "<br/>".join([f"<b>{k:15}</b>:{v}" for k,v in di1.items()])
    if di2 is not None:
        res+="<br/><br/>"
        res+= "<br/>".join([f"<b>{k:15}</b>:{v}" for k,v in di2.items()])
    return res


In [2]:
import pyagrum as gum
import pyagrum.lib.notebook as gnb
import pyagrum.lib.bn_vs_bn as gcm

How to compare two BNs

PyAgrum allows you to compare BNs in several ways. This notebook show you some of them:

  • a graphical diff between the 2 BNs

  • some scores form recal and precision

  • distance measures (for more, see notebook 26-klForBNs for more)

Between two different structures

The graphical diff propose to show the different possible structural differences from two structures with the same nodes in the layout of the first BN (the reference of the comparison).

In [3]:
bn1=gum.fastBN("A->B->C->D->E<-A->F")
bn2=gum.fastBN("A->B<-C->D->E<-A;F->E")
gnb.showBNDiff(bn1,bn2)
../_images/notebooks_97-Tools_ComparingBN_6_0.svg

The meaning of the diffent style for the arcs are :

In [4]:
import pyagrum.lib.bn_vs_bn as bnvsbn
bnvsbn.graphDiffLegend()
Out[4]:
G a->b overflow c->d Missing e->f reversed g->h Correct
In [5]:
cmp=gcm.GraphicalBNComparator(bn1,bn2)
kl=gum.ExactBNdistance(bn1,bn2) # bruteForce is possible car the BNs are small
gnb.sideBySide(bn1,bn2,gnb.getBNDiff(bn1,bn2),dict2html(cmp.scores(),cmp.hamming()),cmp.equivalentBNs(),dict2html(kl.compute()),
              captions=['bn1','bn2','graphical diff','Scores','equivalent ?','distances'],valign="bottom")
G F F C C D D C->D E E B B B->C D->E A A A->F A->E A->B
bn1
G F F E E F->E C C B B C->B D D C->D D->E A A A->E A->B
bn2
G A A B B A->B E E A->E F F A->F C C C->B D D C->D D->E F->E
graphical diff
count :{'tp': 4, 'tn': 22, 'fp': 2, 'fn': 2}
recall :0.6666666666666666
precision :0.6666666666666666
fscore :0.6666666666666666
dist2opt :0.47140452079103173

hamming :2
structural hamming:4
Scores
B has different parents in the two bns whose names are in {'C'}
equivalent ?
klPQ :1.3186720180195564
errorPQ :0
klQP :1.374197600906502
errorQP :0
hellinger :0.6301467201071447
bhattacharya :0.22132326458042023
jensen-shannon :0.25729379686869835
distances

The logic for the arcs of the graphical diff is the following. When comparaing bn1 with bn2 (in that order) :

  • full black line: the arc is common for both

  • full red line: the arc is common but inverted in bn2

  • dotted black line: the arc is added in bn2

  • dotted red line: the arc is removed in bn2

For the scores :

  • precision and recall are computed considering BN1 as the reference

  • \(Fscore=\frac{2\cdot recall\cdot precision}{recall+precision}\) is the weighted average of Precision and Recall.

  • \(dist2opt=\sqrt{(1-precision)^2+(1-recall)^2}\) represents the euclidian distance to the ideal(precision=1,recall=1)

EquivalentBN return “OK” if equivalent or a reason for non equivalence

Finally, BruteForceKL compute in the same time several distances : I-projection, M-projection, Hellinger and Bhattacharya. For more complex BNs, there exists a GibbsKL to approximate those distances. Of course, the computation are much slower.

Same structure, different parameters

In [6]:
bn1=gum.fastBN("A->B->C->D->E<-A->F")
bn2=gum.fastBN("A->B->C->D->E<-A->F")
cmp=gcm.GraphicalBNComparator(bn1,bn2)
kl=gum.ExactBNdistance(bn1,bn2) # bruteForce is possible car the BNs are small
gnb.sideBySide(bn1,bn2,gnb.getBNDiff(bn1,bn2),dict2html(cmp.scores(),cmp.hamming()),cmp.equivalentBNs(),dict2html(kl.compute()),
              captions=['bn1','bn2','graphical diff','Scores','equivalent ?','distances'],valign="bottom")
G F F C C D D C->D E E B B B->C D->E A A A->F A->E A->B
bn1
G F F C C D D C->D E E B B B->C D->E A A A->F A->E A->B
bn2
G A A B B A->B E E A->E F F A->F C C B->C D D C->D D->E
graphical diff
count :{'tp': 6, 'tn': 24, 'fp': 0, 'fn': 0}
recall :1.0
precision :1.0
fscore :1.0
dist2opt :0.0

hamming :0
structural hamming:0
Scores
Different CPTs for A
equivalent ?
klPQ :2.9588616639054304
errorPQ :0
klQP :3.938750022449966
errorQP :0
hellinger :0.956102134091789
bhattacharya :0.6107668602928096
jensen-shannon :0.5555148742661353
distances

identical BNs

In [7]:
bn1=gum.fastBN("A->B->C->D->E<-A->F")
bn2=bn1
cmp=gcm.GraphicalBNComparator(bn1,bn2)
kl=gum.ExactBNdistance(bn1,bn2) # bruteForce is possible car the BNs are small
gnb.sideBySide(bn1,bn2,gnb.getBNDiff(bn1,bn2),dict2html(cmp.scores(),cmp.hamming()),cmp.equivalentBNs(),dict2html(kl.compute()),
              captions=['bn1','bn2','graphical diff','Scores','equivalent ?','distances'],valign="bottom")
G F F C C D D C->D E E B B B->C D->E A A A->F A->E A->B
bn1
G F F C C D D C->D E E B B B->C D->E A A A->F A->E A->B
bn2
G A A B B A->B E E A->E F F A->F C C B->C D D C->D D->E
graphical diff
count :{'tp': 6, 'tn': 24, 'fp': 0, 'fn': 0}
recall :1.0
precision :1.0
fscore :1.0
dist2opt :0.0

hamming :0
structural hamming:0
Scores
OK
equivalent ?
klPQ :0.0
errorPQ :0
klQP :0.0
errorQP :0
hellinger :0.0
bhattacharya :-0.0
jensen-shannon :-1.4028051651801328e-17
distances

In the notebook Learning_DirichletPriorAndWeightedDatabase, you can find an interresting discussion on how can change those scores and distance.