# Learning essential graphs

In [1]:

%matplotlib inline
from pylab import *
import matplotlib.pyplot as plt

import os

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb


## Compare learning algorithms

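MIIC and 3off2 essentially compute the essential graph (CPDAG) directly from data. An essential graph is a mixed graph: its arcs are the orientations shared by every DAG of the Markov equivalence class, and its undirected edges are the adjacencies whose orientation differs between members of that class.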
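To make the idea of an equivalence class concrete, recall that two DAGs are Markov equivalent exactly when they share the same skeleton and the same v-structures. The following is a minimal sketch in plain Python, independent of pyAgrum; the helper names (`skeleton`, `v_structures`, `markov_equivalent`) and the three toy graphs are hypothetical and only serve the illustration.

```python
from itertools import combinations


def skeleton(arcs):
    """Undirected adjacencies of a DAG given as (parent, child) pairs."""
    return {frozenset(arc) for arc in arcs}


def v_structures(arcs):
    """Triples (a, child, b) with a -> child <- b where a and b are not adjacent."""
    adjacent = skeleton(arcs)
    parents = {}
    for tail, head in arcs:
        parents.setdefault(head, set()).add(tail)
    found = set()
    for child, pars in parents.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in adjacent:
                found.add((a, child, b))
    return found


def markov_equivalent(arcs1, arcs2):
    """Same skeleton and same v-structures means same essential graph."""
    return (skeleton(arcs1) == skeleton(arcs2)
            and v_structures(arcs1) == v_structures(arcs2))


# Toy graphs (hypothetical, not taken from the Asia data set)
chain = [("A", "B"), ("B", "C")]
reversed_chain = [("C", "B"), ("B", "A")]
collider = [("A", "B"), ("C", "B")]

print(markov_equivalent(chain, reversed_chain))  # True
print(markov_equivalent(chain, collider))        # False
```

The two chains cannot be told apart from data alone, which is why their essential graph keeps both adjacencies undirected, whereas the collider has a v-structure and therefore keeps its arcs.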

In [2]:

learner=gum.BNLearner("out/sample_asia.csv")
learner.use3off2()
learner.useNMLCorrection()
print(learner)
ge3off2=learner.learnEssentialGraph()

Filename       : out/sample_asia.csv
Size           : (500000,8)
Variables      : visit_to_Asia[2], positive_XraY[2], tuberculosis[2], lung_cancer[2], dyspnoea[2], bronchitis[2], tuberculos_or_cancer[2], smoking[2]
Induced types  : True
Missing values : False
Algorithm      : 3off2
Score          : BDeu
Correction     : NML  (Not used for score-based algorithms)
Prior          : -


In [3]:

gnb.showDot(ge3off2.toDot());

In [4]:

learner=gum.BNLearner("out/sample_asia.csv")
learner.useMIIC()
learner.useNMLCorrection()
print(learner)
gemiic=learner.learnEssentialGraph()
gemiic

Filename       : out/sample_asia.csv
Size           : (500000,8)
Variables      : visit_to_Asia[2], positive_XraY[2], tuberculosis[2], lung_cancer[2], dyspnoea[2], bronchitis[2], tuberculos_or_cancer[2], smoking[2]
Induced types  : True
Missing values : False
Algorithm      : MIIC
Score          : BDeu
Correction     : NML  (Not used for score-based algorithms)
Prior          : -


Out[4]:


For the other methods (score-based searches such as greedy hill climbing and local search with a tabu list), the essential graph can be obtained from the learned BN.

In [5]:

learner=gum.BNLearner("out/sample_asia.csv")
learner.useGreedyHillClimbing()
bnHC=learner.learnBN()
print(learner)
geHC=gum.EssentialGraph(bnHC)
geHC
gnb.sideBySide(bnHC,geHC)

Filename       : out/sample_asia.csv
Size           : (500000,8)
Variables      : visit_to_Asia[2], positive_XraY[2], tuberculosis[2], lung_cancer[2], dyspnoea[2], bronchitis[2], tuberculos_or_cancer[2], smoking[2]
Induced types  : True
Missing values : False
Algorithm      : Greedy Hill Climbing
Score          : BDeu
Correction     : MDL  (Not used for score-based algorithms)
Prior          : -


[Figure: BN learned by greedy hill climbing (left) and its essential graph (right)]
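What converting a DAG into its essential graph means can be sketched by brute force: enumerate every acyclic re-orientation of the skeleton that keeps the same v-structures, then keep as arcs only the orientations on which all of them agree. This is a plain-Python illustration under that definition, not the algorithm pyAgrum uses; the function names are hypothetical, and the enumeration is exponential in the number of arcs, so it is only meant for very small graphs.

```python
from itertools import combinations, product


def v_structures(arcs):
    """Triples (a, child, b) with a -> child <- b where a and b are not adjacent."""
    adjacent = {frozenset(arc) for arc in arcs}
    parents = {}
    for tail, head in arcs:
        parents.setdefault(head, set()).add(tail)
    return {
        (a, child, b)
        for child, pars in parents.items()
        for a, b in combinations(sorted(pars), 2)
        if frozenset((a, b)) not in adjacent
    }


def is_acyclic(arcs):
    """Repeatedly remove nodes without incoming arcs; failure means a cycle."""
    nodes = {n for arc in arcs for n in arc}
    remaining = set(arcs)
    while nodes:
        sources = {n for n in nodes if all(head != n for _, head in remaining)}
        if not sources:
            return False
        nodes -= sources
        remaining = {(t, h) for t, h in remaining if t not in sources}
    return True


def essential_graph(arcs):
    """Return (arcs, undirected edges) of the equivalence class of an acyclic input."""
    arcs = list(arcs)
    target = v_structures(arcs)
    members = []
    # Try every combination of arc reversals over the same skeleton
    for flips in product((False, True), repeat=len(arcs)):
        candidate = [(h, t) if flip else (t, h)
                     for (t, h), flip in zip(arcs, flips)]
        if is_acyclic(candidate) and v_structures(candidate) == target:
            members.append(set(candidate))
    compelled = set.intersection(*members)  # orientations shared by all members
    edges = {frozenset(a) for a in arcs} - {frozenset(a) for a in compelled}
    return compelled, edges


# Toy example: a v-structure A -> C <- B followed by C -> D
arcs_out, edges_out = essential_graph([("A", "C"), ("B", "C"), ("C", "D")])
print(sorted(arcs_out), edges_out)
```

In this toy graph every arc stays directed, because reversing `C -> D` would create new v-structures at `C`; a simple chain such as `A -> B -> C`, by contrast, yields only undirected edges.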
In [6]:

learner=gum.BNLearner("out/sample_asia.csv")
learner.useLocalSearchWithTabuList()
print(learner)
bnTL=learner.learnBN()
geTL=gum.EssentialGraph(bnTL)
geTL
gnb.sideBySide(bnTL,geTL)

Filename       : out/sample_asia.csv
Size           : (500000,8)
Variables      : visit_to_Asia[2], positive_XraY[2], tuberculosis[2], lung_cancer[2], dyspnoea[2], bronchitis[2], tuberculos_or_cancer[2], smoking[2]
Induced types  : True
Missing values : False
Algorithm      : Local Search with Tabu List
Tabu list size : 2
Score          : BDeu
Correction     : MDL  (Not used for score-based algorithms)
Prior          : -


[Figure: BN learned by local search with tabu list (left) and its essential graph (right)]

We can now compare the results of the four algorithms side by side.

In [7]:

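(
gnb.flow.clear()
.add(ge3off2,"Essential graph from 3off2").add(gemiic,"Essential graph from miic")
.add(bnHC,"BayesNet from GHC").add(geHC,"Essential graph from GHC")
.add(bnTL,"BayesNet from TabuList").add(geTL,"Essential graph from TabuList")
.display()
)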


Essential graph from 3off2

Essential graph from miic

BayesNet from GHC

Essential graph from GHC

BayesNet from TabuList

Essential graph from TabuList
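Beyond visual inspection, two essential graphs can be compared numerically by counting adjacencies that are missing, extra, or oriented differently. The sketch below assumes each graph is given as a pair `(arcs, edges)` of plain Python collections; this representation, the `compare` helper, and the toy graphs are hypothetical and do not reproduce the graphs learned above.

```python
def skeleton(arcs, edges):
    """All adjacencies of a mixed graph, ignoring orientation."""
    return {frozenset(a) for a in arcs} | {frozenset(e) for e in edges}


def orientation(arcs, pair):
    """The arc covering this adjacency, or None if it is undirected."""
    for arc in arcs:
        if frozenset(arc) == pair:
            return tuple(arc)
    return None


def compare(reference, candidate):
    """Adjacency and orientation differences between two mixed graphs."""
    ref_arcs, ref_edges = reference
    cand_arcs, cand_edges = candidate
    ref_skel = skeleton(ref_arcs, ref_edges)
    cand_skel = skeleton(cand_arcs, cand_edges)
    mismatched = {
        pair for pair in ref_skel & cand_skel
        if orientation(ref_arcs, pair) != orientation(cand_arcs, pair)
    }
    return {
        "missing": ref_skel - cand_skel,   # in reference only
        "extra": cand_skel - ref_skel,     # in candidate only
        "orientation": mismatched,         # shared but oriented differently
    }


# Toy graphs: (arcs, undirected edges)
reference = ({("A", "C"), ("B", "C")}, {("C", "D")})
candidate = ({("A", "C")}, {("B", "C"), ("D", "E")})

diff = compare(reference, candidate)
print({name: len(pairs) for name, pairs in diff.items()})
# {'missing': 1, 'extra': 1, 'orientation': 1}
```

The sum of the three counts gives a simple structural distance; comparing essential graphs rather than the DAGs themselves avoids penalising arcs whose direction cannot be identified from data.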