Learning essential graphs¶

In [1]:

%matplotlib inline
from pylab import *
import matplotlib.pyplot as plt

import os

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb

Compare learning algorithms¶

Essentially MIIC and 3off2 computes the essential graph (CPDAG) from data. Essential graphs are mixed graphs.

In [2]:

learner=gum.BNLearner("out/sample_asia.csv")
learner.use3off2()
learner.useNMLCorrection()
print(learner)
ge3off2=learner.learnEssentialGraph()

Filename       : out/sample_asia.csv
Size           : (500000,8)
Variables      : visit_to_Asia[2], positive_XraY[2], tuberculosis[2], lung_cancer[2], dyspnoea[2], bronchitis[2], tuberculos_or_cancer[2], smoking[2]
Induced types  : True
Missing values : False
Algorithm      : 3off2
Score          : BDeu
Correction     : NML  (Not used for score-based algorithms)
Prior          : -

In [3]:

gnb.showDot(ge3off2.toDot());

../_images/notebooks_33-Learning_LearningAndEssentialGraphs_5_0.svg

In [4]:

learner=gum.BNLearner("out/sample_asia.csv")
learner.useMIIC()
learner.useNMLCorrection()
print(learner)
gemiic=learner.learnEssentialGraph()
gemiic

Filename       : out/sample_asia.csv
Size           : (500000,8)
Variables      : visit_to_Asia[2], positive_XraY[2], tuberculosis[2], lung_cancer[2], dyspnoea[2], bronchitis[2], tuberculos_or_cancer[2], smoking[2]
Induced types  : True
Missing values : False
Algorithm      : MIIC
Score          : BDeu
Correction     : NML  (Not used for score-based algorithms)
Prior          : -

Out[4]:

For the others methods, it is possible to obtain the essential graph from the learned BN.

In [5]:

learner=gum.BNLearner("out/sample_asia.csv")
learner.useGreedyHillClimbing()
bnHC=learner.learnBN()
print(learner)
geHC=gum.EssentialGraph(bnHC)
geHC
gnb.sideBySide(bnHC,geHC)

Filename       : out/sample_asia.csv
Size           : (500000,8)
Variables      : visit_to_Asia[2], positive_XraY[2], tuberculosis[2], lung_cancer[2], dyspnoea[2], bronchitis[2], tuberculos_or_cancer[2], smoking[2]
Induced types  : True
Missing values : False
Algorithm      : Greedy Hill Climbing
Score          : BDeu
Correction     : MDL  (Not used for score-based algorithms)
Prior          : -

In [6]:

learner=gum.BNLearner("out/sample_asia.csv")
learner.useLocalSearchWithTabuList()
print(learner)
bnTL=learner.learnBN()
geTL=gum.EssentialGraph(bnTL)
geTL
gnb.sideBySide(bnTL,geTL)

Filename       : out/sample_asia.csv
Size           : (500000,8)
Variables      : visit_to_Asia[2], positive_XraY[2], tuberculosis[2], lung_cancer[2], dyspnoea[2], bronchitis[2], tuberculos_or_cancer[2], smoking[2]
Induced types  : True
Missing values : False
Algorithm      : Local Search with Tabu List
Tabu list size : 2
Score          : BDeu
Correction     : MDL  (Not used for score-based algorithms)
Prior          : -

Hence we can compare the 4 algorithms.

In [7]:

(
  gnb.flow.clear()
  .add(ge3off2,"Essential graph from 3off2")
  .add(gemiic,"Essential graph from miic")
  .add(bnHC,"BayesNet from GHC")
  .add(geHC,"Essential graph from GHC")
  .add(bnTL,"BayesNet from TabuList")
  .add(geTL,"Essential graph from TabuList")
  .display()
)

Essential graph from 3off2

Essential graph from miic

BayesNet from GHC

Essential graph from GHC

BayesNet from TabuList

Essential graph from TabuList

In [ ]: