Learning

pyAgrum wraps all the learning processes for Bayesian networks in a single class, BNLearner. This class gives direct access to the complete learning algorithms and their parameters (such as priors, scores, constraints, etc.), and also offers low-level functions that ease the development of new learning algorithms (for instance, computing chi2 or conditional likelihood on the database).

class pyAgrum.BNLearner(filename, inducedTypes=True)
Parameters:
  • filename (str) – the file to learn from
  • inducedTypes (bool) – whether BNLearner should try to automatically find the type of each variable
BNLearner(filename,src) -> BNLearner
Parameters:
  • filename (str) – the file to learn from
  • src (pyAgrum.BayesNet) – the Bayesian network used to find the modalities of the variables in the database
BNLearner(learner) -> BNLearner
Parameters:
  • learner (pyAgrum.BNLearner) – the BNLearner to copy
G2(self, var1, var2, knw={})

G2 computes the G2 statistic and p-value for two columns, given a list of other columns.

Parameters:
  • name1 (str) – the name of the first column
  • name2 (str) – the name of the second column
  • knowing ([str]) – the list of names of conditioning columns
Returns:

the G2 statistic and the associated p-value as a Tuple

Return type:

statistic,pvalue
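
As an illustration of what a G2 statistic measures, here is a minimal pure-Python sketch for two discrete columns without conditioning columns. It is not pyAgrum's implementation: the names g2_statistic and p_value_one_dof are hypothetical, the columns are passed as lists of values rather than column names, and the p-value helper only covers one degree of freedom (two binary columns).

```python
import math
from collections import Counter


def g2_statistic(col1, col2):
    """Return (G2, degrees of freedom) for two discrete columns of equal length."""
    n = len(col1)
    joint = Counter(zip(col1, col2))   # observed joint counts
    m1 = Counter(col1)                 # marginal counts of the first column
    m2 = Counter(col2)                 # marginal counts of the second column
    g2 = 0.0
    for (a, b), observed in joint.items():
        expected = m1[a] * m2[b] / n   # count expected under independence
        g2 += 2.0 * observed * math.log(observed / expected)
    dof = (len(m1) - 1) * (len(m2) - 1)
    return g2, dof


def p_value_one_dof(stat):
    """Chi-square survival function, valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


stat, dof = g2_statistic([0, 0, 1, 1], [0, 0, 1, 1])
print(stat, dof, p_value_one_dof(stat))
```

Cells with a zero observed count contribute nothing to the sum, which is why the loop only visits observed pairs.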

addForbiddenArc(self, arc)

addForbiddenArc(self, tail, head)

Forbids the given arc: it will not be added to the learned structure.

Parameters:
  • arc (pyAgrum.Arc) – an arc
  • tail (int or str) – a variable’s id or name
  • head (int or str) – a variable’s id or name
addMandatoryArc(self, arc)

addMandatoryArc(self, tail, head)

Adds prior structural knowledge: the given arc must be part of the learned structure.

Parameters:
  • arc (pyAgrum.Arc) – an arc
  • tail (int or str) – a variable’s id or name
  • head (int or str) – a variable’s id or name
Raises:

gum.InvalidDirectedCycle – If the added arc creates a directed cycle in the DAG
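
The cycle condition behind this exception can be sketched in a few lines of pure Python. This is only an illustration, not pyAgrum's code: the helper name creates_directed_cycle is hypothetical, and arcs are represented as plain (tail, head) tuples.

```python
def creates_directed_cycle(arcs, tail, head):
    """Return True if adding the arc tail -> head would close a directed cycle."""
    children = {}
    for t, h in arcs:
        children.setdefault(t, []).append(h)
    # Adding tail -> head creates a cycle iff tail is already reachable from head.
    stack, seen = [head], set()
    while stack:
        node = stack.pop()
        if node == tail:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return False


mandatory = [("A", "B"), ("B", "C")]
print(creates_directed_cycle(mandatory, "C", "A"))  # closes A -> B -> C -> A
print(creates_directed_cycle(mandatory, "A", "C"))  # only adds a shortcut
```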

addPossibleEdge(self, edge)

addPossibleEdge(self, tail, head)

chi2(self, var1, var2, knw={})

chi2 computes the chi2 statistic and p-value for two columns, given a list of other columns.

Parameters:
  • name1 (str) – the name of the first column
  • name2 (str) – the name of the second column
  • knowing ([str]) – the list of names of conditioning columns
Returns:

the chi2 statistic and the associated p-value as a Tuple

Return type:

statistic,pvalue
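
To make the role of the conditioning columns concrete, the following pure-Python sketch computes a Pearson chi2 statistic by summing the per-stratum statistics over every observed configuration of the conditioning columns. It is an assumption-laden illustration rather than pyAgrum's implementation: the function name chi2_statistic is hypothetical, columns are passed as lists of values instead of names, and no p-value is computed.

```python
from collections import Counter, defaultdict


def chi2_statistic(col1, col2, knowing=()):
    """Pearson chi2 of col1 vs col2, summed over configurations of `knowing`."""
    # Group the (col1, col2) pairs by the values of the conditioning columns.
    strata = defaultdict(list)
    for i, pair in enumerate(zip(col1, col2)):
        key = tuple(col[i] for col in knowing)
        strata[key].append(pair)

    stat = 0.0
    for pairs in strata.values():
        n = len(pairs)
        joint = Counter(pairs)
        m1 = Counter(a for a, _ in pairs)
        m2 = Counter(b for _, b in pairs)
        for a in m1:
            for b in m2:
                expected = m1[a] * m2[b] / n  # expected count under independence
                stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


x = [0, 0, 1, 1]
y = [0, 0, 1, 1]
print(chi2_statistic(x, y))        # strong marginal dependence
print(chi2_statistic(x, y, [x]))   # no dependence left once x is known
```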

currentTime(self)
Returns:the current running time in seconds
Return type:double
databaseWeight(self)
domainSize(self, var)

domainSize(self, var) -> int

epsilon(self)
Returns:the value of epsilon
Return type:double
eraseForbiddenArc(self, arc)

eraseForbiddenArc(self, tail, head)

Removes the arc from the forbidden arcs, so that it can be added if necessary.

Parameters:
  • arc (pyAgrum.Arc) – an arc
  • tail (int or str) – a variable’s id or name
  • head (int or str) – a variable’s id or name
eraseMandatoryArc(self, arc)

eraseMandatoryArc(self, tail, head)

Parameters:
  • arc (pyAgrum.Arc) – an arc
  • tail (int or str) – a variable’s id or name
  • head (int or str) – a variable’s id or name
erasePossibleEdge(self, edge)

erasePossibleEdge(self, tail, head)

Allows the two arcs to be added if necessary.

Parameters:
  • edge (pyAgrum.Edge) – an edge
  • tail (int or str) – a variable’s id or name
  • head (int or str) – a variable’s id or name
hasMissingValues(self)

Indicates whether there are missing values in the database.

Returns:True if there are some missing values in the database.
Return type:bool
history(self)
Returns:the scheme history
Return type:tuple
Raises:gum.OperationNotAllowed – If the scheme has not been performed or if verbosity is set to false
idFromName(self, var_name)
Parameters:var_name (str) – a variable’s name
Returns:the column id corresponding to a variable name
Return type:int
Raises:gum.MissingVariableInDatabase – If a variable of the BN is not found in the database.
latentVariables(self)

latentVariables(self) -> list of pyAgrum.Arc

Warning

The learner must be using the 3off2 or MIIC algorithm.

Returns:the list of latent variables
Return type:list
learnBN(self)

Learns a BayesNet from a file (the database must have been read beforehand).

Returns:the learned BayesNet
Return type:pyAgrum.BayesNet
learnDAG(self)

Learns a structure (a DAG) from a file.

Returns:the learned DAG
Return type:pyAgrum.DAG
learnMixedStructure(self)

Warning

The learner must be using the 3off2 or MIIC algorithm.

Returns:the learned structure as an EssentialGraph
Return type:pyAgrum.EssentialGraph
learnParameters(self, dag, takeIntoAccountScore=True)

learnParameters(self, take_into_account_score=True) -> BayesNet

Learns a BN (its parameters) when its structure is known.

Parameters:
  • dag (pyAgrum.DAG) –
  • bn (pyAgrum.BayesNet) –
  • take_into_account_score (bool) – The DAG passed as argument may have been learned by a structure learning algorithm. In this case, if the score used to learn the structure has an implicit apriori (like K2, which has a 1-smoothing apriori), it is important to also take this implicit apriori into account for parameter learning. By default, if a score exists, parameters are learned using the apriori specified by the useAprioriXXX() methods plus the implicit apriori of the score; otherwise, only the apriori specified by useAprioriXXX() is used.
Returns:

the learned BayesNet

Return type:

pyAgrum.BayesNet

Raises:
  • gum.MissingVariableInDatabase – If a variable of the BN is not found in the database
  • gum.UnknownLabelInDatabase – If a label found in the database does not correspond to the variable
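
Parameter learning with a known structure amounts to estimating one conditional probability table per node from counts. The pure-Python sketch below shows this for a single node, with an optional additive smoothing term standing in for an apriori. It is an illustration under stated assumptions, not pyAgrum's code: learn_cpt, the list-of-dicts database, and the domains argument are all hypothetical, and only parent configurations that appear in the data are returned.

```python
from collections import Counter


def learn_cpt(rows, child, parents, domains, smoothing=0.0):
    """Estimate P(child | parents) from rows (a list of dicts) by normalised counts."""
    counts = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_configs = {tuple(r[p] for p in parents) for r in rows}
    cpt = {}
    for cfg in parent_configs:
        # Add the smoothing weight to every cell before normalising.
        total = sum(counts[(cfg, v)] + smoothing for v in domains[child])
        cpt[cfg] = {v: (counts[(cfg, v)] + smoothing) / total for v in domains[child]}
    return cpt


rows = [
    {"A": 0, "B": 0},
    {"A": 0, "B": 1},
    {"A": 1, "B": 1},
    {"A": 1, "B": 1},
]
domains = {"A": [0, 1], "B": [0, 1]}
print(learn_cpt(rows, "B", ["A"], domains))               # plain frequencies
print(learn_cpt(rows, "B", ["A"], domains, smoothing=1))  # with 1-smoothing
```

Without smoothing, the configuration A=1 gives B=0 a probability of exactly zero; the smoothing term is what keeps unseen values possible.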
logLikelihood(self, vars, knowing={})

logLikelihood(self, vars) -> double logLikelihood(self, vars, knowing={}) -> double

logLikelihood computes the log-likelihood for the columns in vars, given the columns in the list knowing (optional)

Parameters:
  • vars (List[str]) – the name of the columns of interest
  • knowing (List[str]) – the (optional) list of names of conditioning columns
Returns:

the log-likelihood (base 2)

Return type:

double
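
For readers who want to see what a base-2 log-likelihood over database columns looks like, here is a small pure-Python sketch using empirical frequencies. It is not pyAgrum's implementation: log2_likelihood is a hypothetical name, columns are passed as lists of values rather than names, and no apriori is applied to the counts.

```python
import math
from collections import Counter


def log2_likelihood(columns, knowing=()):
    """Base-2 log-likelihood of `columns`, optionally conditioned on `knowing`."""
    n_rows = len(columns[0])
    # Configuration of the conditioning columns for each row (empty tuple if none).
    cond = [tuple(c[i] for c in knowing) for i in range(n_rows)]
    joint = [(cond[i], tuple(c[i] for c in columns)) for i in range(n_rows)]
    joint_counts = Counter(joint)
    cond_counts = Counter(cond)
    # Sum over observed configurations of N(x, z) * log2(N(x, z) / N(z)).
    return sum(n * math.log2(n / cond_counts[z]) for (z, _), n in joint_counts.items())


x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
print(log2_likelihood([x]))        # one balanced binary column
print(log2_likelihood([x, y]))     # two columns jointly
print(log2_likelihood([x], [x]))   # a column given itself
```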

maxIter(self)
Returns:the criterion on number of iterations
Return type:int
maxTime(self)
Returns:the timeout (in seconds)
Return type:double
messageApproximationScheme(self)
Returns:the approximation scheme message
Return type:str
minEpsilonRate(self)
Returns:the value of the minimal epsilon rate
Return type:double
nameFromId(self, id)
Parameters:id – a node id
Returns:the variable’s name
Return type:str
names(self)
Returns:the names of the variables in the database
Return type:List[str]
nbCols(self)

Return the number of columns in the database

Returns:the number of columns in the database
Return type:int
nbRows(self)

Return the number of rows in the database

Returns:the number of rows in the database
Return type:int
nbrIterations(self)
Returns:the number of iterations
Return type:int
periodSize(self)
Returns:the number of samples between two checks of the stopping criteria
Return type:int
Raises:gum.OutOfBounds – If p<1
pseudoCount(vars)

access to pseudo-count (priors taken into account)

Parameters:vars (list[str]) – a list of variable names to include in the pseudo-count
Returns:a Potential containing the pseudo-counts
Return type:pyAgrum.Potential
rawPseudoCount(self, vars)

rawPseudoCount(self, vars) -> Vector

recordWeight(self, i)
setAprioriWeight(weight)

Deprecated methods in BNLearner for pyAgrum>0.14.0

setDatabaseWeight(self, new_weight)

Set the database weight which is given as an equivalent sample size.

Parameters:new_weight (double) – the database weight
setEpsilon(self, eps)
Parameters:eps (double) – the epsilon we want to use
Raises:gum.OutOfBounds – If eps<0
setInitialDAG(self, g)
Parameters:g (pyAgrum.DAG) – an initial DAG structure
setMaxIndegree(self, max_indegree)
setMaxIter(self, max)
Parameters:max (int) – the maximum number of iterations
Raises:gum.OutOfBounds – If max <= 1
setMaxTime(self, timeout)
Parameters:timeout (double) – stopping criterion on timeout (in seconds)
Raises:gum.OutOfBounds – If timeout<=0.0
setMinEpsilonRate(self, rate)
Parameters:rate (double) – the minimal epsilon rate
setPeriodSize(self, p)
Parameters:p (int) – number of samples between two checks of the stopping criteria
Raises:gum.OutOfBounds – If p<1
setPossibleSkeleton(self, skeleton)
setRecordWeight(self, i, weight)
setSliceOrder(self, l)

setSliceOrder(self, slice_order) setSliceOrder(self, slices)

Set a partial order on the nodes.

Parameters:l (list) – a list of sequences (each composed of variable ids or names)
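
One way to picture a partial order given as slices is the following pure-Python check. It assumes, as is usual for time-sliced models, that an arc may not go from a later slice to an earlier one, and that variables absent from every slice are unconstrained. Both points are assumptions of this sketch, not statements from this documentation, and respects_slice_order is a hypothetical helper.

```python
def respects_slice_order(slices, tail, head):
    """Return True if the arc tail -> head does not go back to an earlier slice."""
    rank = {name: i for i, group in enumerate(slices) for name in group}
    if tail not in rank or head not in rank:
        return True  # assumption: variables outside the slices are unconstrained
    return rank[tail] <= rank[head]


slices = [["A", "B"], ["C"]]
print(respects_slice_order(slices, "A", "C"))  # earlier slice to later slice
print(respects_slice_order(slices, "C", "A"))  # later slice to earlier slice
```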
setVerbosity(self, v)
Parameters:v (bool) – verbosity
state(self)
use3off2(self)

Indicate that we wish to use 3off2.

useAprioriBDeu(self, weight=1)

useAprioriBDeu(self)

The BDeu apriori adds weight to all the cells of the counting tables. In other words, it adds weight rows in the database with equally probable values.

Parameters:weight (double) – the apriori weight
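
The "weight rows with equally probable values" reading can be illustrated on a single counting table: the virtual rows are spread uniformly, so each cell receives the same share of the weight. This is a sketch of that reading only, not pyAgrum's implementation; add_uniform_prior and the dictionary representation of the counting table are hypothetical.

```python
def add_uniform_prior(counts, weight=1.0):
    """Spread `weight` virtual rows uniformly over the cells of a counting table."""
    share = weight / len(counts)
    return {cell: n + share for cell, n in counts.items()}


observed = {"a": 3, "b": 1}
with_prior = add_uniform_prior(observed, weight=2)
total = sum(with_prior.values())
print(with_prior)
print({cell: n / total for cell, n in with_prior.items()})
```

The total count grows by exactly the weight, which is why the weight can be read as an equivalent number of extra rows.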
useAprioriDirichlet(self, filename, weight=1)

useAprioriDirichlet(self, filename)

useAprioriSmoothing(self, weight=1)

useAprioriSmoothing(self)

useEM(self, epsilon)

Indicates if we use EM for parameter learning.

Parameters:epsilon (double) – if epsilon=0.0, EM is not used; if epsilon>0, EM is used and stops when the sum of the cumulative squared error on parameters is less than epsilon.
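
The stopping rule can be illustrated with a deliberately tiny EM loop: one binary column with some missing cells. The sketch stops when the summed squared change of the two parameters drops below epsilon, which is one plausible reading of the criterion described here. It is not pyAgrum's algorithm, and em_binary with its arguments is hypothetical.

```python
def em_binary(n_zero, n_one, n_missing, epsilon=1e-12, p_init=0.5, max_iter=1000):
    """EM estimate of P(X=1) for a binary column with missing cells."""
    total = n_zero + n_one + n_missing
    p = p_init
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # E-step: expected number of ones once the missing cells are completed.
        expected_ones = n_one + n_missing * p
        # M-step: re-estimate the parameter from the completed counts.
        p_new = expected_ones / total
        # Squared change summed over both parameters, P(X=1) and P(X=0).
        change = 2.0 * (p_new - p) ** 2
        p = p_new
        if change < epsilon:
            break
    return p, iteration


estimate, steps = em_binary(n_zero=30, n_one=10, n_missing=60)
print(estimate, steps)
```

In this toy setting the estimate converges to the frequency among the observed cells, 10 / (30 + 10); a smaller epsilon simply buys more iterations and a closer approach to that value.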
useGreedyHillClimbing(self)
useK2(self, l)

useK2(self, order)

Indicate to use the K2 algorithm (which needs a total ordering of the variables).

Parameters:order (list[int or str]) – a sequence of variable ids or names
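
The reason K2 needs a total ordering is that each variable may only pick its parents among the variables that precede it. The short sketch below lists those candidate parents for a given order; candidate_parents is a hypothetical helper written for illustration and is not part of pyAgrum.

```python
def candidate_parents(order):
    """Map each variable to the variables preceding it in the total ordering."""
    return {node: list(order[:i]) for i, node in enumerate(order)}


print(candidate_parents(["A", "B", "C"]))
```

Because parents always come earlier in the ordering, any structure built this way is acyclic by construction.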
useLocalSearchWithTabuList(self, tabu_size=100, nb_decrease=2)

useLocalSearchWithTabuList(self, tabu_size=100) useLocalSearchWithTabuList(self)

Indicate that we wish to use a local search with tabu list

Parameters:
  • tabu_size (int) – The size of the tabu list
  • nb_decrease (int) – The maximum number of consecutive score-decreasing changes that may be applied
useMDLCorrection(self)

Indicate that we wish to use the MDL correction for 3off2 or MIIC

useMIIC(self)

Indicate that we wish to use MIIC.

useNMLCorrection(self)

Indicate that we wish to use the NML correction for 3off2 or MIIC

useNoApriori(self)
useNoCorrection(self)

Indicate that we wish to use the NoCorr correction for 3off2 or MIIC

useScoreAIC(self)
useScoreBD(self)
useScoreBDeu(self)
useScoreBIC(self)
useScoreK2(self)
useScoreLog2Likelihood(self)
verbosity(self)
Returns:True if the verbosity is enabled
Return type:bool