Learning

pyAgrum encapsulates all the learning processes for Bayesian networks in a simple class, BNLearner. This class gives direct access to the complete learning algorithms and their parameters (such as priors, scores, constraints, etc.), but also proposes low-level functions that ease the work of developing new learning algorithms (for instance, computing chi2 or conditional likelihood on the database).

class pyAgrum.BNLearner(filename, inducedTypes=True)
Parameters:
  • source (str or pandas.DataFrame) – the data to learn from

  • missingSymbols (List[str]) – list of strings that will be interpreted as missing values (by default: [‘?’])

  • inducedTypes (Bool) – whether BNLearner should try to automatically find the type of each variable

BNLearner(filename,src) -> BNLearner
Parameters:
  • source (str or pandas.DataFrame) – the data to learn from

  • src (pyAgrum.BayesNet) – the Bayesian network used to find the modalities of the variables

  • missingSymbols (List[str]) – list of strings that will be interpreted as missing values (by default: [‘?’])

BNLearner(learner) -> BNLearner
Parameters:
  • learner (pyAgrum.BNLearner) – the BNLearner to copy

G2(*args)

G2 computes the G2 statistic and p-value for two columns, given a list of other columns.

Parameters:
  • name1 (str) – the name of the first column

  • name2 (str) – the name of the second column

  • knowing (List[str]) – the list of names of conditioning columns

Returns:

the G2 statistic and the associated p-value as a Tuple

Return type:

Tuple[float,float]

addForbiddenArc(*args)

The arc given in parameters won’t be added during structure learning.

Parameters:
  • arc (pyAgrum.Arc) – an arc

  • head – a variable’s id (int)

  • tail – a variable’s id (int)

  • head – a variable’s name (str)

  • tail – a variable’s name (str)

Return type:

BNLearner

addMandatoryArc(*args)

Allow to add prior structural knowledge.

Parameters:
  • arc (pyAgrum.Arc) – an arc

  • head – a variable’s id (int)

  • tail – a variable’s id (int)

  • head – a variable’s name (str)

  • tail – a variable’s name (str)

Raises:

pyAgrum.InvalidDirectedCycle – If the added arc creates a directed cycle in the DAG

Return type:

BNLearner

addPossibleEdge(*args)

assign a new possible edge

Warning

By default, all edges are possible. However, once at least one possible edge has been defined, all edges not declared possible are considered impossible.

Parameters:
  • arc (pyAgrum.Arc) – an arc

  • head – a variable’s id (int)

  • tail – a variable’s id (int)

  • head – a variable’s name (str)

  • tail – a variable’s name (str)

Return type:

BNLearner

chi2(*args)

chi2 computes the chi2 statistic and p-value for two columns, given a list of other columns.

Parameters:
  • name1 (str) – the name of the first column

  • name2 (str) – the name of the second column

  • knowing (List[str]) – the list of names of conditioning columns

Returns:

the chi2 statistic and the associated p-value as a Tuple

Return type:

Tuple[float,float]

correctedMutualInformation(*args)

computes the mutual information between two columns, given a list of other columns (log2).

Warning

This function takes into account correction and prior. If you want the ‘raw’ mutual information, use gum.BNLearner.mutualInformation

Parameters:
  • name1 (str) – the name of the first column

  • name2 (str) – the name of the second column

  • knowing (List[str]) – the list of names of conditioning columns

Returns:

the corrected mutual information between the two columns

Return type:

float

currentTime()
Returns:

the current running time in seconds

Return type:

float

databaseWeight()

Get the database weight which is given as an equivalent sample size.

Returns:

The weight of the database

Return type:

float

domainSize(*args)
Return type:

int

epsilon()
Returns:

the value of epsilon

Return type:

float

eraseForbiddenArc(*args)

Allow the arc to be added if necessary.

Parameters:
  • arc (pyAgrum.Arc) – an arc

  • head – a variable’s id (int)

  • tail – a variable’s id (int)

  • head – a variable’s name (str)

  • tail – a variable’s name (str)

Return type:

BNLearner

eraseMandatoryArc(*args)
Parameters:
  • arc (pyAgrum.Arc) – an arc

  • head – a variable’s id (int)

  • tail – a variable’s id (int)

  • head – a variable’s name (str)

  • tail – a variable’s name (str)

Return type:

BNLearner

erasePossibleEdge(*args)

Allow the two arcs (in both directions) to be added if necessary.

Parameters:
  • arc (pyAgrum.Arc) – an arc

  • head – a variable’s id (int)

  • tail – a variable’s id (int)

  • head – a variable’s name (str)

  • tail – a variable’s name (str)

Return type:

BNLearner

fitParameters(bn)

Easy shortcut to the learnParameters method. fitParameters uses self to directly populate the CPTs of bn.

Parameters:

bn (pyAgrum.BayesNet) – a BN which will directly have its parameters learned.

getNumberOfThreads()
Return type:

int

hasMissingValues()

Indicates whether there are missing values in the database.

Returns:

True if there are some missing values in the database.

Return type:

bool

history()
Returns:

the scheme history

Return type:

tuple

Raises:

pyAgrum.OperationNotAllowed – If the scheme has not been performed or if verbosity is set to false

idFromName(var_name)
Parameters:

var_name (str) – a variable’s name

Returns:

the column id corresponding to a variable name

Return type:

int

Raises:

pyAgrum.MissingVariableInDatabase – If a variable of the BN is not found in the database.

isGumNumberOfThreadsOverriden()
Return type:

bool

latentVariables()

Warning

The learner must be using the 3off2 or the MIIC algorithm.

Returns:

the list of latent variables

Return type:

list

learnBN()

learn a BayesNet (both structure and parameters) from the database

Returns:

the learned BayesNet

Return type:

pyAgrum.BayesNet

learnDAG()

learn a DAG structure from the database

Returns:

the learned DAG

Return type:

pyAgrum.DAG

learnEssentialGraph()
learnMixedGraph()

Deprecated methods in BNLearner for pyAgrum>1.5.2

learnPDAG()
Return type:

PDAG

learnParameters(*args)

learns a BN (its parameters) when its structure is known.

Parameters:
  • dag (pyAgrum.DAG) –

  • bn (pyAgrum.BayesNet) –

  • take_into_account_score (bool) – The dag passed in argument may have been learnt from structure learning. In this case, if the score used to learn the structure has an implicit prior (like K2, which has a 1-smoothing prior), it is important to also take this implicit prior into account for parameter learning. By default, if a score exists, we learn the parameters by taking into account both the prior specified by the usePriorXXX() methods and the implicit prior of the score; otherwise we only take into account the prior specified by usePriorXXX().

Returns:

the learned BayesNet

Return type:

pyAgrum.BayesNet

Raises:
  • pyAgrum.MissingVariableInDatabase – If a variable of the BN is not found in the database

  • pyAgrum.UnknownLabelInDatabase – If a label is found in the database that does not correspond to the variable

logLikelihood(*args)

logLikelihood computes the log-likelihood for the columns in vars, given the columns in the list knowing (optional)

Parameters:
  • vars (List[str]) – the name of the columns of interest

  • knowing (List[str]) – the (optional) list of names of conditioning columns

Returns:

the log-likelihood (base 2)

Return type:

float

maxIter()
Returns:

the criterion on number of iterations

Return type:

int

maxTime()
Returns:

the timeout(in seconds)

Return type:

float

messageApproximationScheme()
Returns:

the approximation scheme message

Return type:

str

minEpsilonRate()
Returns:

the value of the minimal epsilon rate

Return type:

float

mutualInformation(*args)

computes the mutual information between two columns, given a list of other columns (log2).

Warning

This function gives the ‘raw’ mutual information. If you want a version taking into account correction and prior, use gum.BNLearner.correctedMutualInformation

Parameters:
  • name1 (str) – the name of the first column

  • name2 (str) – the name of the second column

  • knowing (List[str]) – the list of names of conditioning columns

Returns:

the mutual information between the two columns

Return type:

float

nameFromId(id)
Parameters:

id (int) – a node id

Returns:

the variable’s name

Return type:

str

names()
Returns:

the names of the variables in the database

Return type:

Tuple[str]

nbCols()

Return the number of columns in the database

Returns:

the number of columns in the database

Return type:

int

nbRows()

Return the number of rows in the database

Returns:

the number of rows in the database

Return type:

int

nbrIterations()
Returns:

the number of iterations

Return type:

int

periodSize()
Returns:

the number of samples between two stopping tests

Return type:

int

Raises:

pyAgrum.OutOfBounds – If p<1

pseudoCount(vars)

access to pseudo-count (priors taken into account)

Parameters:

vars (List[str]) – a list of names of the variables to include in the pseudo-count

Return type:

pyAgrum.Potential – a Potential containing these pseudo-counts

rawPseudoCount(*args)

computes the pseudoCount (taking priors into account) of the list of variables as a list of floats.

Parameters:

vars (List[int|str]) – the list of variables

Returns:

the pseudo-count as a list of float

Return type:

List[float]

recordWeight(i)

Get the weight of the ith record

Parameters:

i (int) – the position of the record in the database

Raises:

pyAgrum.OutOfBounds – if i is outside the set of indices of the records

Returns:

The weight of the ith record of the database

Return type:

float

score(*args)

Returns the value of the score currently in use by the BNLearner for a variable, given a set of other variables.

Parameters:
  • name1 (str) – the name of the variable at the LHS of the conditioning bar

  • knowing (List[str]) – the list of names of the conditioning variables

Returns:

the value of the score

Return type:

float

setDatabaseWeight(new_weight)

Set the database weight which is given as an equivalent sample size.

Warning

The same weight is assigned to all the rows of the learning database so that the sum of their weights is equal to the value of the parameter weight.

Parameters:

new_weight (float) – the database weight

Return type:

None

setEpsilon(eps)
Parameters:

eps (float) – the epsilon we want to use

Raises:

pyAgrum.OutOfBounds – If eps<0

Return type:

None

setForbiddenArcs(set)

assign a set of forbidden arcs

Parameters:

arcs (Set[Tuple[int|str, int|str]]) – the set of forbidden arcs

Return type:

BNLearner

setInitialDAG(dag)
Parameters:

dag (pyAgrum.DAG) – an initial DAG structure

Return type:

BNLearner

setMandatoryArcs(set)

assign a set of mandatory arcs

Parameters:

arcs (Set[Tuple[int|str, int|str]]) – the set of mandatory arcs

Return type:

BNLearner

setMaxIndegree(max_indegree)
Parameters:

max_indegree (int) – the maximum number of parents allowed per node

Return type:

BNLearner

setMaxIter(max)
Parameters:

max (int) – the maximum number of iterations

Raises:

pyAgrum.OutOfBounds – If max <= 1

Return type:

None

setMaxTime(timeout)
Parameters:

timeout (float) – stopping criterion on timeout (in seconds)

Raises:

pyAgrum.OutOfBounds – If timeout<=0.0

Return type:

None

setMinEpsilonRate(rate)
Parameters:

rate (float) – the minimal epsilon rate

Return type:

None

setNumberOfThreads(nb)

If the parameter nb passed in argument is different from 0, the BNLearner will use nb threads during learning, hence overriding pyAgrum’s default number of threads. If, on the contrary, nb is equal to 0, the BNLearner will comply with pyAgrum’s default number of threads.

Parameters:

nb (int) – the number of threads to be used by the BNLearner

Return type:

None

setPeriodSize(p)
Parameters:

p (int) – the number of samples between two stopping tests

Raises:

pyAgrum.OutOfBounds – If p<1

Return type:

None

setPossibleEdges(set)
Parameters:

set (Set[Tuple[int, int]]) –

Return type:

BNLearner

setPossibleSkeleton(skeleton)
Parameters:

skeleton (UndiGraph) –

Return type:

BNLearner

setRecordWeight(i, weight)

Set the weight of the ith record

Parameters:
  • i (int) – the position of the record in the database

  • weight (float) – the weight assigned to this record

Raises:

pyAgrum.OutOfBounds – if i is outside the set of indices of the records

Return type:

None

setSliceOrder(*args)

Set a partial order on the nodes.

Parameters:

l (list) – a list of sequences (composed of node ids or names)

Return type:

BNLearner

setVerbosity(v)
Parameters:

v (bool) – verbosity

Return type:

None

state()
Return type:

object

use3off2()

Indicate that we wish to use 3off2.

Return type:

BNLearner

useAprioriBDeu()

Deprecated methods in BNLearner for pyAgrum>1.1.1

useAprioriDirichlet()

Deprecated methods in BNLearner for pyAgrum>1.1.1

useAprioriSmoothing()

Deprecated methods in BNLearner for pyAgrum>1.1.1

useBDeuPrior(weight=1.0)

The BDeu prior adds weight to all the cells of the counting tables. In other words, it acts as if weight rows with equally probable values had been added to the database.

Parameters:

weight (float) – the prior weight

Return type:

BNLearner

useDirichletPrior(*args)

Use the Dirichlet prior.

Parameters:
  • source (str|pyAgrum.BayesNet) – the Dirichlet related source (filename of a database or a Bayesian network)

  • weight (float (optional)) – the weight of the prior (the ‘size’ of the corresponding ‘virtual database’)

Return type:

BNLearner

useEM(epsilon)

Indicate whether we use EM for parameter learning.

Parameters:

epsilon (float) – if epsilon = 0.0 then EM is not used; if epsilon > 0 then EM is used and stops when the sum of the cumulative squared errors on the parameters is less than epsilon.

Return type:

BNLearner

useGreedyHillClimbing()

Indicate that we wish to use a greedy hill climbing algorithm.

Return type:

BNLearner

useK2(*args)

Indicate to use the K2 algorithm (which needs a total ordering of the variables).

Parameters:

order (List[int|str]) – a sequence of ids or names

Return type:

BNLearner

useLocalSearchWithTabuList(tabu_size=100, nb_decrease=2)

Indicate that we wish to use a local search with tabu list

Parameters:
  • tabu_size (int) – The size of the tabu list

  • nb_decrease (int) – the maximum number of consecutive score-decreasing changes that we allow to apply

Return type:

BNLearner

useMDLCorrection()

Indicate that we wish to use the MDL correction for 3off2 or MIIC

Return type:

BNLearner

useMIIC()

Indicate that we wish to use MIIC.

Return type:

BNLearner

useNMLCorrection()

Indicate that we wish to use the NML correction for 3off2 or MIIC

Return type:

BNLearner

useNoApriori()

Deprecated methods in BNLearner for pyAgrum>1.1.1

useNoCorrection()

Indicate that we wish to use the NoCorr correction for 3off2 or MIIC

Return type:

BNLearner

useNoPrior()

Use no prior.

Return type:

BNLearner

useScoreAIC()

Indicate that we wish to use an AIC score.

Return type:

BNLearner

useScoreBD()

Indicate that we wish to use a BD score.

Return type:

BNLearner

useScoreBDeu()

Indicate that we wish to use a BDeu score.

Return type:

BNLearner

useScoreBIC()

Indicate that we wish to use a BIC score.

Return type:

BNLearner

useScoreK2()

Indicate that we wish to use a K2 score.

Return type:

BNLearner

useScoreLog2Likelihood()

Indicate that we wish to use a Log2Likelihood score.

Return type:

BNLearner

useSmoothingPrior(weight=1)

Use the prior smoothing.

Parameters:

weight (float) – pass a weight in argument if you wish to assign a weight to the smoothing; otherwise the current weight of the learner will be used.

Return type:

BNLearner

verbosity()
Returns:

True if the verbosity is enabled

Return type:

bool