Learning
pyAgrum gathers all the learning processes for Bayesian networks in a single, simple class: BNLearner. This class gives direct access to the complete learning algorithms and their parameters (such as priors, scores, constraints, etc.) but also offers low-level functions that ease the development of new learning algorithms (for instance, computing the chi2 or conditional likelihood on the database).
BNLearner allows you to choose:
- the structure learning algorithm (MIIC, Greedy Hill Climbing, K2, etc.),
- the parameter learning algorithm (including EM),
- the score (BDeu, AIC, etc.) for score-based algorithms,
- the prior (smoothing, Dirichlet, etc.),
- the constraints (for instance, forbidding some arcs or imposing a partial order on the variables),
- the correction (NML, etc.) for the MIIC algorithm,
- and many low-level functions, such as computing the chi2 or G2 score or the conditional likelihood on the database.
BNLearner is able to learn a Bayesian network from a database (a pandas.DataFrame) or from a CSV file.
- class pyAgrum.BNLearner(*args)
This class provides functionality for learning Bayesian Networks from data.
- BNLearner(filename,inducedTypes=True) -> BNLearner
- Parameters:
source (str or pandas.DataFrame) – the data to learn from
missingSymbols (List[str]) – list of strings that will be interpreted as missing values (by default: ?)
inducedTypes (Bool) – whether BNLearner should try to automatically find the type of each variable
- BNLearner(filename,src) -> BNLearner
- Parameters:
source (str or pandas.DataFrame) – the data to learn from
src (pyAgrum.BayesNet) – a Bayesian network whose variables define the modalities to use for the database’s columns
missingSymbols (List[str]) – list of strings that will be interpreted as missing values (by default: ?)
- BNLearner(learner) -> BNLearner
- Parameters:
learner (pyAgrum.BNLearner) – the BNLearner to copy
- G2(*args)
G2 computes the G2 statistic and p-value for two columns, given a list of other columns.
- Parameters:
name1 (str) – the name of the first column
name2 (str) – the name of the second column
knowing (List[str]) – the list of names of conditioning columns
- Returns:
the G2 statistic and the associated p-value as a Tuple
- Return type:
Tuple[float,float]
- addForbiddenArc(*args)
The arc given in parameters will never be added during structure learning.
- Parameters:
arc (pyAgrum.Arc) – an arc
head (int | str) – a variable’s id or name
tail (int | str) – a variable’s id or name
- Return type:
- addMandatoryArc(*args)
Allows adding prior structural knowledge: the arc given in parameters will be present in the learned structure.
- Parameters:
arc (pyAgrum.Arc) – an arc
head (int | str) – a variable’s id or name
tail (int | str) – a variable’s id or name
- Raises:
pyAgrum.InvalidDirectedCycle – If the added arc creates a directed cycle in the DAG
- Return type:
- addPossibleEdge(*args)
Assign a new possible edge.
Warning
By default, all edges are possible. However, once at least one possible edge has been defined, every edge not declared possible is considered impossible.
- Parameters:
arc (pyAgrum.Arc) – an arc
head (int | str) – a variable’s id or name
tail (int | str) – a variable’s id or name
- Return type:
- chi2(*args)
chi2 computes the chi2 statistic and p-value for two columns, given a list of other columns.
- Parameters:
name1 (str) – the name of the first column
name2 (str) – the name of the second column
knowing (List[str]) – the list of names of conditioning columns
- Returns:
the chi2 statistic and the associated p-value as a Tuple
- Return type:
Tuple[float,float]
- correctedMutualInformation(*args)
Computes the corrected mutual information (log2) between two columns, given a list of other columns.
Warning
This function takes into account correction and prior. If you want the ‘raw’ mutual information, use pyAgrum.BNLearner.mutualInformation
- Parameters:
name1 (str) – the name of the first column
name2 (str) – the name of the second column
knowing (List[str]) – the list of names of conditioning columns
- Returns:
the corrected mutual information (log2)
- Return type:
float
- currentTime()
- Returns:
the current running time in seconds
- Return type:
float
- databaseWeight()
Get the database weight which is given as an equivalent sample size.
- Returns:
The weight of the database
- Return type:
float
- domainSize(*args)
Return the domain size of the variable with the given name.
- Parameters:
n (str | int) – the name or the id of the variable
- Return type:
int
- epsilon()
- Returns:
the value of epsilon
- Return type:
float
- eraseForbiddenArc(*args)
Allow the arc to be added if necessary.
- Parameters:
arc (pyAgrum.Arc) – an arc
head (int | str) – a variable’s id or name
tail (int | str) – a variable’s id or name
- Return type:
- eraseMandatoryArc(*args)
Remove the given arc from the set of mandatory arcs.
- Parameters:
arc (pyAgrum.Arc) – an arc
head (int | str) – a variable’s id or name
tail (int | str) – a variable’s id or name
- Return type:
- erasePossibleEdge(*args)
Allow the two arcs corresponding to this edge to be added if necessary.
- Parameters:
arc (pyAgrum.Arc) – an arc
head (int | str) – a variable’s id or name
tail (int | str) – a variable’s id or name
- Return type:
- fitParameters(bn, take_into_account_score=True)
fitParameters directly populates the CPTs of the argument using the database and the structure of the BN.
- Parameters:
bn (pyAgrum.BayesNet) – a BN whose parameters will be learned in place.
take_into_account_score (bool) – The DAG passed in argument may have been learnt from a structure learning. In this case, if the score used to learn the structure has an implicit prior (like K2, which has a 1-smoothing prior), it is important to also take this implicit prior into account for parameter learning. By default (take_into_account_score=True), parameters are learned by taking into account both the prior specified by the usePriorXXX() methods and the implicit prior of the score (if any). If take_into_account_score=False, only the prior specified by usePriorXXX() is taken into account.
- getNumberOfThreads()
Return the number of threads used by the BNLearner during structure and parameter learning.
- Returns:
the number of threads used by the BNLearner during structure and parameter learning
- Return type:
int
- hasMissingValues()
Indicates whether there are missing values in the database.
- Returns:
True if there are some missing values in the database.
- Return type:
bool
- history()
- Returns:
the scheme history
- Return type:
tuple
- Raises:
pyAgrum.OperationNotAllowed – If the scheme has not been performed or if verbosity is set to False
- idFromName(var_name)
- Parameters:
var_name (str) – a variable’s name
- Returns:
the column id corresponding to a variable name
- Return type:
int
- Raises:
pyAgrum.MissingVariableInDatabase – If a variable of the BN is not found in the database.
- isGumNumberOfThreadsOverriden()
Check whether the number of threads used by the learner is the default one or not.
- Returns:
True if the number of threads used by the BNLearner has been set.
- Return type:
bool
- latentVariables()
Warning
The learner must be using the MIIC algorithm.
- Returns:
the list of latent variables
- Return type:
list
- learnBN()
Learn a Bayesian network from the database.
- Returns:
the learned BayesNet
- Return type:
- learnDAG()
Learn a DAG (structure) from the database.
- Returns:
the learned DAG
- Return type:
- learnEssentialGraph()
Learn an essential graph from the database.
- Returns:
the learned essential graph
- Return type:
- learnMixedGraph()
Deprecated method in BNLearner for pyAgrum>1.5.2.
- learnPDAG()
Learn a PDAG from the database.
Warning
The learning method must be constraint-based (MIIC, etc.) and not score-based (K2, GreedyHillClimbing, etc.)
- Returns:
the learned PDAG
- Return type:
- learnParameters(*args)
Create a new BN copying its structure from the argument (dag or BN) and learning its parameters from the database w.r.t the BNLearner’s state (priors, etc.).
Warning
When using a pyAgrum.DAG as input parameter, the node ids in the DAG must correspond to the column indices in the database in order to coherently fix the structure of the BN. Generally, it is safer to use a pyAgrum.BayesNet as input, or even to use pyAgrum.BNLearner.fitParameters.
- Parameters:
dag (pyAgrum.DAG)
bn (pyAgrum.BayesNet)
take_into_account_score (bool) – The dag passed in argument may have been learnt from a structure learning. In this case, if the score used to learn the structure has an implicit prior (like K2, which has a 1-smoothing prior), it is important to also take this implicit prior into account for parameter learning. By default (take_into_account_score=True), parameters are learned by taking into account both the prior specified by the usePriorXXX() methods and the implicit prior of the score (if any). If take_into_account_score=False, only the prior specified by usePriorXXX() is taken into account.
- Returns:
the learned BayesNet
- Return type:
- Raises:
pyAgrum.MissingVariableInDatabase – If a variable of the BN is not found in the database
pyAgrum.UnknownLabelInDatabase – If a label is found in the database that does not correspond to the variable
- logLikelihood(*args)
logLikelihood computes the log-likelihood for the columns in vars, given the columns in the list knowing (optional)
- Parameters:
vars (List[str]) – the name of the columns of interest
knowing (List[str]) – the (optional) list of names of conditioning columns
- Returns:
the log-likelihood (base 2)
- Return type:
float
- maxIter()
- Returns:
the stopping criterion on the number of iterations
- Return type:
int
- maxTime()
- Returns:
the timeout (in seconds)
- Return type:
float
- messageApproximationScheme()
- Returns:
the approximation scheme message
- Return type:
str
- minEpsilonRate()
- Returns:
the value of the minimal epsilon rate
- Return type:
float
- mutualInformation(*args)
Computes the (log2) mutual information between two columns, given a list of other columns.
Warning
This function gives the ‘raw’ mutual information. If you want a version taking into account correction and prior, use pyAgrum.BNLearner.correctedMutualInformation
- Parameters:
name1 (str) – the name of the first column
name2 (str) – the name of the second column
knowing (List[str]) – the list of names of conditioning columns
- Returns:
the log2 mutual information
- Return type:
float
- nameFromId(id)
- Parameters:
id (int) – a node id
- Returns:
the variable’s name
- Return type:
str
- names()
- Returns:
the names of the variables in the database
- Return type:
Tuple[str]
- nbCols()
Return the number of columns in the database
- Returns:
the number of columns in the database
- Return type:
int
- nbRows()
Return the number of rows in the database
- Returns:
the number of rows in the database
- Return type:
int
- nbrIterations()
- Returns:
the number of iterations
- Return type:
int
- periodSize()
- Returns:
the number of samples between two stopping tests
- Return type:
int
- Raises:
pyAgrum.OutOfBounds – If p<1
- pseudoCount(vars)
Access the pseudo-counts (priors taken into account).
- Parameters:
vars (list[str]) – a list of names of variables to include in the pseudo-counts
- Return type:
a pyAgrum.Potential containing the pseudo-counts
- rawPseudoCount(*args)
computes the pseudoCount (taking priors into account) of the list of variables as a list of floats.
- Parameters:
vars (List[int | str]) – the list of variables
- Returns:
the pseudo-count as a list of float
- Return type:
List[float]
- recordWeight(i)
Get the weight of the ith record
- Parameters:
i (int) – the position of the record in the database
- Raises:
pyAgrum.OutOfBounds – if i is outside the set of indices of the records
- Returns:
The weight of the ith record of the database
- Return type:
float
- score(*args)
Returns the value of the score currently in use by the BNLearner for a variable, given a set of other variables.
- Parameters:
name1 (str) – the name of the variable at the LHS of the conditioning bar
knowing (List[str]) – the list of names of the conditioning variables
- Returns:
the value of the score
- Return type:
float
- setDatabaseWeight(new_weight)
Set the database weight which is given as an equivalent sample size.
Warning
The same weight is assigned to all the rows of the learning database so that the sum of their weights is equal to the value of the parameter weight.
- Parameters:
new_weight (float) – the database weight
- Return type:
None
- setEpsilon(eps)
- Parameters:
eps (float) – the epsilon we want to use
- Raises:
pyAgrum.OutOfBounds – If eps<0
- Return type:
None
- setInitialDAG(dag)
- Parameters:
dag (pyAgrum.DAG) – an initial DAG structure
- Return type:
- setMaxIndegree(max_indegree)
- Parameters:
max_indegree (int) – the limit number of parents
- Return type:
- setMaxIter(max)
- Parameters:
max (int) – the maximum number of iterations
- Raises:
pyAgrum.OutOfBounds – If max <= 1
- Return type:
None
- setMaxTime(timeout)
- Parameters:
timeout (float) – stopping criterion on timeout (in seconds)
- Raises:
pyAgrum.OutOfBounds – If timeout<=0.0
- Return type:
None
- setMinEpsilonRate(rate)
- Parameters:
rate (float) – the minimal epsilon rate
- Return type:
None
- setNumberOfThreads(nb)
If the parameter nb passed in argument is different from 0, the BNLearner will use nb threads during learning, hence overriding pyAgrum’s default number of threads. If, on the contrary, nb is equal to 0, the BNLearner will comply with pyAgrum’s default number of threads.
- Parameters:
nb (int) – the number of threads to be used by the BNLearner
- Return type:
None
- setPeriodSize(p)
- Parameters:
p (int) – the number of samples between two stopping tests
- Raises:
pyAgrum.OutOfBounds – If p<1
- Return type:
None
- setPossibleEdges(*args)
Add a constraint by fixing the set of possible edges.
- Parameters:
edges (Set[Tuple[int]]) – a set of edges as pairs of node ids.
- Return type:
None
- setPossibleSkeleton(skeleton)
Add a constraint by fixing the set of possible edges as a pyAgrum.UndiGraph.
- Parameters:
skeleton (pyAgrum.UndiGraph) – the fixed skeleton
- Return type:
- setRecordWeight(i, weight)
Set the weight of the ith record
- Parameters:
i (int) – the position of the record in the database
weight (float) – the weight assigned to this record
- Raises:
pyAgrum.OutOfBounds – if i is outside the set of indices of the records
- Return type:
None
- setSliceOrder(*args)
Set a partial order on the nodes.
- Parameters:
l (list) – a list of sequences of variables (ids or names) defining the partial order
- Return type:
- setVerbosity(v)
- Parameters:
v (bool) – verbosity
- Return type:
None
- state()
Returns a dictionary containing the current state of the BNLearner.
- Returns:
a dictionary containing the current state of the BNLearner.
- Return type:
Dict[str,Any]
- useBDeuPrior(weight=1.0)
The BDeu prior adds weight to all the cells of the counting tables. In other words, it is equivalent to adding weight rows to the database in which every value is equally probable.
- Parameters:
weight (float) – the prior weight
- Return type:
- useDirichletPrior(*args)
Use the Dirichlet prior.
- Parameters:
source (str|pyAgrum.BayesNet) – the Dirichlet related source (filename of a database or a Bayesian network)
weight (float (optional)) – the weight of the prior (the ‘size’ of the corresponding ‘virtual database’)
- Return type:
- useEM(epsilon)
Indicate whether to use EM for parameter learning.
- Parameters:
epsilon (float) – if epsilon=0.0, EM is not used. If epsilon>0, EM is used and stops when the sum of the cumulative squared errors on the parameters is less than epsilon.
- Return type:
- useGreedyHillClimbing()
Indicate that we wish to use a greedy hill climbing algorithm.
- Return type:
- useK2(*args)
Indicate to use the K2 algorithm (which needs a total ordering of the variables).
- Parameters:
order (list[int or str]) – a sequence of ids or names defining the total order
- Return type:
- useLocalSearchWithTabuList(tabu_size=100, nb_decrease=2)
Indicate that we wish to use a local search with tabu list.
- Parameters:
tabu_size (int) – The size of the tabu list
nb_decrease (int) – The maximum number of consecutive score-decreasing changes that we allow to apply
- Return type:
- useNoCorrection()
Indicate that we wish to use the NoCorr correction for MIIC.
- Return type:
- useScoreLog2Likelihood()
Indicate that we wish to use a Log2Likelihood score.
- Return type:
- useSmoothingPrior(weight=1)
Use the smoothing prior.
- Parameters:
weight (float) – pass in argument a weight if you wish to assign a weight to the smoothing, otherwise the current weight of the learner will be used.
- Return type:
- verbosity()
- Returns:
True if the verbosity is enabled
- Return type:
bool