Learning
pyAgrum encloses all the learning processes for Bayesian networks in a single class, BNLearner. This class gives direct access to the complete learning algorithms and their parameters (such as priors, scores, constraints, etc.) and also offers low-level functions that ease the development of new learning algorithms (for instance, computing chi2 or conditional likelihood on the database).
- class pyAgrum.BNLearner(filename, inducedTypes=True)
- Parameters:
source (str or pandas.DataFrame) – the data to learn from
missingSymbols (List[str]) – list of strings that will be interpreted as missing values (by default: ['?'])
inducedTypes (bool) – whether BNLearner should try to automatically find the type of each variable
- BNLearner(filename,src) -> BNLearner
- Parameters:
source (str or pandas.DataFrame) – the data to learn from
src (pyAgrum.BayesNet) – the Bayesian network used to find those modalities
missingSymbols (List[str]) – list of strings that will be interpreted as missing values (by default: ['?'])
- BNLearner(learner) -> BNLearner
- Parameters:
learner (pyAgrum.BNLearner) – the BNLearner to copy
- G2(*args)
G2 computes the G2 statistic and p-value for two columns, given a list of other columns.
- Parameters
name1 (str) – the name of the first column
name2 (str) – the name of the second column
knowing (List[str]) – the list of names of conditioning columns
- Returns
the G2 statistic and the associated p-value as a Tuple
- Return type
Tuple[float,float]
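
As a rough illustration of what this statistic measures, the sketch below computes the textbook G2 value, G2 = 2 * sum(O * ln(O / E)), for two columns with no conditioning columns, using only the Python standard library. The function name `g2_statistic` and the sample data are hypothetical. It is not pyAgrum's implementation: the p-value, the conditioning columns and any prior handling are left out.

```python
import math
from collections import Counter


def g2_statistic(col1, col2):
    """Textbook G2 statistic for two discrete columns of equal length."""
    n = len(col1)
    joint = Counter(zip(col1, col2))   # observed counts O(x, y)
    left = Counter(col1)               # marginal counts N(x)
    right = Counter(col2)              # marginal counts N(y)
    g2 = 0.0
    for (x, y), observed in joint.items():
        expected = left[x] * right[y] / n   # count expected under independence
        g2 += 2.0 * observed * math.log(observed / expected)
    return g2


# Hypothetical data: the second column copies the first, so dependence is maximal.
a = ["y", "y", "n", "n"]
print(g2_statistic(a, a))                      # 8 * ln(2), about 5.545
print(g2_statistic(a, ["u", "v", "u", "v"]))   # 0.0: the columns look independent
```

A large value relative to the degrees of freedom suggests dependence; turning it into a p-value requires the chi-squared distribution, which the standard library does not provide.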
- addForbiddenArc(*args)
Forbid an arc: the arc given in parameters will not be added during learning.
- Parameters
arc (pyAgrum.Arc) – an arc
head (int or str) – a variable’s id or name
tail (int or str) – a variable’s id or name
- addMandatoryArc(*args)
Add prior structural knowledge: the given arc must appear in the learned structure.
- Parameters
arc (pyAgrum.Arc) – an arc
head (int or str) – a variable’s id or name
tail (int or str) – a variable’s id or name
- Raises
pyAgrum.InvalidDirectedCycle – If the added arc creates a directed cycle in the DAG
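
To make the InvalidDirectedCycle condition concrete, here is a minimal standard-library sketch of the underlying check: adding the arc tail -> head closes a directed cycle exactly when tail is already reachable from head. The helper `creates_directed_cycle` and the arc list are hypothetical and only illustrate the idea; they are not part of pyAgrum.

```python
def creates_directed_cycle(arcs, tail, head):
    """Return True if adding the arc tail -> head to `arcs` closes a directed cycle."""
    children = {}
    for t, h in arcs:
        children.setdefault(t, []).append(h)
    # A cycle appears exactly when `tail` is already reachable from `head`.
    stack, seen = [head], set()
    while stack:
        node = stack.pop()
        if node == tail:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return False


mandatory = [("A", "B"), ("B", "C")]                 # hypothetical existing arcs
print(creates_directed_cycle(mandatory, "C", "A"))   # True: A -> B -> C -> A
print(creates_directed_cycle(mandatory, "A", "C"))   # False
```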
- addPossibleEdge(*args)
Assign a new possible edge.
Warning
By default, all edges are possible. However, once at least one possible edge is defined, all other edges not declared possible are considered impossible.
- Parameters
arc (pyAgrum.Arc) – an arc
head (int or str) – a variable’s id or name
tail (int or str) – a variable’s id or name
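
The rule stated in the warning can be sketched in a few lines of plain Python. The helper `is_edge_possible` is hypothetical, and the sketch assumes that a possible edge is undirected (it covers both arcs between its two endpoints); it only mirrors the documented behaviour, not pyAgrum's internals.

```python
def is_edge_possible(possible_edges, x, y):
    """Mimic the rule in the warning: an empty set of possible edges allows everything."""
    if not possible_edges:
        return True
    # Edges are treated as undirected, so (x, y) and (y, x) are the same edge.
    return frozenset((x, y)) in possible_edges


possible = set()
print(is_edge_possible(possible, "A", "B"))   # True: nothing declared yet
possible.add(frozenset(("A", "B")))
print(is_edge_possible(possible, "B", "A"))   # True: declared, in either direction
print(is_edge_possible(possible, "A", "C"))   # False: not declared possible
```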
- chi2(*args)
chi2 computes the chi2 statistic and p-value for two columns, given a list of other columns.
- Parameters
name1 (str) – the name of the first column
name2 (str) – the name of the second column
knowing (List[str]) – the list of names of conditioning columns
- Returns
the chi2 statistic and the associated p-value as a Tuple
- Return type
Tuple[float,float]
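
The effect of the conditioning columns can be illustrated with a standard-library sketch of Pearson's chi2 statistic, summed over every configuration of the conditioning columns. The function `chi2_statistic` is hypothetical and takes the columns themselves rather than their names; it follows the textbook formula sum((O - E)^2 / E), does not compute a p-value, and is not claimed to match pyAgrum's implementation.

```python
from collections import Counter, defaultdict


def chi2_statistic(col1, col2, knowing=()):
    """Pearson chi2 for col1 vs col2, summed over each configuration of `knowing`."""
    strata = defaultdict(list)
    for i, pair in enumerate(zip(col1, col2)):
        key = tuple(col[i] for col in knowing)   # values of the conditioning columns
        strata[key].append(pair)
    total = 0.0
    for pairs in strata.values():
        n = len(pairs)
        joint = Counter(pairs)
        left = Counter(x for x, _ in pairs)
        right = Counter(y for _, y in pairs)
        for x in left:
            for y in right:
                expected = left[x] * right[y] / n   # count expected under independence
                total += (joint[(x, y)] - expected) ** 2 / expected
    return total


x = [0, 0, 1, 1]
y = [0, 0, 1, 1]
print(chi2_statistic(x, y))        # 4.0: strong dependence
print(chi2_statistic(x, y, [x]))   # 0.0: no dependence left once x is known
```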
- correctedMutualInformation(*args)
Computes the mutual information (in bits, i.e. using log2) between two columns, given a list of other columns.
Warning
This function takes into account correction and prior. If you want the ‘raw’ mutual information, use gum.BNLearner.mutualInformation
- Parameters
name1 (str) – the name of the first column
name2 (str) – the name of the second column
knowing (List[str]) – the list of names of conditioning columns
- Returns
the corrected mutual information
- Return type
float
- currentTime()
- Returns
the current running time in seconds
- Return type
float
- databaseWeight()
Get the database weight which is given as an equivalent sample size.
- Returns
The weight of the database
- Return type
float
- domainSize(*args)
- Return type
int
- epsilon()
- Returns
the value of epsilon
- Return type
float
- eraseForbiddenArc(*args)
Remove the arc from the forbidden arcs, allowing it to be added if necessary.
- Parameters
arc (pyAgrum.Arc) – an arc
head (int or str) – a variable’s id or name
tail (int or str) – a variable’s id or name
- eraseMandatoryArc(*args)
- Parameters
arc (pyAgrum.Arc) – an arc
head (int or str) – a variable’s id or name
tail (int or str) – a variable’s id or name
- erasePossibleEdge(*args)
Allow the 2 arcs to be added if necessary.
- Parameters
arc (pyAgrum.Arc) – an arc
head (int or str) – a variable’s id or name
tail (int or str) – a variable’s id or name
- fitParameters(bn)
Easy shortcut to the learnParameters method. fitParameters uses self to directly populate the CPTs of bn.
- Parameters
bn (pyAgrum.BayesNet) – a BN which will directly have its parameters learned.
- getNumberOfThreads()
- Return type
int
- hasMissingValues()
Indicates whether there are missing values in the database.
- Returns
True if there are some missing values in the database.
- Return type
bool
- history()
- Returns
the scheme history
- Return type
tuple
- Raises
pyAgrum.OperationNotAllowed – If the scheme has not been performed or if verbosity is set to false
- idFromName(var_name)
- Parameters
var_name (str) – a variable’s name
- Returns
the column id corresponding to a variable name
- Return type
int
- Raises
pyAgrum.MissingVariableInDatabase – If a variable of the BN is not found in the database.
- isGumNumberOfThreadsOverriden()
- Return type
bool
- latentVariables()
Warning
The learner must be using the 3off2 or MIIC algorithm.
- Returns
the list of latent variables
- Return type
list
- learnBN()
Learn a BayesNet from a file (the database must have been read beforehand).
- Returns
the learned BayesNet
- Return type
pyAgrum.BayesNet
- learnDAG()
Learn a structure from a file.
- Returns
the learned DAG
- Return type
pyAgrum.DAG
- learnEssentialGraph()
- learnMixedGraph()
Deprecated methods in BNLearner for pyAgrum>1.5.2
- learnParameters(*args)
Learns the parameters of a BN when its structure is known.
- Parameters
dag (pyAgrum.DAG) –
bn (pyAgrum.BayesNet) –
take_into_account_score (bool) – The dag passed in argument may have been learnt by structure learning. In this case, if the score used to learn the structure has an implicit prior (like K2, which has a 1-smoothing prior), it is important to also take this implicit prior into account for parameter learning. By default, if a score exists, parameters are learned by taking into account the prior specified by the usePriorXXX() methods plus the implicit prior of the score; otherwise, only the prior specified by usePriorXXX() is taken into account.
- Returns
the learned BayesNet
- Return type
pyAgrum.BayesNet
- Raises
pyAgrum.MissingVariableInDatabase – If a variable of the BN is not found in the database
pyAgrum.UnknownLabelInDatabase – If a label is found in the database that does not correspond to the variable
- logLikelihood(*args)
logLikelihood computes the log-likelihood for the columns in vars, given the columns in the list knowing (optional)
- Parameters
vars (List[str]) – the name of the columns of interest
knowing (List[str]) – the (optional) list of names of conditioning columns
- Returns
the log-likelihood (base 2)
- Return type
float
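
For intuition, the sketch below computes an empirical log-likelihood in base 2 directly from row counts: sum(N(x, z) * log2(N(x, z) / N(z))), where z ranges over the configurations of the conditioning columns. The function `log2_likelihood` and the data are hypothetical; the sketch works on rows (tuples of values) rather than column names and ignores priors, so its values need not match what logLikelihood returns.

```python
import math
from collections import Counter


def log2_likelihood(rows, knowing_rows=None):
    """Empirical log-likelihood (base 2) of `rows`, optionally given `knowing_rows`."""
    if knowing_rows is None:
        knowing_rows = [()] * len(rows)          # no conditioning: a single configuration
    joint = Counter(zip(rows, knowing_rows))     # counts N(x, z)
    cond = Counter(knowing_rows)                 # counts N(z)
    return sum(n * math.log2(n / cond[z]) for (_, z), n in joint.items())


coin = [("h",), ("t",), ("h",), ("t",)]          # hypothetical one-column database
print(log2_likelihood(coin))          # -4.0: four rows, one bit each
print(log2_likelihood(coin, coin))    # 0.0: the rows are fully determined
```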
- maxIter()
- Returns
the criterion on number of iterations
- Return type
int
- maxTime()
- Returns
the timeout (in seconds)
- Return type
float
- messageApproximationScheme()
- Returns
the approximation scheme message
- Return type
str
- minEpsilonRate()
- Returns
the value of the minimal epsilon rate
- Return type
float
- mutualInformation(*args)
Computes the mutual information (in bits, i.e. using log2) between two columns, given a list of other columns.
Warning
This function gives the ‘raw’ mutual information. If you want a version taking into account correction and prior, use gum.BNLearner.correctedMutualInformation
- Parameters
name1 (str) – the name of the first column
name2 (str) – the name of the second column
knowing (List[str]) – the list of names of conditioning columns
- Returns
the raw mutual information
- Return type
float
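
The raw quantity can be written out from counts alone. The sketch below is a standard-library illustration of the empirical conditional mutual information I(X;Y|Z) in bits; the function `mutual_information` is hypothetical, takes the columns themselves rather than their names, supports a single conditioning column for brevity, and applies neither correction nor prior.

```python
import math
from collections import Counter


def mutual_information(xs, ys, zs=None):
    """Empirical (conditional) mutual information I(X;Y|Z) in bits."""
    n = len(xs)
    if zs is None:
        zs = [None] * n                  # no conditioning: a single configuration
    n_xyz = Counter(zip(xs, ys, zs))
    n_xz = Counter(zip(xs, zs))
    n_yz = Counter(zip(ys, zs))
    n_z = Counter(zs)
    return sum(
        (c / n) * math.log2(c * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)]))
        for (x, y, z), c in n_xyz.items()
    )


x = [0, 0, 1, 1]
print(mutual_information(x, x))              # 1.0: one full bit is shared
print(mutual_information(x, [0, 1, 0, 1]))   # 0.0: independent columns
print(mutual_information(x, x, zs=x))        # 0.0: nothing left once x is known
```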
- nameFromId(id)
- Parameters
id (int) – a node id
- Returns
the variable’s name
- Return type
str
- names()
- Returns
the names of the variables in the database
- Return type
Tuple[str]
- nbCols()
Return the number of columns in the database
- Returns
the number of columns in the database
- Return type
int
- nbRows()
Return the number of rows in the database
- Returns
the number of rows in the database
- Return type
int
- nbrIterations()
- Returns
the number of iterations
- Return type
int
- periodSize()
- Returns
the number of samples between two tests of the stopping criteria
- Return type
int
- Raises
pyAgrum.OutOfBounds – If p<1
- pseudoCount(vars)
Access to the pseudo-counts (priors taken into account).
- Parameters
vars (list[str]) – a list of names of the variables to include in the pseudo-count
- Return type
a Potential containing these pseudo-counts
- rawPseudoCount(*args)
Computes the pseudo-count (taking priors into account) of the list of variables as a list of floats.
- Parameters
vars (List[int|str]) – the list of variables
- Returns
the pseudo-count as a list of float
- Return type
List[float]
- recordWeight(i)
Get the weight of the ith record
- Parameters
i (int) – the position of the record in the database
- Raises
pyAgrum.OutOfBounds – if i is outside the set of indices of the records
- Returns
The weight of the ith record of the database
- Return type
float
- score(*args)
Returns the value of the score currently in use by the BNLearner of a variable given a set of other variables
- Parameters
name1 (str) – the name of the variable at the LHS of the conditioning bar
knowing (List[str]) – the list of names of the conditioning variables
- Returns
the value of the score
- Return type
float
- setDatabaseWeight(new_weight)
Set the database weight which is given as an equivalent sample size.
Warning
The same weight is assigned to all the rows of the learning database so that the sum of their weights is equal to the value of the parameter new_weight.
- Parameters
new_weight (float) – the database weight
- Return type
None
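
The rule described in the warning amounts to a simple division, sketched here with a hypothetical helper `uniform_record_weights` (not a pyAgrum function): every record receives the database weight divided by the number of rows.

```python
def uniform_record_weights(nb_rows, new_weight):
    """Spread `new_weight` evenly over `nb_rows` records."""
    return [new_weight / nb_rows] * nb_rows


weights = uniform_record_weights(4, 10.0)
print(weights)        # [2.5, 2.5, 2.5, 2.5]
print(sum(weights))   # 10.0
```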
- setEpsilon(eps)
- Parameters
eps (float) – the epsilon we want to use
- Raises
pyAgrum.OutOfBounds – If eps<0
- Return type
None
- setForbiddenArcs(set)
Assign a set of forbidden arcs.
- Parameters
set (Set[Tuple[int|str, int|str]]) – the set of forbidden arcs
- setInitialDAG(dag)
- Parameters
dag (pyAgrum.DAG) – an initial DAG structure
- setMandatoryArcs(set)
Assign a set of mandatory arcs.
- Parameters
set (Set[Tuple[int|str, int|str]]) – the set of mandatory arcs
- setMaxIndegree(max_indegree)
- Parameters
max_indegree (int) – the maximum number of parents
- setMaxIter(max)
- Parameters
max (int) – the maximum number of iterations
- Raises
pyAgrum.OutOfBounds – If max <= 1
- Return type
None
- setMaxTime(timeout)
- Parameters
timeout (float) – stopping criterion on timeout (in seconds)
- Raises
pyAgrum.OutOfBounds – If timeout<=0.0
- Return type
None
- setMinEpsilonRate(rate)
- Parameters
rate (float) – the minimal epsilon rate
- Return type
None
- setNumberOfThreads(nb)
If the parameter nb passed in argument is different from 0, the BNLearner will use nb threads during learning, hence overriding pyAgrum's default number of threads. If, on the contrary, nb is equal to 0, the BNLearner will comply with pyAgrum's default number of threads.
- Parameters
nb (int) – the number of threads to be used by the BNLearner
- Return type
None
- setPeriodSize(p)
- Parameters
p (int) – the number of samples between two tests of the stopping criteria
- Raises
pyAgrum.OutOfBounds – If p<1
- Return type
None
- setRecordWeight(i, weight)
Set the weight of the ith record
- Parameters
i (int) – the position of the record in the database
weight (float) – the weight assigned to this record
- Raises
pyAgrum.OutOfBounds – if i is outside the set of indices of the records
- Return type
None
- setSliceOrder(*args)
Set a partial order on the nodes.
- Parameters
l (list) – a list of sequences (composed of ids of rows or string)
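
One way to read a partial order given as slices is sketched below: an arc is acceptable as long as it does not point from a later slice back to an earlier one. This is an assumption about the intended semantics rather than a statement of pyAgrum's implementation, and the helper `respects_slice_order` and the variable names are hypothetical.

```python
def respects_slice_order(slices, tail, head):
    """Return True if the arc tail -> head does not go back to an earlier slice."""
    rank = {node: i for i, group in enumerate(slices) for node in group}
    # Nodes absent from every slice are left unconstrained in this sketch.
    if tail not in rank or head not in rank:
        return True
    return rank[tail] <= rank[head]


slices = [["age", "sex"], ["smoker"], ["cancer"]]           # hypothetical variables
print(respects_slice_order(slices, "age", "cancer"))      # True: earlier to later
print(respects_slice_order(slices, "cancer", "smoker"))   # False: later to earlier
print(respects_slice_order(slices, "age", "sex"))         # True: same slice
```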
- setVerbosity(v)
- Parameters
v (bool) – verbosity
- Return type
None
- state()
- Return type
object
- useAprioriBDeu()
Deprecated methods in BNLearner for pyAgrum>1.1.1
- useAprioriDirichlet()
Deprecated methods in BNLearner for pyAgrum>1.1.1
- useAprioriSmoothing()
Deprecated methods in BNLearner for pyAgrum>1.1.1
- useBDeuPrior(weight=1.0)
The BDeu prior adds weight to all the cells of the counting tables. In other words, it adds weight rows in the database with equally probable values.
- Parameters
weight (float) – the prior weight
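
Following the second sentence of the description (the prior behaves like weight extra rows with equally probable values), a counting table could be adjusted as in the sketch below, where the virtual rows are spread uniformly over the cells. The helper `add_bdeu_prior` and the counts are hypothetical, and this is one reading of the description, not pyAgrum's code.

```python
def add_bdeu_prior(counts, weight=1.0):
    """Add `weight` virtual rows, spread uniformly over every cell of `counts`."""
    share = weight / len(counts)   # each cell receives an equal share
    return {cell: n + share for cell, n in counts.items()}


# Hypothetical counting table over two binary-like variables.
counts = {("a", 0): 3, ("a", 1): 0, ("b", 0): 1, ("b", 1): 0}
print(add_bdeu_prior(counts, weight=2.0))
# {('a', 0): 3.5, ('a', 1): 0.5, ('b', 0): 1.5, ('b', 1): 0.5}
```

The practical effect is that no cell keeps a zero count, so no configuration ends up with a zero probability.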
- useDirichletPrior(*args)
Use the Dirichlet prior.
- Parameters
source (str|pyAgrum.BayesNet) – the Dirichlet related source (filename of a database or a Bayesian network)
weight (float (optional)) – the weight of the prior (the ‘size’ of the corresponding ‘virtual database’)
- useEM(epsilon)
Indicates whether EM is used for parameter learning.
- Parameters
epsilon (float) – if epsilon=0.0, EM is not used; if epsilon>0, EM is used and stops when the sum of the cumulative squared error on parameters is less than epsilon.
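
The stopping rule described for epsilon can be sketched as a comparison between two successive parameter vectors. The helper `em_has_converged` is hypothetical and only illustrates the criterion (sum of squared changes below epsilon); how pyAgrum accumulates the error internally is not shown here.

```python
def em_has_converged(old_params, new_params, epsilon):
    """Stop when the summed squared change of the parameters drops below epsilon."""
    squared_error = sum((new - old) ** 2 for old, new in zip(old_params, new_params))
    return squared_error < epsilon


print(em_has_converged([0.5, 0.5], [0.6, 0.4], 1e-6))         # False: still moving
print(em_has_converged([0.5, 0.5], [0.5001, 0.4999], 1e-6))   # True: change is tiny
```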
- useGreedyHillClimbing()
Indicate that we wish to use a greedy hill climbing algorithm.
- useK2(*args)
Indicate to use the K2 algorithm (which needs a total ordering of the variables).
- Parameters
order (list[int or str]) – a sequence of variable ids or names
- useLocalSearchWithTabuList(tabu_size=100, nb_decrease=2)
Indicate that we wish to use a local search with tabu list
- Parameters
tabu_size (int) – The size of the tabu list
nb_decrease (int) – The max number of changes decreasing the score consecutively that we allow to apply
- useMDLCorrection()
Indicate that we wish to use the MDL correction for 3off2 or MIIC
- useNMLCorrection()
Indicate that we wish to use the NML correction for 3off2 or MIIC
- useNoApriori()
Deprecated methods in BNLearner for pyAgrum>1.1.1
- useNoCorrection()
Indicate that we wish to use the NoCorr correction for 3off2 or MIIC
- useScoreLog2Likelihood()
Indicate that we wish to use a Log2Likelihood score.
- useSmoothingPrior(weight=1)
Use the smoothing prior.
- Parameters
weight (float) – pass in argument a weight if you wish to assign a weight to the smoothing, otherwise the current weight of the learner will be used.
- verbosity()
- Returns
True if the verbosity is enabled
- Return type
bool