Learning¶
pyAgrum encapsulates all the learning processes for Bayesian networks in a simple class BNLearner. This class gives direct access to the complete learning algorithms and their parameters (such as priors, scores, constraints, etc.) but also proposes low-level functions that ease the work of developing new learning algorithms (for instance, computing chi2 or conditional likelihood on the database, etc.).
-
class
pyAgrum.
BNLearner
(filename, inducedTypes=True)¶ - Parameters:
- filename (str) – the file to learn from
- inducedTypes (bool) – whether BNLearner should try to automatically find the type of each variable
- BNLearner(filename,src) -> BNLearner
- Parameters:
- filename (str) – the file to learn from
- src (pyAgrum.BayesNet) – a Bayesian network used to find the modalities of the variables
- BNLearner(learner) -> BNLearner
- Parameters:
- learner (pyAgrum.BNLearner) – the BNLearner to copy
-
G2
(self, var1, var2, knw={})¶ G2 computes the G2 statistic and p-value for two columns, given a list of other columns.
Parameters: - var1 (str) – the name of the first column
- var2 (str) – the name of the second column
- knw ([str]) – the list of names of conditioning columns
Returns: the G2 statistic and the associated p-value as a Tuple
Return type: Tuple[float, float]
-
addForbiddenArc
(self, arc)¶ addForbiddenArc(self, tail, head) addForbiddenArc(self, tail, head)
The arc given in parameters will never be added during structure learning.
Parameters: - arc (pyAgrum.Arc) – an arc
- head – a variable’s id (int)
- tail – a variable’s id (int)
- head – a variable’s name (str)
- tail – a variable’s name (str)
-
addMandatoryArc
(self, arc)¶ addMandatoryArc(self, tail, head) addMandatoryArc(self, tail, head)
Allows adding prior structural knowledge: the arc given in parameters will always be present in the learned structure.
Parameters: - arc (pyAgrum.Arc) – an arc
- head – a variable’s id (int)
- tail – a variable’s id (int)
- head – a variable’s name (str)
- tail – a variable’s name (str)
Raises: gum.InvalidDirectedCycle
– If the added arc creates a directed cycle in the DAG
-
addPossibleEdge
(self, edge)¶ addPossibleEdge(self, tail, head) addPossibleEdge(self, tail, head)
The edge given in parameters is allowed to be added during learning; as soon as at least one possible edge is declared, every edge not declared possible is implicitly forbidden.
-
chi2
(self, var1, var2, knw={})¶ chi2 computes the chi2 statistic and p-value for two columns, given a list of other columns.
Parameters: - var1 (str) – the name of the first column
- var2 (str) – the name of the second column
- knw ([str]) – the list of names of conditioning columns
Returns: the chi2 statistic and the associated p-value as a Tuple
Return type: Tuple[float, float]
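To make the two statistics concrete, here is a standard-library-only sketch of what chi2 and G2 measure on a single 2×2 contingency table (no conditioning columns). This mirrors the textbook definitions, not pyAgrum's internal implementation, and the counts are invented for the example:

```python
import math

# observed counts N[x][y] for two binary columns (illustrative numbers)
N = [[25, 15],
     [10, 50]]
total = sum(sum(row) for row in N)
row_sums = [sum(row) for row in N]
col_sums = [sum(N[i][j] for i in range(2)) for j in range(2)]

chi2 = 0.0   # Pearson's chi2: sum of (O - E)^2 / E
g2 = 0.0     # G2 (likelihood ratio): 2 * sum of O * ln(O / E)
for i in range(2):
    for j in range(2):
        expected = row_sums[i] * col_sums[j] / total  # under independence
        chi2 += (N[i][j] - expected) ** 2 / expected
        g2 += 2.0 * N[i][j] * math.log(N[i][j] / expected)

print(chi2, g2)
```

Both statistics follow (asymptotically) a chi-square distribution under the independence hypothesis, which is where the p-value returned by BNLearner comes from.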
-
currentTime
(self)¶ Returns: the current running time in seconds Return type: double
-
databaseWeight
(self)¶ Returns: the weight of the whole database Return type: double
-
domainSize
(self, var)¶ domainSize(self, var) -> int
-
epsilon
(self)¶ Returns: the value of epsilon Return type: double
-
eraseForbiddenArc
(self, arc)¶ eraseForbiddenArc(self, tail, head) eraseForbiddenArc(self, tail, head)
Allow the arc to be added if necessary.
Parameters: - arc (pyAgrum.Arc) – an arc
- head – a variable’s id (int)
- tail – a variable’s id (int)
- head – a variable’s name (str)
- tail – a variable’s name (str)
-
eraseMandatoryArc
(self, arc)¶ eraseMandatoryArc(self, tail, head) eraseMandatoryArc(self, tail, head)
Parameters: - arc (pyAgrum.Arc) – an arc
- head – a variable’s id (int)
- tail – a variable’s id (int)
- head – a variable’s name (str)
- tail – a variable’s name (str)
-
erasePossibleEdge
(self, edge)¶ erasePossibleEdge(self, tail, head) erasePossibleEdge(self, tail, head)
Allow the two arcs (one for each direction) to be added if necessary.
Parameters: - edge (pyAgrum.Edge) – an edge
- head – a variable’s id (int)
- tail – a variable’s id (int)
- head – a variable’s name (str)
- tail – a variable’s name (str)
-
hasMissingValues
(self)¶ Indicates whether there are missing values in the database.
Returns: True if there are some missing values in the database. Return type: bool
-
history
(self)¶ Returns: the scheme history Return type: tuple Raises: gum.OperationNotAllowed
– If the scheme has not been performed or if verbosity is set to False
-
idFromName
(self, var_name)¶ Parameters: var_name (str) – a variable’s name Returns: the column id corresponding to a variable name Return type: int Raises: gum.MissingVariableInDatabase
– If a variable of the BN is not found in the database.
-
latentVariables
(self)¶ latentVariables(self) -> List[pyAgrum.Arc]
Warning
learner must be using 3off2 or MIIC algorithm
Returns: the list of latent variables Return type: list
-
learnBN
(self)¶ Learn a BayesNet from the database (which must have been read beforehand).
Returns: the learned BayesNet Return type: pyAgrum.BayesNet
-
learnDAG
(self)¶ Learn the DAG structure from the database.
Returns: the learned DAG Return type: pyAgrum.DAG
-
learnMixedStructure
(self)¶ Warning
learner must be using 3off2 or MIIC algorithm
Returns: the learned structure as an EssentialGraph Return type: pyAgrum.EssentialGraph
-
learnParameters
(self, dag, takeIntoAccountScore=True)¶ learnParameters(self, take_into_account_score=True) -> BayesNet
learns a BN (its parameters) when its structure is known.
Parameters: - dag (pyAgrum.DAG) –
- bn (pyAgrum.BayesNet) –
- take_into_account_score (bool) – the DAG passed in argument may have been learned by a structure learning algorithm. In this case, if the score used to learn the structure has an implicit apriori (like K2, which has a 1-smoothing apriori), it is important to also take this implicit apriori into account for parameter learning. By default, if a score exists, the parameters are learned by taking into account both the apriori specified by the useAprioriXXX() methods and the implicit apriori of the score; otherwise only the apriori specified by useAprioriXXX() is used.
Returns: the learned BayesNet
Return type: pyAgrum.BayesNet
Raises: gum.MissingVariableInDatabase
– If a variable of the BN is not found in the database
gum.UnknownLabelInDatabase
– If a label found in the database does not correspond to the variable
-
logLikelihood
(self, vars, knowing={})¶ logLikelihood(self, vars) -> double logLikelihood(self, vars, knowing={}) -> double
logLikelihood computes the log-likelihood for the columns in vars, given the columns in the list knowing (optional)
Parameters: - vars (List[str]) – the name of the columns of interest
- knowing (List[str]) – the (optional) list of names of conditioning columns
Returns: the log-likelihood (base 2)
Return type: double
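A standard-library-only sketch of the base-2 log-likelihood of a single column: the sum over rows of log2 of the empirical probability of the observed value (the data is invented for the example; this mirrors the definition, not pyAgrum's implementation):

```python
import math
from collections import Counter

column = ["yes", "no", "yes", "yes", "no", "yes", "yes", "no"]
counts = Counter(column)        # value -> number of occurrences
n = len(column)

# sum over values v of N(v) * log2(N(v) / n)
ll = sum(c * math.log2(c / n) for c in counts.values())
print(ll)
```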
-
maxIter
(self)¶ Returns: the criterion on number of iterations Return type: int
-
maxTime
(self)¶ Returns: the timeout (in seconds) Return type: double
-
messageApproximationScheme
(self)¶ Returns: the approximation scheme message Return type: str
-
minEpsilonRate
(self)¶ Returns: the value of the minimal epsilon rate Return type: double
-
nameFromId
(self, id)¶ Parameters: id – a node id Returns: the variable’s name Return type: str
-
names
(self)¶ Returns: the names of the variables in the database Return type: List[str]
-
nbCols
(self)¶ Return the number of columns in the database.
Returns: the number of columns in the database Return type: int
-
nbRows
(self)¶ Return the number of rows in the database.
Returns: the number of rows in the database Return type: int
-
nbrIterations
(self)¶ Returns: the number of iterations Return type: int
-
periodSize
(self)¶ Returns: the number of samples between two stopping tests Return type: int Raises: gum.OutOfBounds
– If p<1
-
pseudoCount
(vars)¶ access to pseudo-count (priors taken into account)
Parameters: vars (list[str]) – a list of names of variables to add in the pseudo-count Returns: the pseudo-counts Return type: pyAgrum.Potential
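To illustrate the idea behind pseudo-counts, here is a standard-library sketch: the raw counts of a column plus the weight contributed by a smoothing prior (the column values, domain, and a 1-smoothing weight are invented for the example; pyAgrum's actual prior depends on the useApriori* settings):

```python
from collections import Counter

column = ["a", "a", "b", "a", "b"]
prior_weight = 1.0                 # illustrative smoothing weight
domain = ["a", "b", "c"]           # 'c' is never observed in the data
raw = Counter(column)

# pseudo-count = observed count + prior weight, for every possible value
pseudo = {v: raw.get(v, 0) + prior_weight for v in domain}
print(pseudo)
```

Note how the unobserved value 'c' still gets a non-zero pseudo-count, which is exactly what a smoothing prior is for.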
-
rawPseudoCount
(self, vars)¶ rawPseudoCount(self, vars) -> Vector
-
recordWeight
(self, i)¶ Parameters: i (int) – the index of the record Returns: the weight of the i-th record Return type: double
-
setAprioriWeight
(weight)¶ Deprecated method in BNLearner for pyAgrum>0.14.0.
-
setDatabaseWeight
(self, new_weight)¶ Set the database weight which is given as an equivalent sample size.
Parameters: new_weight (double) – the database weight
-
setEpsilon
(self, eps)¶ Parameters: eps (double) – the epsilon we want to use Raises: gum.OutOfBounds
– If eps<0
-
setInitialDAG
(self, g)¶ Parameters: g (pyAgrum.DAG) – an initial DAG structure
-
setMaxIndegree
(self, max_indegree)¶ Parameters: max_indegree (int) – the limit number of parents for each node
-
setMaxIter
(self, max)¶ Parameters: max (int) – the maximum number of iterations Raises: gum.OutOfBounds
– If max <= 1
-
setMaxTime
(self, timeout)¶ Parameters: timeout (double) – stopping criterion on timeout (in seconds) Raises: gum.OutOfBounds
– If timeout<=0.0
-
setMinEpsilonRate
(self, rate)¶ Parameters: rate (double) – the minimal epsilon rate
-
setPeriodSize
(self, p)¶ Parameters: p (int) – number of samples between 2 stopping Raises: gum.OutOfBounds
– If p<1
-
setPossibleSkeleton
(self, skeleton)¶ Parameters: skeleton (pyAgrum.UndiGraph) – the undirected graph of possible edges
-
setRecordWeight
(self, i, weight)¶ Parameters: - i (int) – the index of the record
- weight (double) – the weight of the i-th record
-
setSliceOrder
(self, l)¶ setSliceOrder(self, slice_order) setSliceOrder(self, slices)
Set a partial order on the nodes.
Parameters: l (list) – a list of sequences (composed of ids or names of variables)
-
setVerbosity
(self, v)¶ Parameters: v (bool) – verbosity
-
state
(self)¶
-
use3off2
(self)¶ Indicate that we wish to use 3off2.
-
useAprioriBDeu
(self, weight=1)¶ useAprioriBDeu(self)
The BDeu apriori adds weight to all the cells of the counting tables. In other words, it behaves as if ‘weight’ rows with equally probable values had been added to the database.
Parameters: weight (double) – the apriori weight
-
useAprioriDirichlet
(self, filename, weight=1)¶ useAprioriDirichlet(self, filename)
Parameters: - filename (str) – the file containing the apriori database
- weight (double) – the apriori weight
-
useAprioriSmoothing
(self, weight=1)¶ useAprioriSmoothing(self)
Parameters: weight (double) – the apriori weight (the weight added to all counts)
-
useEM
(self, epsilon)¶ Indicates whether to use EM for parameter learning.
Parameters: epsilon (double) – if epsilon=0.0, EM is not used; if epsilon>0, EM is used and stops when the sum of the cumulative squared error on the parameters is less than epsilon.
-
useGreedyHillClimbing
(self)¶
-
useK2
(self, l)¶ useK2(self, order) useK2(self, order)
Indicate to use the K2 algorithm (which needs a total ordering of the variables).
Parameters: order (list[int or str]) – a sequence of ids or names
-
useLocalSearchWithTabuList
(self, tabu_size=100, nb_decrease=2)¶ useLocalSearchWithTabuList(self, tabu_size=100) useLocalSearchWithTabuList(self)
Indicate that we wish to use a local search with tabu list.
Parameters: - tabu_size (int) – the size of the tabu list
- nb_decrease (int) – the maximum number of consecutive changes decreasing the score that we allow to apply
-
useMDLCorrection
(self)¶ Indicate that we wish to use the MDL correction for 3off2 or MIIC
-
useMIIC
(self)¶ Indicate that we wish to use MIIC.
-
useNMLCorrection
(self)¶ Indicate that we wish to use the NML correction for 3off2 or MIIC
-
useNoApriori
(self)¶
-
useNoCorrection
(self)¶ Indicate that we wish to use the NoCorr correction for 3off2 or MIIC
-
useScoreAIC
(self)¶
-
useScoreBD
(self)¶
-
useScoreBDeu
(self)¶
-
useScoreBIC
(self)¶
-
useScoreK2
(self)¶
-
useScoreLog2Likelihood
(self)¶
-
verbosity
(self)¶ Returns: True if the verbosity is enabled Return type: bool