Learning¶
pyAgrum encloses all the learning processes for Bayesian networks in a single class, BNLearner. This class gives direct access to the complete learning algorithms and their parameters (such as priors, scores, constraints, etc.), and also offers low-level functions that ease the development of new learning algorithms (for instance, computing chi2 or conditional likelihood on the database).
-
class
pyAgrum.BNLearner(filename)¶ - Parameters:
- filename (str) – the file to learn from
- BNLearner(filename,src,parse_database=False) -> BNLearner
- Parameters:
- filename (str) – the file to learn from
- src (pyAgrum.BayesNet) – the Bayesian network used to find those modalities
- parse_database (bool) – if True, the modalities specified by the user will be considered as a superset of the modalities of the variables.
- BNLearner(learner) -> BNLearner
- Parameters:
- learner (pyAgrum.BNLearner) – the BNLearner to copy
-
G2(BNLearner self, str var1, str var2, Vector_string knw={})¶ G2 computes the G2 statistic and pvalue for two columns, given a list of other columns.
Parameters: - var1 (str) – the name of the first column
- var2 (str) – the name of the second column
- knw ([str]) – the list of names of conditioning columns
Returns: the G2 statistic and the associated p-value
Return type: tuple (statistic, pvalue)
-
addForbiddenArc(BNLearner self, Arc arc)¶ addForbiddenArc(BNLearner self, int tail, int head) addForbiddenArc(BNLearner self, str tail, str head)
The given arc will not be added to the learned structure.
Parameters: - arc (pyAgrum.Arc) – an arc
- tail (int or str) – the id or name of the tail variable
- head (int or str) – the id or name of the head variable
-
addMandatoryArc(BNLearner self, Arc arc)¶ addMandatoryArc(BNLearner self, int tail, int head) addMandatoryArc(BNLearner self, str tail, str head)
Adds prior structural knowledge in the form of an arc that must be present.
Parameters: - arc (pyAgrum.Arc) – an arc
- tail (int or str) – the id or name of the tail variable
- head (int or str) – the id or name of the head variable
Raises: gum.InvalidDirectedCycle – If the added arc creates a directed cycle in the DAG
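The condition behind gum.InvalidDirectedCycle can be illustrated without pyAgrum. The following is a minimal sketch, not the library's implementation: `creates_cycle` is a hypothetical helper, and arcs are represented as plain (tail, head) pairs. Adding tail -> head closes a directed cycle exactly when tail is already reachable from head.

```python
def creates_cycle(arcs, tail, head):
    """Return True if adding the arc tail -> head would create a directed cycle.

    `arcs` is an iterable of (tail, head) pairs describing an existing DAG.
    Hypothetical helper for illustration only; not part of pyAgrum.
    """
    if tail == head:
        return True
    children = {}
    for t, h in arcs:
        children.setdefault(t, set()).add(h)
    # Depth-first search from `head`: reaching `tail` means that a path
    # head -> ... -> tail exists, so tail -> head would close a cycle.
    stack, seen = [head], set()
    while stack:
        node = stack.pop()
        if node == tail:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, ()))
    return False


mandatory = [("A", "B"), ("B", "C")]
print(creates_cycle(mandatory, "C", "A"))  # True: A -> B -> C -> A
print(creates_cycle(mandatory, "A", "C"))  # False
```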
-
addPossibleEdge(BNLearner self, Edge edge)¶ addPossibleEdge(BNLearner self, int tail, int head) addPossibleEdge(BNLearner self, str tail, str head)
-
chi2(BNLearner self, str var1, str var2, Vector_string knw={})¶ chi2 computes the chi2 statistic and pvalue for two columns, given a list of other columns.
Parameters: - var1 (str) – the name of the first column
- var2 (str) – the name of the second column
- knw ([str]) – the list of names of conditioning columns
Returns: the chi2 statistic and the associated p-value
Return type: tuple (statistic, pvalue)
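To make the two independence statistics concrete, here is a minimal pure-Python sketch of the textbook Pearson chi2 and G2 formulas for two discrete columns. It is not pyAgrum code: `independence_statistics` is a hypothetical helper, it handles only the unconditional case (no conditioning columns), and it returns the degrees of freedom instead of a p-value, since the p-value requires the chi2 distribution function.

```python
import math
from collections import Counter


def independence_statistics(col1, col2):
    """Return (chi2, g2, dof) for two equally long columns of discrete values.

    Illustrative only: unconditional test, no p-value, not pyAgrum code.
    """
    n = len(col1)
    joint = Counter(zip(col1, col2))   # observed counts per pair of values
    count1 = Counter(col1)             # marginal counts of the first column
    count2 = Counter(col2)             # marginal counts of the second column
    chi2 = 0.0
    g2 = 0.0
    for a in count1:
        for b in count2:
            expected = count1[a] * count2[b] / n   # count under independence
            observed = joint[(a, b)]
            chi2 += (observed - expected) ** 2 / expected
            if observed > 0:                       # 0 * log(0) is taken as 0
                g2 += 2.0 * observed * math.log(observed / expected)
    dof = (len(count1) - 1) * (len(count2) - 1)
    return chi2, g2, dof


# Two identical columns are as dependent as possible.
print(independence_statistics([0, 0, 1, 1], [0, 0, 1, 1]))
# Every pair of values occurs equally often: both statistics are zero.
print(independence_statistics([0, 0, 1, 1], [0, 1, 0, 1]))
```

Both statistics compare observed counts with the counts expected under independence; large values relative to the degrees of freedom speak against independence.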
-
currentTime(BNLearner self)¶ Returns: the current running time in seconds Return type: double
-
databaseWeight(BNLearner self)¶
-
epsilon(BNLearner self)¶ Returns: the value of epsilon Return type: double
-
eraseForbiddenArc(BNLearner self, Arc arc)¶ eraseForbiddenArc(BNLearner self, int tail, int head) eraseForbiddenArc(BNLearner self, str tail, str head)
Allow the arc to be added if necessary.
Parameters: - arc (pyAgrum.Arc) – an arc
- tail (int or str) – the id or name of the tail variable
- head (int or str) – the id or name of the head variable
-
eraseMandatoryArc(BNLearner self, Arc arc)¶ eraseMandatoryArc(BNLearner self, int tail, int head) eraseMandatoryArc(BNLearner self, str tail, str head)
Parameters: - arc (pyAgrum.Arc) – an arc
- tail (int or str) – the id or name of the tail variable
- head (int or str) – the id or name of the head variable
-
erasePossibleEdge(BNLearner self, Edge edge)¶ erasePossibleEdge(BNLearner self, int tail, int head) erasePossibleEdge(BNLearner self, str tail, str head)
Allow the 2 arcs to be added if necessary.
Parameters: - edge (pyAgrum.Edge) – an edge
- tail (int or str) – the id or name of the first variable of the edge
- head (int or str) – the id or name of the second variable of the edge
-
hasMissingValues(BNLearner self)¶ Indicates whether there are missing values in the database.
Returns: True if there are some missing values in the database. Return type: bool
-
history(BNLearner self)¶ Returns: the scheme history Return type: tuple Raises: gum.OperationNotAllowed – If the scheme has not been performed or if verbosity is set to False
-
idFromName(BNLearner self, str var_name)¶ Parameters: var_name (str) – a variable’s name Returns: the column id corresponding to a variable name Return type: int Raises: gum.MissingVariableInDatabase – If a variable of the BN is not found in the database.
-
latentVariables(BNLearner self)¶ latentVariables(BNLearner self) -> vector< pyAgrum.Arc,allocator< pyAgrum.Arc > > const
Warning
The learner must be using the 3off2 or MIIC algorithm.
Returns: the list of latent variables Return type: list
-
learnBN(BNLearner self)¶ Learns a BayesNet from a file (the database must have been read beforehand).
Returns: the learned BayesNet Return type: pyAgrum.BayesNet
-
learnDAG(BNLearner self)¶ Learns a structure from a file.
Returns: the learned DAG Return type: pyAgrum.DAG
-
learnMixedStructure(BNLearner self)¶ Warning
The learner must be using the 3off2 or MIIC algorithm.
Returns: the learned structure as an EssentialGraph Return type: pyAgrum.EssentialGraph
-
learnParameters(BNLearner self, DAG dag, bool take_into_account_score=True)¶ learnParameters(BNLearner self, bool take_into_account_score=True) -> BayesNet
learns a BN (its parameters) when its structure is known.
Parameters: - dag (pyAgrum.DAG) –
- bn (pyAgrum.BayesNet) –
- take_into_account_score (bool) – The dag passed as argument may have been learnt by structure learning. In this case, if the score used to learn the structure has an implicit apriori (like K2, which has a 1-smoothing apriori), it is important to also take this implicit apriori into account for parameter learning. By default, if a score exists, parameters are learned by taking into account the apriori specified by the useAprioriXXX() methods plus the implicit apriori of the score; otherwise only the apriori specified by useAprioriXXX() is taken into account.
Returns: the learned BayesNet
Return type: pyAgrum.BayesNet
Raises: - gum.MissingVariableInDatabase – If a variable of the BN is not found in the database
- gum.UnknownLabelInDatabase – If a label is found in the database that does not correspond to the variable
-
logLikelihood(BNLearner self, vector< int, allocator< int > > vars, vector< int, allocator< int > > knowing={})¶ logLikelihood(BNLearner self, vector< int,allocator< int > > vars) -> double logLikelihood(BNLearner self, Vector_string vars, Vector_string knowing={}) -> double logLikelihood(BNLearner self, Vector_string vars) -> double
logLikelihood computes the log-likelihood for the columns in vars, given the columns in the list knowing (optional)
Parameters: - vars (List[str]) – the names of the columns of interest
- knowing (List[str]) – the (optional) list of names of conditioning columns
Returns: the log-likelihood (base 2)
Return type: double
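As an illustration of what a base-2 log-likelihood over database columns means, the sketch below computes it from raw counts, assuming maximum-likelihood frequencies and no prior. It is not pyAgrum's implementation: `log2_likelihood` is a hypothetical helper, and the database is represented as a list of dicts rather than a file.

```python
import math
from collections import Counter


def log2_likelihood(rows, columns, knowing=()):
    """Base-2 log-likelihood of `columns` given the columns in `knowing`.

    `rows` is a list of dicts mapping column names to values.
    Uses plain frequencies: sum over rows of log2(N(x, k) / N(k)).
    Hypothetical helper for illustration only; not part of pyAgrum.
    """
    # Counts of each joint configuration (columns followed by knowing).
    joint = Counter(tuple(r[c] for c in (*columns, *knowing)) for r in rows)
    # Counts of each configuration of the conditioning columns alone.
    # With no conditioning column, the single key () counts all rows.
    cond = Counter(tuple(r[c] for c in knowing) for r in rows)
    k = len(columns)
    return sum(n * math.log2(n / cond[key[k:]]) for key, n in joint.items())


rows = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1}]
print(log2_likelihood(rows, ["A"]))         # -4.0: four rows, one bit each
print(log2_likelihood(rows, ["A"], ["B"]))  # 0.0: B determines A
```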
-
maxIter(BNLearner self)¶ Returns: the criterion on number of iterations Return type: int
-
maxTime(BNLearner self)¶ Returns: the timeout (in seconds) Return type: double
-
messageApproximationScheme(BNLearner self)¶ Returns: the approximation scheme message Return type: str
-
minEpsilonRate(BNLearner self)¶ Returns: the value of the minimal epsilon rate Return type: double
-
nameFromId(BNLearner self, int id)¶ Parameters: id – a node id Returns: the variable’s name Return type: str
-
names(BNLearner self)¶ Returns: the names of the variables in the database Return type: List[str]
-
nbCols(BNLearner self)¶ Return the number of columns in the database
Returns: the number of columns in the database Return type: int
-
nbRows(BNLearner self)¶ Return the number of rows in the database
Returns: the number of rows in the database Return type: int
-
nbrIterations(BNLearner self)¶ Returns: the number of iterations Return type: int
-
periodSize(BNLearner self)¶ Returns: the number of samples between two tests of the stopping criteria Return type: int Raises: gum.OutOfLowerBound – If p<1
-
recordWeight(BNLearner self, size_t i)¶
-
setAprioriWeight(weight)¶ Deprecated methods in BNLearner for pyAgrum>0.14.0
-
setDatabaseWeight(BNLearner self, double new_weight)¶ Set the database weight.
Parameters: new_weight (double) – the database weight
-
setEpsilon(BNLearner self, double eps)¶ Parameters: eps (double) – the epsilon we want to use Raises: gum.OutOfLowerBound– If eps<0
-
setInitialDAG(BNLearner self, DAG g)¶ Parameters: g (pyAgrum.DAG) – an initial DAG structure
-
setMaxIndegree(BNLearner self, int max_indegree)¶
-
setMaxIter(BNLearner self, int max)¶ Parameters: max (int) – the maximum number of iterations Raises: gum.OutOfLowerBound – If max <= 1
-
setMaxTime(BNLearner self, double timeout)¶ Parameters: timeout (double) – stopping criterion on timeout (in seconds) Raises: gum.OutOfLowerBound – If timeout<=0.0
-
setMinEpsilonRate(BNLearner self, double rate)¶ Parameters: rate (double) – the minimal epsilon rate
-
setPeriodSize(BNLearner self, int p)¶ Parameters: p (int) – the number of samples between two tests of the stopping criteria Raises: gum.OutOfLowerBound – If p<1
-
setPossibleSkeleton(BNLearner self, UndiGraph skeleton)¶
-
setRecordWeight(BNLearner self, size_t i, double weight)¶
-
setSliceOrder(BNLearner self, PyObject * l)¶ setSliceOrder(BNLearner self, pyAgrum.NodeProperty< int > slice_order) setSliceOrder(BNLearner self, vector< vector< str,allocator< str > >,allocator< vector< str,allocator< str > > > > slices)
Set a partial order on the nodes.
Parameters: l (list) – a list of sequences (composed of ids of rows or string)
-
setVerbosity(BNLearner self, bool v)¶ Parameters: v (bool) – verbosity
-
use3off2(BNLearner self)¶ Indicate that we wish to use 3off2.
-
useAprioriBDeu(BNLearner self, double weight=1)¶ useAprioriBDeu(BNLearner self)
The BDeu apriori adds weight to all the cells of the counting tables. In other words, it adds weight rows in the database with equally probable values.
Parameters: weight (double) – the apriori weight
-
useAprioriDirichlet(BNLearner self, str filename, double weight=1)¶ useAprioriDirichlet(BNLearner self, str filename)
-
useAprioriSmoothing(BNLearner self, double weight=1)¶ useAprioriSmoothing(BNLearner self)
-
useEM(BNLearner self, double epsilon)¶ Indicates whether EM is used for parameter learning.
Parameters: epsilon (double) – if epsilon=0.0, EM is not used; if epsilon>0, EM is used and stops when the sum of the cumulative squared error on parameters is less than epsilon.
-
useGreedyHillClimbing(BNLearner self)¶
-
useK2(BNLearner self, PyObject * l)¶ useK2(BNLearner self, pyAgrum.Sequence< int > order) useK2(BNLearner self, vector< int,allocator< int > > order)
Indicate that we wish to use K2.
Parameters: order (list) – a list of ids
-
useLocalSearchWithTabuList(BNLearner self, int tabu_size=100, int nb_decrease=2)¶ useLocalSearchWithTabuList(BNLearner self, int tabu_size=100) useLocalSearchWithTabuList(BNLearner self)
Indicate that we wish to use a local search with tabu list
Parameters: - tabu_size (int) – The size of the tabu list
- nb_decrease (int) – The max number of changes decreasing the score consecutively that we allow to apply
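The roles of tabu_size and nb_decrease can be sketched on a toy problem. The code below is a generic tabu search, not pyAgrum's implementation: `tabu_search`, `neighbors`, `score` and the `max_steps` safety bound are hypothetical stand-ins (integer states instead of candidate structures, a lookup table instead of a structure score). Recently visited states are kept in a bounded tabu list, and the search stops once more than nb_decrease consecutive moves have lowered the score.

```python
from collections import deque


def tabu_search(start, neighbors, score, tabu_size=100, nb_decrease=2,
                max_steps=1000):
    """Maximise `score` by local search with a tabu list (generic sketch).

    Hypothetical helper for illustration only; not part of pyAgrum.
    `max_steps` is an extra safety bound added for this sketch.
    """
    current = best = start
    tabu = deque([start], maxlen=tabu_size)  # most recently visited states
    decreases = 0                            # consecutive score-lowering moves
    for _ in range(max_steps):
        candidates = [s for s in neighbors(current) if s not in tabu]
        if not candidates:
            break
        nxt = max(candidates, key=score)     # best non-tabu neighbour
        if score(nxt) < score(current):
            decreases += 1
            if decreases > nb_decrease:
                break
        else:
            decreases = 0
        current = nxt
        tabu.append(current)
        if score(current) > score(best):
            best = current
    return best


# Toy landscape: a local maximum at state 2 and the global maximum at state 5.
scores = [0, 2, 3, 1, 2, 5, 4]


def toy_score(i):
    return scores[i]


def toy_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(scores)]


print(tabu_search(0, toy_neighbors, toy_score, tabu_size=3, nb_decrease=2))  # 5
print(tabu_search(0, toy_neighbors, toy_score, tabu_size=3, nb_decrease=0))  # 2
```

With nb_decrease=0 the search refuses to move downhill and stays at the local maximum; allowing a few decreasing moves lets it cross the dip, while the tabu list keeps it from stepping straight back.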
-
useMDL(BNLearner self)¶ Indicate that we wish to use the MDL correction for 3off2 or MIIC
-
useMIIC(BNLearner self)¶ Indicate that we wish to use MIIC.
-
useNML(BNLearner self)¶ Indicate that we wish to use the NML correction for 3off2 or MIIC
-
useNoApriori(BNLearner self)¶
-
useNoCorr(BNLearner self)¶ Indicate that we wish to use the NoCorr correction for 3off2 or MIIC
-
useScoreAIC(BNLearner self)¶
-
useScoreBD(BNLearner self)¶
-
useScoreBDeu(BNLearner self)¶
-
useScoreBIC(BNLearner self)¶
-
useScoreK2(BNLearner self)¶
-
useScoreLog2Likelihood(BNLearner self)¶
-
verbosity(BNLearner self)¶ Returns: True if the verbosity is enabled Return type: bool