Learning
pyAgrum encapsulates all the learning processes for Bayesian networks in a single class, BNLearner. This class gives direct access to the complete learning algorithms and their parameters (such as priors, scores, constraints, etc.) and also offers low-level functions that ease the development of new learning algorithms (for instance, computing chi2 or conditional likelihood on the database).
-
class
pyAgrum.
BNLearner
(filename)¶ - Parameters:
- filename (str) – the file to learn from
- BNLearner(filename,src,parse_database=False) -> BNLearner
- Parameters:
- filename (str) – the file to learn from
- src (pyAgrum.BayesNet) – the Bayesian network used to find those modalities
- parse_database (bool) – if True, the modalities specified by the user will be considered as a superset of the modalities of the variables.
- BNLearner(learner) -> BNLearner
- Parameters:
- learner (pyAgrum.BNLearner) – the BNLearner to copy
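A minimal usage sketch, assuming the pyAgrum version documented on this page is installed and that the database is a CSV file whose first row holds the variable names. The file contents and the two helper functions are hypothetical; only BNLearner methods listed in this section are called.

```python
import csv


def write_toy_database(path):
    """Write a small CSV database (hypothetical data) and return its number of records."""
    header = ["rain", "wet_grass"]
    records = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1), (1, 1), (0, 0), (1, 0)]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(records)
    return len(records)


def learn_bn(path):
    """Learn structure and parameters from the CSV file at `path`."""
    import pyAgrum as gum  # imported here so the helper above works without pyAgrum

    learner = gum.BNLearner(path)      # read the database
    learner.useGreedyHillClimbing()    # choose the structure learning algorithm
    learner.useScoreBIC()              # choose the score
    learner.useAprioriSmoothing(1.0)   # choose the apriori and its weight
    return learner.learnBN()           # returns a pyAgrum.BayesNet
```

Calling `write_toy_database("toy.csv")` followed by `learn_bn("toy.csv")` would exercise the whole chain: database, algorithm, score, apriori, learning.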
-
G2
(BNLearner self, str var1, str var2, Vector_string knw={}) G2 computes the G2 statistic and p-value for two columns, given a list of other columns.
Parameters: - var1 (str) – the name of the first column
- var2 (str) – the name of the second column
- knw ([str]) – the list of names of conditioning columns
Returns: the G2 statistic and the associated p-value as a Tuple
Return type: statistic,pvalue
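To make the quantity concrete, here is a pure-Python sketch of the G2 statistic for two columns without conditioning columns. It is an illustration of the textbook formula, not pyAgrum's implementation; the function name and the data are hypothetical.

```python
import math
from collections import Counter


def g2_statistic(col_x, col_y):
    """G2 = 2 * sum over cells of O * ln(O / E), with E = count_x * count_y / N."""
    n = len(col_x)
    joint = Counter(zip(col_x, col_y))
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    g2 = 0.0
    for (x, y), observed in joint.items():  # empty cells contribute 0 and are skipped
        expected = count_x[x] * count_y[y] / n
        g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Hypothetical columns: 60 records with a visible association between x and y.
xs = [0] * 30 + [1] * 30
ys = [0] * 10 + [1] * 20 + [0] * 20 + [1] * 10
stat = g2_statistic(xs, ys)
```

A value near 0 indicates that the observed counts match what independence would predict; larger values indicate a stronger departure from independence.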
-
addForbiddenArc
(BNLearner self, Arc arc)¶ addForbiddenArc(BNLearner self, int tail, int head) addForbiddenArc(BNLearner self, str tail, str head)
The given arc will never be added to the learned structure.
Parameters: - arc (pyAgrum.Arc) – an arc
- tail (int or str) – the id or the name of the tail variable
- head (int or str) – the id or the name of the head variable
-
addMandatoryArc
(BNLearner self, Arc arc)¶ addMandatoryArc(BNLearner self, int tail, int head) addMandatoryArc(BNLearner self, str tail, str head)
Adds prior structural knowledge by declaring an arc as mandatory.
Parameters: - arc (pyAgrum.Arc) – an arc
- tail (int or str) – the id or the name of the tail variable
- head (int or str) – the id or the name of the head variable
Raises: gum.InvalidDirectedCycle
– If the added arc creates a directed cycle in the DAG
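The cycle condition can be checked by hand before adding arcs. The sketch below is a hypothetical helper, independent of pyAgrum: adding tail -> head closes a directed cycle exactly when tail is already reachable from head.

```python
def creates_directed_cycle(arcs, tail, head):
    """Return True if adding tail -> head to the DAG described by `arcs` closes a directed cycle."""
    children = {}
    for t, h in arcs:
        children.setdefault(t, []).append(h)
    # A cycle appears exactly when `tail` is already reachable from `head`.
    stack, seen = [head], set()
    while stack:
        node = stack.pop()
        if node == tail:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return False


# Hypothetical set of mandatory arcs, given as (tail, head) pairs.
mandatory = [("A", "B"), ("B", "C")]
```

With these arcs, `creates_directed_cycle(mandatory, "C", "A")` is true because A already leads to C, while adding A -> C is safe.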
-
addPossibleEdge
(BNLearner self, Edge edge)¶ addPossibleEdge(BNLearner self, int tail, int head) addPossibleEdge(BNLearner self, str tail, str head)
-
chi2
(BNLearner self, str var1, str var2, Vector_string knw={}) chi2 computes the chi2 statistic and p-value for two columns, given a list of other columns.
Parameters: - var1 (str) – the name of the first column
- var2 (str) – the name of the second column
- knw ([str]) – the list of names of conditioning columns
Returns: the chi2 statistic and the associated p-value as a Tuple
Return type: statistic,pvalue
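As an illustration of the returned pair, the following sketch computes Pearson's chi2 statistic and its p-value for a 2x2 contingency table with non-empty rows and columns. It covers only the unconditional case with one degree of freedom and is not pyAgrum's implementation; the function name and the table are hypothetical.

```python
import math


def chi2_2x2(table):
    """Pearson chi2 statistic and p-value for a 2x2 contingency table (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


statistic, pvalue = chi2_2x2([[10, 20], [20, 10]])
```

A small p-value suggests rejecting the hypothesis that the two columns are independent.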
-
currentTime
(BNLearner self) Returns: the current running time in seconds Return type: double
-
databaseWeight
(BNLearner self)¶
-
epsilon
(BNLearner self)¶ Returns: the value of epsilon Return type: double
-
eraseForbiddenArc
(BNLearner self, Arc arc)¶ eraseForbiddenArc(BNLearner self, int tail, int head) eraseForbiddenArc(BNLearner self, str tail, str head)
Removes the arc from the forbidden arcs, so that it can be added if necessary.
Parameters: - arc (pyAgrum.Arc) – an arc
- tail (int or str) – the id or the name of the tail variable
- head (int or str) – the id or the name of the head variable
-
eraseMandatoryArc
(BNLearner self, Arc arc)¶ eraseMandatoryArc(BNLearner self, int tail, int head) eraseMandatoryArc(BNLearner self, str tail, str head)
Parameters: - arc (pyAgrum.Arc) – an arc
- tail (int or str) – the id or the name of the tail variable
- head (int or str) – the id or the name of the head variable
-
erasePossibleEdge
(BNLearner self, Edge edge)¶ erasePossibleEdge(BNLearner self, int tail, int head) erasePossibleEdge(BNLearner self, str tail, str head)
Allow the two arcs to be added if necessary.
Parameters: - edge (pyAgrum.Edge) – an edge
- tail (int or str) – the id or the name of the first variable
- head (int or str) – the id or the name of the second variable
-
hasMissingValues
(BNLearner self)¶ Indicates whether there are missing values in the database.
Returns: True if there are some missing values in the database. Return type: bool
-
history
(BNLearner self) Returns: the scheme history Return type: tuple Raises: gum.OperationNotAllowed
– If the scheme has not been performed or if verbosity is set to False
-
idFromName
(BNLearner self, str var_name) Parameters: var_name (str) – a variable’s name Returns: the column id corresponding to a variable name Return type: int Raises: gum.MissingVariableInDatabase
– If a variable of the BN is not found in the database.
-
latentVariables
(BNLearner self)¶ latentVariables(BNLearner self) -> vector< pyAgrum.Arc,allocator< pyAgrum.Arc > > const
Warning
learner must be using 3off2 or MIIC algorithm
Returns: the list of latent variables Return type: list
-
learnBN
(BNLearner self) Learns a BayesNet from the database (the database must have been read beforehand).
Returns: the learned BayesNet Return type: pyAgrum.BayesNet
-
learnDAG
(BNLearner self)¶ learn a structure from a file
Returns: the learned DAG Return type: pyAgrum.DAG
-
learnMixedStructure
(BNLearner self)¶ Warning
learner must be using 3off2 or MIIC algorithm
Returns: the learned structure as an EssentialGraph Return type: pyAgrum.EssentialGraph
-
learnParameters
(BNLearner self, DAG dag, bool take_into_account_score=True)¶ learnParameters(BNLearner self, bool take_into_account_score=True) -> BayesNet
learns a BN (its parameters) when its structure is known.
Parameters: - dag (pyAgrum.DAG) –
- bn (pyAgrum.BayesNet) –
- take_into_account_score (bool) – The dag passed as argument may have been learnt by structure learning. In this case, if the score used to learn the structure has an implicit apriori (like K2, which has a 1-smoothing apriori), it is important to also take this implicit apriori into account for parameter learning. By default, if a score exists, parameters are learnt by taking into account the apriori specified by the useAprioriXXX() methods plus the implicit apriori of the score; otherwise only the apriori specified by useAprioriXXX() is taken into account.
Returns: the learned BayesNet
Return type: pyAgrum.BayesNet
Raises: gum.MissingVariableInDatabase
– If a variable of the BN is not found in the database
gum.UnknownLabelInDatabase
– If a label found in the database does not correspond to the variable
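The role of an apriori in parameter learning can be sketched in pure Python. The example below assumes a smoothing apriori that adds `weight` to every cell of the counting table before normalising; this is an illustration of the idea, not pyAgrum's code, and the function name and data are hypothetical.

```python
from collections import Counter


def estimate_cpt(child_col, parent_col, child_values, weight=1.0):
    """Estimate P(child | parent) from counts, adding `weight` to every cell (smoothing)."""
    counts = Counter(zip(parent_col, child_col))
    cpt = {}
    for parent in sorted(set(parent_col)):
        smoothed = [counts[(parent, c)] + weight for c in child_values]
        total = sum(smoothed)
        cpt[parent] = [s / total for s in smoothed]
    return cpt


# Hypothetical database with one parent column and one child column.
parents = [0, 0, 0, 1, 1, 1, 1, 1]
children = [0, 0, 1, 1, 1, 1, 1, 1]
cpt = estimate_cpt(children, parents, child_values=[0, 1], weight=1.0)
```

With `weight=1.0`, the configuration (parent=1, child=0), never observed in the data, still receives a non-zero probability; with `weight=0` it would be exactly 0.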
-
logLikelihood
(BNLearner self, vector< int, allocator< int > > vars, vector< int, allocator< int > > knowing={})¶ logLikelihood(BNLearner self, vector< int,allocator< int > > vars) -> double logLikelihood(BNLearner self, Vector_string vars, Vector_string knowing={}) -> double logLikelihood(BNLearner self, Vector_string vars) -> double
logLikelihood computes the log-likelihood for the columns in vars, given the columns in the list knowing (optional)
Parameters: - vars (List[str]) – the names of the columns of interest
- knowing (List[str]) – the (optional) list of names of conditioning columns
Returns: the log-likelihood (base 2)
Return type: double
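The sketch below shows one way to compute a base-2 log-likelihood by hand, assuming it is evaluated at the empirical frequencies of the columns (no apriori). Whether pyAgrum uses exactly this definition is an assumption; the function names and data are hypothetical.

```python
import math
from collections import Counter


def log2_likelihood(columns):
    """Sum over observed configurations of N_x * log2(N_x / N), for the joint of `columns`."""
    rows = list(zip(*columns))
    n = len(rows)
    return sum(c * math.log2(c / n) for c in Counter(rows).values())


def conditional_log2_likelihood(vars_cols, knowing_cols):
    """LL(vars | knowing) = LL(vars, knowing) - LL(knowing)."""
    return log2_likelihood(vars_cols + knowing_cols) - log2_likelihood(knowing_cols)


# Hypothetical columns: y fully determines x.
x = ["a", "a", "b", "b"]
y = ["u", "u", "v", "v"]
```

Here `log2_likelihood([x])` equals -4 (four records, each with empirical probability 1/2), while `conditional_log2_likelihood([x], [y])` equals 0 because x is fully determined by y.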
-
maxIter
(BNLearner self)¶ Returns: the criterion on number of iterations Return type: int
-
maxTime
(BNLearner self)¶ Returns: the timeout(in seconds) Return type: double
-
messageApproximationScheme
(BNLearner self)¶ Returns: the approximation scheme message Return type: str
-
minEpsilonRate
(BNLearner self)¶ Returns: the value of the minimal epsilon rate Return type: double
-
nameFromId
(BNLearner self, int id)¶ Parameters: id – a node id Returns: the variable’s name Return type: str
-
names
(BNLearner self)¶ Returns: the names of the variables in the database Return type: List[str]
-
nbCols
(BNLearner self) Return the number of columns in the database
Returns: the number of columns in the database Return type: int
-
nbRows
(BNLearner self) Return the number of rows in the database
Returns: the number of rows in the database Return type: int
-
nbrIterations
(BNLearner self)¶ Returns: the number of iterations Return type: int
-
periodSize
(BNLearner self) Returns: the number of samples between two stopping tests Return type: int Raises: gum.OutOfLowerBound
– If p<1
-
recordWeight
(BNLearner self, size_t i)¶
-
setAprioriWeight
(weight)¶ Deprecated methods in BNLearner for pyAgrum>0.14.0
-
setDatabaseWeight
(BNLearner self, double new_weight)¶ Set the database weight.
Parameters: new_weight (double) – the database weight
-
setEpsilon
(BNLearner self, double eps)¶ Parameters: eps (double) – the epsilon we want to use Raises: gum.OutOfLowerBound
– If eps<0
-
setInitialDAG
(BNLearner self, DAG g)¶ Parameters: dag (pyAgrum.DAG) – an initial DAG structure
-
setMaxIndegree
(BNLearner self, int max_indegree)¶
-
setMaxIter
(BNLearner self, int max) Parameters: max (int) – the maximum number of iterations Raises: gum.OutOfLowerBound
– If max <= 1
-
setMaxTime
(BNLearner self, double timeout) Parameters: timeout (double) – stopping criterion on timeout (in seconds) Raises: gum.OutOfLowerBound
– If timeout<=0.0
-
setMinEpsilonRate
(BNLearner self, double rate)¶ Parameters: rate (double) – the minimal epsilon rate
-
setPeriodSize
(BNLearner self, int p) Parameters: p (int) – number of samples between two stopping tests Raises: gum.OutOfLowerBound
– If p<1
-
setPossibleSkeleton
(BNLearner self, UndiGraph skeleton)¶
-
setRecordWeight
(BNLearner self, size_t i, double weight)¶
-
setSliceOrder
(BNLearner self, PyObject * l)¶ setSliceOrder(BNLearner self, pyAgrum.NodeProperty< int > slice_order) setSliceOrder(BNLearner self, vector< vector< str,allocator< str > >,allocator< vector< str,allocator< str > > > > slices)
Set a partial order on the nodes.
Parameters: l (list) – a list of sequences (composed of ids of rows or string)
-
setVerbosity
(BNLearner self, bool v)¶ Parameters: v (bool) – verbosity
-
use3off2
(BNLearner self)¶ Indicate that we wish to use 3off2.
-
useAprioriBDeu
(BNLearner self, double weight=1)¶ useAprioriBDeu(BNLearner self)
The BDeu apriori adds weight to all the cells of the counting tables. In other words, it adds weight rows to the database with equally probable values.
Parameters: weight (double) – the apriori weight
-
useAprioriDirichlet
(BNLearner self, str filename, double weight=1)¶ useAprioriDirichlet(BNLearner self, str filename)
-
useAprioriSmoothing
(BNLearner self, double weight=1)¶ useAprioriSmoothing(BNLearner self)
-
useEM
(BNLearner self, double epsilon) Indicates whether EM is used for parameter learning.
Parameters: epsilon (double) – if epsilon=0.0, EM is not used; if epsilon>0, EM is used and stops when the sum of the cumulative squared error on the parameters is less than epsilon.
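To illustrate the epsilon stopping rule, here is a deliberately tiny EM sketch that estimates a single Bernoulli parameter from data with missing entries. It is not pyAgrum's EM; with one parameter, the sum of squared errors reduces to a single squared change. The function name, defaults and data are hypothetical.

```python
def em_bernoulli(values, epsilon=1e-12, theta=0.5, max_iter=1000):
    """Estimate P(X=1) from 0/1 values where None marks a missing entry."""
    ones = sum(1 for v in values if v == 1)
    missing = sum(1 for v in values if v is None)
    n = len(values)
    for _ in range(max_iter):
        # E-step: each missing entry counts as `theta` expected ones.
        expected_ones = ones + missing * theta
        # M-step: re-estimate the parameter from the completed counts.
        new_theta = expected_ones / n
        squared_change = (new_theta - theta) ** 2
        theta = new_theta
        if squared_change < epsilon:  # stop once the squared change falls below epsilon
            break
    return theta


# Hypothetical column: six 1s, two 0s and two missing values.
data = [1, 1, 1, 1, 1, 1, 0, 0, None, None]
theta_hat = em_bernoulli(data)
```

The iteration converges towards the frequency observed among the non-missing entries (6 out of 8 here); a smaller epsilon means more iterations and a tighter estimate.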
-
useGreedyHillClimbing
(BNLearner self)¶
-
useK2
(BNLearner self, PyObject * l)¶ useK2(BNLearner self, pyAgrum.Sequence< int > order) useK2(BNLearner self, vector< int,allocator< int > > order)
Indicate that we wish to use K2.
Parameters: order (list) – a list of ids
-
useLocalSearchWithTabuList
(BNLearner self, int tabu_size=100, int nb_decrease=2)¶ useLocalSearchWithTabuList(BNLearner self, int tabu_size=100) useLocalSearchWithTabuList(BNLearner self)
Indicate that we wish to use a local search with tabu list
Parameters: - tabu_size (int) – The size of the tabu list
- nb_decrease (int) – The max number of changes decreasing the score consecutively that we allow to apply
-
useMDL
(BNLearner self)¶ Indicate that we wish to use the MDL correction for 3off2 or MIIC
-
useMIIC
(BNLearner self)¶ Indicate that we wish to use MIIC.
-
useNML
(BNLearner self)¶ Indicate that we wish to use the NML correction for 3off2 or MIIC
-
useNoApriori
(BNLearner self)¶
-
useNoCorr
(BNLearner self)¶ Indicate that we wish to use the NoCorr correction for 3off2 or MIIC
-
useScoreAIC
(BNLearner self)¶
-
useScoreBD
(BNLearner self)¶
-
useScoreBDeu
(BNLearner self)¶
-
useScoreBIC
(BNLearner self)¶
-
useScoreK2
(BNLearner self)¶
-
useScoreLog2Likelihood
(BNLearner self)¶
-
verbosity
(BNLearner self)¶ Returns: True if the verbosity is enabled Return type: bool