Learning a CTBN

One of the main features of this library is the ability to learn a CTBN from samples.

More precisely, the following can be learned:
  • The dependency graph of a CTBN

  • The CIMs of a CTBN

  • (The variables and their labels from a sample)

Learning requires tools to extract data from samples. This is the role of the class pyAgrum.ctbn.Trajectory and of the function pyAgrum.ctbn.CTBNFromData().

Before introducing the algorithms, here are some definitions:
  • \(M_{xx'|u}\) is the number of times a variable X goes from a state x to a state x’, conditioned by an instance of its parents u. It is filled using samples.

  • \(M_{x|u}\) is the number of times X leaves state x, conditioned by an instance of its parents u, i.e. \(M_{x|u} = \sum_{x' \neq x} M_{xx'|u}\).

  • \(T_{x|u}\) is the time spent in state x, conditioned by an instance of its parents u.

  • \(M_{xx'|y,u}\) and \(T_{x|y,u}\) are the same but with another conditioning variable Y in state y.

These quantities can be stored in pyAgrum.Potential objects.

Being conditioned by an instance means that the extracted data comes from time intervals where conditioning variables take specific values.
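The conditioning described above can be illustrated with a minimal pure-Python sketch that does not use pyAgrum. It assumes that each event `(time, var, state)` means "var enters state at that time". This is an assumption of the sketch, not a statement about the library's trajectory semantics. The helper `sufficient_stats` and its `initial` and `horizon` arguments are hypothetical names introduced for the illustration.

```python
from collections import defaultdict

def sufficient_stats(events, initial, horizon, target, parents):
    """Accumulate T[(u, x)] and M[(u, x, x2)] for `target` given `parents`.

    events  : time-sorted list of (time, var, new_state) (assumed semantics)
    initial : dict mapping each variable to its state at time 0 (hypothetical)
    horizon : end time of the observation window (hypothetical)
    """
    state = dict(initial)
    T = defaultdict(float)  # time spent in x while the parents are in u
    M = defaultdict(int)    # number of x -> x2 transitions while the parents are in u
    last = 0.0
    for time, var, new_state in events:
        u = tuple(state[p] for p in parents)
        # The interval [last, time) is credited to the current (u, x) pair.
        T[(u, state[target])] += time - last
        if var == target and new_state != state[target]:
            M[(u, state[target], new_state)] += 1
        state[var] = new_state
        last = time
    # Close the last interval at the time horizon.
    u = tuple(state[p] for p in parents)
    T[(u, state[target])] += horizon - last
    return dict(T), dict(M)

events = [(1.0, "X", "b"), (2.5, "U", "on"), (3.0, "X", "a"), (4.0, "X", "b")]
initial = {"X": "a", "U": "off"}
T, M = sufficient_stats(events, initial, horizon=5.0, target="X", parents=["U"])
```

Here the time X spends in state b is split between the interval where U is off and the interval where U is on, which is exactly what conditioning by an instance of the parents means.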

Learning parameters : learning the CIMs

Goal : finding the \(q_{i,j|u}\) (i.e. \(q_{x|u}\) and \(q_{x \rightarrow x'|u}\)) coefficients.

Idea : the rate at which X leaves state x is estimated by \(q_{x|u} = \frac{M_{x|u}}{T_{x|u}}\), and the probability that a transition out of x leads to x' by \(P_X(x \rightarrow x') = \frac{M_{xx'|u}}{M_{x|u}} = \frac{q_{x \rightarrow x'|u}}{q_{x|u}}\). Hence \(q_{x \rightarrow x'|u} = \frac{M_{xx'|u}}{T_{x|u}}\).
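As a worked illustration of these estimates, the following pure-Python sketch builds the intensity matrix for a single parent instance u from transition counts and times. It is not the library's implementation: `cim_from_stats` is a hypothetical helper, the nested dicts stand in for pyAgrum.Potential, and placing \(-q_{x|u}\) on the diagonal follows the usual CIM convention.

```python
def cim_from_stats(M, T, states):
    """Estimate an intensity matrix for one parent instance.

    M[x][x2] : number of observed transitions x -> x2
    T[x]     : total time spent in state x
    Returns Q with Q[x][x2] = M[x][x2] / T[x] and Q[x][x] = -q_x.
    """
    Q = {}
    for x in states:
        row = {}
        leaving_rate = 0.0
        for x2 in states:
            if x2 == x:
                continue
            # A state that was never visited gets a zero rate in this sketch.
            rate = M.get(x, {}).get(x2, 0) / T[x] if T[x] > 0 else 0.0
            row[x2] = rate
            leaving_rate += rate
        row[x] = -leaving_rate  # diagonal entry is minus the rate of leaving x
        Q[x] = row
    return Q

states = ["a", "b", "c"]
M = {"a": {"b": 3, "c": 1}, "b": {"a": 2}, "c": {"a": 1, "b": 1}}
T = {"a": 2.0, "b": 4.0, "c": 0.5}
Q = cim_from_stats(M, T, states)
```

With these numbers, X leaves state a four times in 2.0 time units, so \(q_{a|u} = 2\), split into 1.5 towards b and 0.5 towards c.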

Learning the graph

To learn the graph of a CTBN (i.e. the dependencies between variables), we use the CTPC algorithm from A. Bregoli, M. Scutari, F. Stella, Constraint-Based Learning for Continuous-Time Bayesian Networks, arXiv:2007.03248, 2020. The independence test used is based on Fisher and chi2 tests to compare exponential distributions.

class pyAgrum.ctbn.Learner(source)

Class used to learn a CTBN (independence between variables and CIMs) using samples.

Parameters:

source (str|Dict[int, List[Tuple[float, str, str]]]) – Path to the csv file containing the samples (trajectories), or the trajectories themselves as a Python dict.

fitParameters(ctbn)

Learns the parameters of ctbn’s CIMs.

Parameters:

ctbn (CTBN) – CTBN containing the CIMs to learn.

learnCTBN(template=None)

Learns a CTBN using the CTPC (continuous-time PC) algorithm. Reference : A. Bregoli, M. Scutari, F. Stella, Constraint-Based Learning for Continuous-Time Bayesian Networks, arXiv:2007.03248, 2020.

Parameters:

template (CTBN) – CTBN used to find variables. If not given, variables are searched for inside the trajectories (if the trajectory is very short, some variables can be missed).

Returns:

The learned CTBN.

Return type:

CTBN

pyAgrum.ctbn.readTrajectoryCSV(filename)

Reads trajectories from a csv file. Storing format : {IdSample, time, var, state}

Parameters:

filename (str) – Path to the file.

Returns:

The trajectories, one trajectory per sample index.

Return type:

Dict[int, List[Tuple[float, str, str]]]
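To make the storing format and the returned structure concrete, here is a minimal sketch of a parser written with the standard `csv` module. It is not the library's reader: `parse_trajectories` is a hypothetical helper, and the presence of a header row named exactly `IdSample,time,var,state` is an assumption of this sketch.

```python
import csv
import io
from collections import defaultdict

def parse_trajectories(text):
    """Group CSV rows {IdSample, time, var, state} into one list per sample id."""
    trajectories = defaultdict(list)
    reader = csv.DictReader(io.StringIO(text))  # assumes a header row
    for row in reader:
        sample_id = int(row["IdSample"])
        trajectories[sample_id].append((float(row["time"]), row["var"], row["state"]))
    return dict(trajectories)

sample_csv = (
    "IdSample,time,var,state\n"
    "0,0.7,X,a\n"
    "0,1.9,Y,off\n"
    "1,0.4,X,b\n"
)
trajectories = parse_trajectories(sample_csv)
```

The result has the shape Dict[int, List[Tuple[float, str, str]]], the same type that pyAgrum.ctbn.Learner and pyAgrum.ctbn.Trajectory accept as source.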

pyAgrum.ctbn.CTBNFromData(data)

Constructs a CTBN and adds the corresponding variables found in the trajectories.

Warning

If data is too short, some variables or state labels might be missed.

Parameters:

data (Dict[int, List[Tuple[float, str, str]]]) – The trajectories used to look for variables.

Returns:

The resulting CTBN.

Return type:

CTBN
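The warning above can be illustrated with a small pure-Python sketch that gathers variable names and state labels from trajectories. It only mimics the discovery step and does not build a CTBN; `collect_variables` is a hypothetical helper, not part of pyAgrum.

```python
def collect_variables(data):
    """Return {variable name: sorted list of labels} seen in the trajectories."""
    labels = {}
    for trajectory in data.values():
        for _time, var, state in trajectory:
            labels.setdefault(var, set()).add(state)
    return {var: sorted(states) for var, states in labels.items()}

data = {
    0: [(0.5, "X", "a"), (1.2, "Y", "on"), (2.0, "X", "b")],
    1: [(0.3, "Y", "off")],
}
full = collect_variables(data)

# Keeping only the first event loses variable Y and label "b" of X.
short = collect_variables({0: data[0][:1]})
```

A variable or label that never appears in the data cannot be recovered, which is why passing an existing CTBN as a template is safer when the trajectories are short.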

pyAgrum.ctbn.computeCIMFromStats(X, M, T)

Computes a CIM (Conditional Intensity Matrix) using stats from a trajectory. Variables in the potential are not copied but directly used in the result to avoid memory issues.

Parameters:
  • X (str) – Name of the variable to compute CIM for.

  • M (pyAgrum.Potential) – Potential containing the number of transitions for each pair of X’s states.

  • T (pyAgrum.Potential) – Potential containing the time spent in each state of X.

Returns:

The resulting potential, X’s CIM.

Return type:

pyAgrum.Potential

class pyAgrum.ctbn.Trajectory(source, ctbn=None)

Tools to extract useful information from a trajectory. It is used for parameter and graph learning. It can be created from a dict of trajectories or from a file that contains one.

Parameters:
  • source (str|Dict[int, List[Tuple[float, str, str]]]) – The path to a csv file containing the samples or the dict of trajectories itself.

  • ctbn (CTBN) – Used to link the variable names in the trajectory to their pyAgrum variables. If not given, a new CTBN is created with the variables and labels found in the trajectory (warning : if the trajectory is short, some variables may not be found).

data

The samples.

Type:

Dict[int, List[Tuple[float, str, str]]]

ctbn

The CTBN used to link the names in the trajectory to pyAgrum variables.

Type:

CTBN

timeHorizon

The time length of the trajectory.

Type:

float

computeAllCIMs()

Computes the CIMs of the variables in self.ctbn. Conditioning is given by the graph of self.ctbn.

computeStats(X, U)

Computes the time spent in each state and the number of transitions of X, and returns them as pyAgrum.Potential.

Parameters:
  • X (str) – Name of the variable.

  • U (List[str]) – Names of the conditioning variables.

Returns:

The resulting potentials.

Return type:

Tuple[pyAgrum.Potential, pyAgrum.Potential]

computeStatsForTests(X, Y, U)

Computes the time spent in each state and the number of transitions of X when conditioned by Y and U, and returns them as pyAgrum.Potential. Used for independence testing.

Parameters:
  • X (str) – Name of the variable.

  • Y (str) – Name of a conditioning variable not in U.

  • U (List[str]) – Names of the conditioning variables.

Returns:

The resulting potentials.

Return type:

Tuple[pyAgrum.Potential, pyAgrum.Potential, pyAgrum.Potential]

setStatValues(X, inst_u, Txu, Mxu)

Fills the potentials given.

Parameters:
  • X (str) – Name of the variable.

  • inst_u (Dict[str, str]) – Instance of conditioning variables.

  • Txu (pyAgrum.Potential) – Potential to fill. Contains the time spent in each state.

  • Mxu (pyAgrum.Potential) – Potential to fill. Contains the number of transitions from any pair of states.

setStatsForTests(X, Y, inst_u, Txu, Txyu, Mxyu)

Fills the potentials given. They are used for independence testing.

Parameters:
  • X (str) – Name of the variable.

  • Y (str) – Name of a conditioning variable.

  • inst_u (Dict[str, str]) – Instance of conditioning variables.

  • Txu (pyAgrum.Potential) – Potential to fill. Contains the time spent in each state. Conditioned by variables in inst_u.

  • Txyu (pyAgrum.Potential) – Potential to fill. Contains the time spent in each state. Conditioned by Y and variables in inst_u.

  • Mxyu (pyAgrum.Potential) – Potential to fill. Contains the number of transitions from any pair of states. Conditioned by Y and variables in inst_u.

class pyAgrum.ctbn.Stats(trajectory, X, Y, par)

Stores all potentials used for learning.

Parameters:
  • trajectory (Trajectory) – Samples used to find stats.

  • X (str) – Name of the variable to study.

  • Y (str) – Name of the variable used for conditioning variable X.

  • par (List[str]) – List of conditioning variables of X.

Mxy

Potential containing the number of transitions the variable X does from any of its states for any instance of its parents and variable Y.

Type:

pyAgrum.Potential

Mx

Potential containing the number of transitions the variable X does from any of its states for any instance of its parents.

Type:

pyAgrum.Potential

Tx

Potential containing the time spent by X to transition from a state to another for any instance of its parents.

Type:

pyAgrum.Potential

Txy

Potential containing the time spent by X to transition from a state to another for any instance of its parents and of Y.

Type:

pyAgrum.Potential

Qx

Conditional Intensity Matrix (CIM) of X.

Type:

pyAgrum.Potential

QxY

Conditional Intensity Matrix (CIM) of X that includes the conditioning variable Y.

Type:

pyAgrum.Potential

class pyAgrum.ctbn.StatsIndepTest.FChi2Test(tr)

Bases: IndepTest

This class uses 2 independence tests : the Fisher test (F-test) and the chi2 test. To test independence between 2 variables, we start from the hypothesis that they are independent. Independence holds until one of the 2 tests (F or chi2) rejects that hypothesis. If the hypothesis is not rejected, the variables are considered independent.

Parameters:

tr (Trajectory) – Samples used to extract stats.
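The decision rule described above can be sketched as follows. This only shows how two test outcomes are combined. It does not compute any F or chi2 statistic, and running the F-test first is an assumption of this sketch. `combined_independence`, `f_test` and `chi2_test` are hypothetical names.

```python
from typing import Callable, List

def combined_independence(f_rejects: Callable[[], bool],
                          chi2_rejects: Callable[[], bool]) -> bool:
    """Return True when neither test rejects the independence hypothesis."""
    # Start from independence; the first rejection ends the procedure.
    if f_rejects():
        return False
    if chi2_rejects():
        return False
    return True

calls: List[str] = []

def f_test() -> bool:
    calls.append("F")
    return True  # stand-in outcome: the F-test rejects independence

def chi2_test() -> bool:
    calls.append("chi2")
    return False  # stand-in outcome: the chi2 test does not reject

result = combined_independence(f_test, chi2_test)
```

Passing the tests as callables means the second one is only evaluated when the first did not already reject independence.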

addVariables(X, Y, U)

Saves variables X and Y and the conditioning set U, and generates stats to be used in statistical tests.

Parameters:
  • X (str) – Name of the variable.

  • Y (str) – Name of the variable to test independence from, not in U.

  • U (List[str]) – List of conditioning variables.

computeChi2()

Computes the chi2-test value for every instance of the variables.

Returns:

chi2-test value.

Return type:

pyAgrum.Potential

computeF()

Computes the F-test value for every instance of the variables.

Returns:

F-test value.

Return type:

pyAgrum.Potential

getMxxGivenU(M, Y)
Parameters:
  • M (pyAgrum.Potential) – A matrix M_{x, x’ | y, U}, for some instantiation U of the conditioning set and y of a specific parent.

  • Y (str) – A parent.

Returns:

The potential M_{x, x’ | U} by summing over all values of y.

Return type:

pyAgrum.Potential
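Summing over all values of y can be pictured with plain nested dicts standing in for potentials. This sketch is not the library's code: `sum_out_y` is a hypothetical helper and the layout `M_xxy[y][x][x2]` is an assumption chosen for readability.

```python
def sum_out_y(M_xxy):
    """Turn counts M[y][x][x2] into M[x][x2] by summing over every value y."""
    M_xx = {}
    for per_y in M_xxy.values():
        for x, row in per_y.items():
            target = M_xx.setdefault(x, {})
            for x2, count in row.items():
                target[x2] = target.get(x2, 0) + count
    return M_xx

M_xxy = {
    "y0": {"a": {"b": 2}, "b": {"a": 1}},
    "y1": {"a": {"b": 3}, "b": {"a": 4}},
}
M_xx = sum_out_y(M_xxy)
```

The a to b transitions observed under y0 and y1 are pooled into a single count, which gives the statistics of X conditioned on U alone.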

nullStateToStateTransitionHypothesisChi2(X, Y, _)

Decides if the null state to state transition hypothesis is rejected using chi2-test.

Parameters:
  • X (str) – A random variable.

  • Y (str) – A parent of X.

  • _ (List[str]) – A subset of the parents of X that does not contain Y.

Returns:

False if X is not independent of Y given the conditioning set U.

Return type:

bool

nullTimeToTransitionHypothesisF(X, Y, _)

Decides if the null time to transition hypothesis is rejected using F-test.

Parameters:
  • X (str) – A random variable.

  • Y (str) – A parent of X.

  • _ (List[str]) – A subset of the parents of X that does not contain Y.

Returns:

False if X is not independent of Y given the conditioning set U.

Return type:

bool

testIndep(X, Y, U)
Parameters:
  • X (str) – Name of the variable.

  • Y (str) – Name of the variable to test independence from, not in U.

  • U (List[str]) – List of conditioning variables.

Returns:

True if X is independent of Y given U, False otherwise.

Return type:

bool

class pyAgrum.ctbn.StatsIndepTest.IndepTest

Bases: object

Base class used to test independence between 2 variables given some other parents.

abstract testIndep(X, Y, U)
Parameters:
  • X (str) – Head of the arc we want to test.

  • Y (str) – Tail of the arc we want to test.

  • U (List[str]) – Known parents.

Return type:

bool

class pyAgrum.ctbn.StatsIndepTest.Oracle(ctbn)

Bases: IndepTest

Oracle’s testing tools.

Parameters:

ctbn (CTBN)

testIndep(X, Y, U)
Parameters:
  • X (str) – Head of the arc we want to test.

  • Y (str) – Tail of the arc we want to test.

  • U (List[str]) – Known parents.

Returns:

False if there is an arc from Y to X knowing U, True otherwise.

Return type:

bool

pyAgrum.ctbn.StatsIndepTest.sqrtPotential(potential)

Applies sqrt function to all values inside the potential.

Parameters:

potential (pyAgrum.Potential) – Potential to apply sqrt to.

Returns:

sqrt of potential.

Return type:

pyAgrum.Potential