Learning a CTBN

One of the main features of this library is the possibility to learn a CTBN.

More precisely, the following can be learned:

The dependency graph of a CTBN
The CIMs of a CTBN
(The variables and their labels from a sample)

Tools to extract data from samples are necessary. This is the role of class pyagrum.ctbn.Trajectory and function pyagrum.ctbn.CTBNFromData().

Before introducing the algorithms, here are some definitions:

\(M_{xx'|u}\) is the number of times a variable X goes from a state x to a state x’, conditioned by an instance of its parents u. It is filled using samples.
\(M_{x|u}\) is the number of times X leaves state x, i.e. the sum of \(M_{xx'|u}\) over all \(x' \neq x\).
\(T_{x|u}\) is the time spent in state x, conditioned by an instance of its parents u.
\(M_{xx'|y,u}\) and \(T_{x|y,u}\) are the same but with another conditioning variable Y in state y.

Those can be stored in pyagrum.Tensor.

Being conditioned by an instance means that the extracted data comes from time intervals where conditioning variables take specific values.
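As an illustration of what conditioning by an instance means, the sketch below accumulates the time spent and the transition counts of one variable from a single trajectory, keyed by the joint state of its conditioning variables. It is a plain-Python illustration, not pyagrum's implementation: the function name `sufficient_stats`, the `horizon` argument, and the reading of each `(time, var, state)` tuple as "var enters state at time" are assumptions made for the example.

```python
from collections import defaultdict


def sufficient_stats(events, x_name, parents, horizon):
    """Accumulate T[(x, u)] and M[(x, x_next, u)] from time-sorted events.

    Assumption: each (time, var, state) tuple means `var` enters `state` at `time`.
    """
    current = {}            # current state of every variable seen so far
    last_time = None        # time of the previous event
    T = defaultdict(float)  # (x, u) -> time spent in x while parents are in u
    M = defaultdict(int)    # (x, x_next, u) -> number of transitions x -> x_next

    # A sentinel event closes the last interval at the time horizon.
    for time, var, state in list(events) + [(horizon, None, None)]:
        known = x_name in current and all(p in current for p in parents)
        if known:
            u = tuple(current[p] for p in parents)
            # The interval [last_time, time) was spent in (current x, u).
            T[(current[x_name], u)] += time - last_time
            if var == x_name and state != current[x_name]:
                M[(current[x_name], state, u)] += 1
        if var is not None:
            current[var] = state
        last_time = time
    return dict(M), dict(T)


# Hypothetical trajectory with two variables, X conditioned by Y.
events = [
    (0.0, "X", "a"),
    (0.0, "Y", "on"),
    (1.0, "X", "b"),
    (1.5, "Y", "off"),
    (3.0, "X", "a"),
    (4.0, "X", "b"),
]
M, T = sufficient_stats(events, "X", ["Y"], horizon=5.0)
```

Each entry of `T` only counts the intervals during which Y holds the given value, which is exactly the restriction described above.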

Learning parameters : learning the CIMs

Goal: finding the \(q_{i,j|u}\) coefficients (i.e. \(q_{x|u}\) and \(q_{x \rightarrow x'|u}\)).

Idea: \(q_{x|u} = \frac{M_{x|u}}{T_{x|u}}\) and \(P_X(x \rightarrow x') = \frac{M_{xx'|u}}{M_{x|u}} = \frac{q_{x \rightarrow x'|u}}{q_{x|u}}\). Then \(q_{x \rightarrow x'|u} = \frac{M_{xx'|u}}{T_{x|u}}\).
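The estimate above can be sketched for one fixed instance u of the parents. This is a plain-Python illustration rather than the library's code: `cim_from_stats` is a hypothetical helper, the counts and durations are made up, and the diagonal is filled with minus the total exit rate, the usual convention for an intensity matrix.

```python
def cim_from_stats(states, M, T):
    """Build an intensity matrix for one parent instance.

    M maps (x, x_next) -> number of transitions, T maps x -> time spent in x.
    """
    Q = {x: {x2: 0.0 for x2 in states} for x in states}
    for x in states:
        if T.get(x, 0.0) <= 0.0:
            continue  # state never observed: leave its row at zero
        for x2 in states:
            if x2 != x:
                Q[x][x2] = M.get((x, x2), 0) / T[x]  # q_{x -> x'} = M_{xx'} / T_x
        # Diagonal is -q_x, so that every row sums to zero.
        Q[x][x] = -sum(Q[x][x2] for x2 in states if x2 != x)
    return Q


states = ["a", "b", "c"]
M = {("a", "b"): 3, ("a", "c"): 1, ("b", "a"): 2, ("c", "a"): 1}
T = {"a": 2.0, "b": 4.0, "c": 0.5}
Q = cim_from_stats(states, M, T)
```

With these numbers, state a is left 4 times in 2.0 time units, so its exit rate is 2.0, split into 1.5 towards b and 0.5 towards c.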

Learning the graph

To learn the graph of a CTBN (i.e. the dependencies between variables), we use the CTPC algorithm from Bregoli et al. [BSS20], together with results from Nodelman et al. [NSK02]. The independence test combines a Fisher test (F-test) and a chi2 test to compare exponential distributions.
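The general shape of such a constraint-based search can be sketched as follows: start from a complete directed graph, then, for each variable, drop a candidate parent as soon as some conditioning set of the current size makes it independent. This is a simplified illustration and not the library's code; `prune_parents` and the toy `oracle` are hypothetical names, and the oracle simply reads a known graph instead of running a statistical test.

```python
from itertools import combinations


def prune_parents(variables, indep):
    """Return the parents kept for each variable.

    indep(x, y, u) must return True when x is independent of y given the list u.
    """
    # Start from the complete directed graph: everyone is a candidate parent.
    parents = {x: [y for y in variables if y != x] for x in variables}
    for x in variables:
        size = 0  # size of the conditioning sets currently tried
        while size < len(parents[x]):
            for y in list(parents[x]):
                others = [p for p in parents[x] if p != y]
                # Remove y if any conditioning set of this size separates it from x.
                if any(indep(x, y, list(s)) for s in combinations(others, size)):
                    parents[x].remove(y)
            size += 1
    return parents


# Toy oracle built from a known graph: A -> B, A -> C, B -> C.
true_parents = {"A": [], "B": ["A"], "C": ["A", "B"]}


def oracle(x, y, u):
    return y not in true_parents[x]


learned = prune_parents(["A", "B", "C"], oracle)
```

Replacing the oracle by a statistical test on trajectory data gives the data-driven version of the same loop.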

class pyagrum.ctbn.Learner(source)

Class used to learn a CTBN (independence between variables and CIMs) using samples.

Parameters:: source (str|Dict[int, List[Tuple[float, str, str]]]) – Path to the CSV file containing the samples (trajectories), or the trajectories themselves as a Python dict.

fitParameters(ctbn)

Learns the parameters of ctbn’s CIMs.

Parameters:: ctbn (CTBN) – CTBN containing the CIMs to learn.

learnCTBN(template=None)

Learns a CTBN using the CTPC (continuous-time PC) algorithm. Reference: A. Bregoli, M. Scutari, F. Stella, Constraint-Based Learning for Continuous-Time Bayesian Networks, arXiv:2007.03248, 2020.

Parameters:: template (CTBN) – CTBN used to find variables. If not given, variables are searched for in the trajectories (if the trajectories are very short, some variables can be missed).
Returns:: The learned ctbn.
Return type:: CTBN
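A minimal usage sketch of the class, based only on the signatures listed above. The wrapper `learn_from_csv` and the file name in the comment are hypothetical, and since it is not stated here whether learnCTBN() already fits the CIMs, the sketch calls fitParameters() explicitly afterwards.

```python
def learn_from_csv(path):
    """Learn the graph, then the CIMs, from a CSV file of trajectories."""
    # Imported lazily so that this sketch can be loaded without pyagrum installed.
    import pyagrum.ctbn as ct

    learner = ct.Learner(path)     # source: path to the CSV file of samples
    ctbn = learner.learnCTBN()     # structure learning with CTPC
    learner.fitParameters(ctbn)    # assumption: fills the CIMs of ctbn in place
    return ctbn


# Example call (hypothetical file name):
# ctbn = learn_from_csv("trajectories.csv")
```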

pyagrum.ctbn.readTrajectoryCSV(filename)

Reads trajectories from a csv file. Storing format : {IdSample, time, var, state}

Parameters:: filename (str) – Path to the file.
Returns:: The trajectories, one for each sample index.
Return type:: Dict[int, List[Tuple[float, str, str]]]
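To make the returned structure concrete, the sketch below parses a small CSV text into a dict of trajectories using only the standard library. It is not the library's reader: `parse_trajectories` is a hypothetical helper, and the comma separator and the header names `IdSample`, `time`, `var`, `state` are assumed from the storing format given above.

```python
import csv
import io
from collections import defaultdict


def parse_trajectories(text):
    """Group (time, var, state) events by sample id."""
    trajectories = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        event = (float(row["time"]), row["var"], row["state"])
        trajectories[int(row["IdSample"])].append(event)
    return dict(trajectories)


# Two made-up samples of a single variable X.
text = "\n".join([
    "IdSample,time,var,state",
    "0,0.0,X,a",
    "0,1.2,X,b",
    "1,0.0,X,b",
])
data = parse_trajectories(text)
```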

pyagrum.ctbn.CTBNFromData(data)

Constructs a CTBN and adds the corresponding variables found in the trajectories.

Warning

If data is too short, some variables or state labels might be missed.

Parameters:: data (Dict[int, List[Tuple[float, str, str]]]) – The trajectories used to look for variables.
Returns:: The resulting CTBN.
Return type:: CTBN
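The warning above can be illustrated without the library: collecting variables and labels from trajectories only finds what actually occurs in them. The helper `variables_and_labels` and the sample data are hypothetical.

```python
def variables_and_labels(data):
    """Collect, for each variable name, the sorted state labels seen in the data."""
    labels = {}
    for events in data.values():
        for _time, var, state in events:
            labels.setdefault(var, set()).add(state)
    return {var: sorted(states) for var, states in labels.items()}


# X really has three states (a, b, c), but c never occurs in this short sample.
short_data = {0: [(0.0, "X", "a"), (0.7, "X", "b"), (1.1, "X", "a")]}
found = variables_and_labels(short_data)
```

Here `found` lists only the labels a and b for X, so a CTBN built from it would lack state c. Passing a template CTBN where the API allows it avoids this.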

pyagrum.ctbn.computeCIMFromStats(X, M, T)

Computes a CIM (Conditional Intensity Matrix) using stats from a trajectory. Variables in the tensor are not copied but directly used in the result to avoid memory issues.

Parameters:

X (str) – Name of the variable to compute CIM for.
M (pyagrum.Tensor) – Tensor containing the number of transitions for each pair of X’s states.
T (pyagrum.Tensor) – Tensor containing the time spent in each state of X before transitioning.

Returns:

The resulting tensor, X’s CIM.

Return type:

pyagrum.Tensor

class pyagrum.ctbn.Trajectory(source, ctbn=None)

Tools to extract useful information from a trajectory. It is used for parameter and graph learning. It can be created from a dict of trajectories or from a file that contains one.

Parameters:

source (str|Dict[int, List[Tuple[float, str, str]]]) – The path to a csv file containing the samples or the dict of trajectories itself.
ctbn (CTBN) – Used to link the variable names in the trajectory to their pyAgrum variables. If not given, a new CTBN is created with the variables and labels found in the trajectory (warning: if the trajectory is short, some variables or labels may be missed).

data

The samples.

Type:: Dict[int, List[Tuple[float, str, str]]]

ctbn

The CTBN used to link the names in the trajectory to pyAgrum variables.

Type:: CTBN

timeHorizon

The time length of the trajectory.

Type:: float

computeAllCIMs(): Computes the CIMs of the variables in self.ctbn. Conditioning is given by the graph of self.ctbn.

computeStats(X, U)

Computes the time spent in each state and the number of transitions of X, and returns them as pyagrum.Tensor objects.

Parameters:

X (str) – Name of the variable.
U (List[str]) – Names of the conditioning variables.

Returns:

The resulting tensors.

Return type:

Tuple[pyagrum.Tensor, pyagrum.Tensor]

computeStatsForTests(X, Y, U)

Computes the time spent in each state and the number of transitions of X when conditioned by Y and U, and returns them as pyagrum.Tensor objects. Used for independence testing.

Parameters:

X (str) – Name of the variable.
Y (str) – Name of a conditioning variable not in U.
U (List[str]) – Names of the conditioning variables.

Returns:

The resulting tensors.

Return type:

Tuple[pyagrum.Tensor, pyagrum.Tensor, pyagrum.Tensor]

setStatValues(X, inst_u, Txu, Mxu)

Fills the given tensors.

Parameters:

X (str) – Name of the variable.
inst_u (Dict[str, str]) – Instance of conditioning variables.
Txu (pyagrum.Tensor) – Tensor to fill. Contains the time spent in each state.
Mxu (pyagrum.Tensor) – Tensor to fill. Contains the number of transitions for each pair of states.

setStatsForTests(X, Y, inst_u, Txu, Txyu, Mxyu)

Fills the given tensors. They are used for independence testing.

Parameters:

X (str) – Name of the variable.
Y (str) – Name of a conditioning variable.
inst_u (Dict[str, str]) – Instance of conditioning variables.
Txu (pyagrum.Tensor) – Tensor to fill. Contains the time spent in each state. Conditioned by variables in inst_u.
Txyu (pyagrum.Tensor) – Tensor to fill. Contains the time spent in each state. Conditioned by Y and variables in inst_u.
Mxyu (pyagrum.Tensor) – Tensor to fill. Contains the number of transitions for each pair of states. Conditioned by Y and variables in inst_u.

class pyagrum.ctbn.Stats(trajectory, X, Y, par)

Stores all tensors used for learning.

Parameters:

trajectory (Trajectory) – Samples used to find stats.
X (str) – Name of the variable to study.
Y (str) – Name of the variable used for conditioning variable X.
par (List[str]) – List of conditioning variables of X.

Mxy

Tensor containing the number of transitions the variable X makes from any of its states, for any instance of its parents and of variable Y.

Type:: pyagrum.Tensor

Mx

Tensor containing the number of transitions the variable X makes from any of its states, for any instance of its parents.

Type:: pyagrum.Tensor

Tx

Tensor containing the time spent by X to transition from a state to another for any instance of its parents.

Type:: pyagrum.Tensor

Txy

Tensor containing the time spent by X to transition from a state to another for any instance of its parents and of Y.

Type:: pyagrum.Tensor

Qx

Conditional Intensity Matrix(CIM) of X.

Type:: pyagrum.Tensor

QxY

Conditional Intensity Matrix(CIM) of X that includes the conditioning variable Y.

Type:: pyagrum.Tensor

class pyagrum.ctbn.StatsIndepTest.FChi2Test(tr)

Bases: IndepTest

This class uses 2 independence tests: the Fisher test (F-test) and the chi2 test. To test independence between 2 variables, we first assume they are independent. Independence holds until one of the 2 tests (F and chi2) contradicts the independence hypothesis. If the hypothesis is not rejected, the variables are considered independent.

Parameters:: tr (Trajectory) – Samples used to extract stats.

addVariables(X, Y, U)

Saves variables X and Y and the conditioning set U, and generates stats to be used in statistical tests.

Parameters:

X (str) – Name of the variable.
Y (str) – Name of the variable to test independence from, not in U.
U (List[str]) – List of conditioning variables.

computeChi2()

Computes the chi2-test value for every instance of the variables.

Returns:: chi2-test value.
Return type:: pyagrum.Tensor
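For orientation, a standard two-sample chi-square statistic for comparing two vectors of counts with different totals is shown below. Whether computeChi2() uses exactly this form is an assumption; the function `two_sample_chi2` is hypothetical and works on plain lists rather than tensors.

```python
import math


def two_sample_chi2(r, s):
    """Chi-square statistic between two count vectors r and s of equal length.

    Uses sum_i (sqrt(S/R) * r_i - sqrt(R/S) * s_i)^2 / (r_i + s_i),
    where R and S are the totals of r and s.
    """
    R, S = sum(r), sum(s)
    if R == 0 or S == 0:
        raise ValueError("both samples need at least one observation")
    k = math.sqrt(S / R)  # rescales r to the size of s
    l = math.sqrt(R / S)  # rescales s to the size of r
    return sum(
        (k * ri - l * si) ** 2 / (ri + si)
        for ri, si in zip(r, s)
        if ri + si > 0  # empty bins carry no information
    )


same_shape = two_sample_chi2([10, 20], [20, 40])  # proportional counts
opposite = two_sample_chi2([10, 0], [0, 10])      # disjoint counts
```

Proportional count vectors give a statistic close to zero, while counts concentrated in different bins give a large value, which is then compared with a chi-square quantile to decide on rejection.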

computeF()

Computes the F-test value for every instance of the variables.

Returns:: F-test value.
Return type:: pyagrum.Tensor

getMxxGivenU(M, Y)

Parameters:

M (pyagrum.Tensor) – A matrix M_{x, x’ | y, U}, for some instantiation U of the conditioning set and y of a specific parent.
Y (str) – A parent.

Returns:

The tensor M_{x, x’ | U}, obtained by summing over all values of y.

Return type:

pyagrum.Tensor
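The marginalisation described here amounts to adding up the counts over the values of Y. A plain-dict sketch of the idea, with a hypothetical helper `sum_over_y` and made-up counts, not the tensor-based implementation:

```python
def sum_over_y(M_xxy):
    """Turn counts keyed by (x, x_next, y) into counts keyed by (x, x_next)."""
    M_xx = {}
    for (x, x_next, _y), count in M_xxy.items():
        M_xx[(x, x_next)] = M_xx.get((x, x_next), 0) + count
    return M_xx


M_xxy = {
    ("a", "b", "on"): 3,
    ("a", "b", "off"): 2,
    ("b", "a", "on"): 1,
}
M_xx = sum_over_y(M_xxy)
```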

nullStateToStateTransitionHypothesisChi2(X, Y, _)

Decides if the null state to state transition hypothesis is rejected using chi2-test.

Parameters:

X (str) – A random variable.
Y (str) – A parent of X.
_ (List[str]) – A subset of the parents of X that does not contain Y.

Returns:

False if X is not independent of Y given the conditioning set U.

Return type:

bool

nullTimeToTransitionHypothesisF(X, Y, _)

Decides if the null time to transition hypothesis is rejected using F-test.

Parameters:

X (str) – A random variable.
Y (str) – A parent of X.
_ (List[str]) – A subset of the parents of X that does not contain Y.

Returns:

False if X is not independent of Y given the conditioning set U.

Return type:

bool

testIndep(X, Y, U)

Parameters:

X (str) – Name of the variable.
Y (str) – Name of the variable to test independence from, not in U.
U (List[str]) – List of conditioning variables.

Returns:

True if X is independent of Y given U, False otherwise.

Return type:

bool

class pyagrum.ctbn.StatsIndepTest.IndepTest

Bases: object

Base class used to test independence between 2 variables given some other parents.

abstract testIndep(X, Y, U)

Parameters:

X (str) – Head of the arc we want to test.
Y (str) – Tail of the arc we want to test.
U (List[str]) – Known parents.

Return type:

bool

class pyagrum.ctbn.StatsIndepTest.Oracle(ctbn)

Bases: IndepTest

Oracle independence test: answers are read from the arcs of a given CTBN.

Parameters:: ctbn (CTBN)

testIndep(X, Y, U)

Parameters:

X (str) – Head of the arc we want to test.
Y (str) – Tail of the arc we want to test.
U (List[str]) – Known parents.

Returns:

False if there is an arc from Y to X knowing U, True otherwise.

Return type:

bool

pyagrum.ctbn.StatsIndepTest.sqrtTensor(tensor)

Applies sqrt function to all values inside the tensor.

Parameters:: tensor (pyagrum.Tensor) – Tensor to apply sqrt to.
Returns:: sqrt of tensor.
Return type:: pyagrum.Tensor

Bibliography for CTBN

[BSS20]

Alessandro Bregoli, Marco Scutari, and Fabio Stella. Constraint-based learning for continuous-time Bayesian networks. In Manfred Jaeger and Thomas Dyhre Nielsen, editors, Proceedings of the 10th International Conference on Probabilistic Graphical Models, volume 138 of Proceedings of Machine Learning Research, 41–52. PMLR, 23–25 Sep 2020. URL: https://proceedings.mlr.press/v138/bregoli20a.html.

[NSK02]

Uri Nodelman, Christian R. Shelton, and Daphne Koller. Learning continuous time Bayesian networks. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, UAI'03, 451–458. San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc.