Learning a CTBN
One of the main features of this library is the ability to learn a CTBN from samples.
- More precisely, what can be learned is:
The dependency graph of a CTBN
The CIMs of a CTBN
(The variables and their labels from a sample)
Tools to extract data from samples are necessary. This is the role of the class pyagrum.ctbn.Trajectory and the function pyagrum.ctbn.CTBNFromData().
- Before introducing the algorithms, here are some definitions:
\(M_{xx'|u}\) is the number of times a variable X goes from state x to state x', conditioned by an instance u of its parents. It is filled using samples.
\(M_{x|u}\) is the number of times X leaves state x, conditioned by an instance u of its parents, i.e. \(M_{x|u} = \sum_{x' \neq x} M_{xx'|u}\).
\(T_{x|u}\) is the time spent in state x, conditioned by an instance of its parents u.
\(M_{xx'|y,u}\) and \(T_{x|y,u}\) are the same but with another conditioning variable Y in state y.
These can be stored in pyagrum.Tensor objects.
Being conditioned by an instance means that the extracted data comes from time intervals where conditioning variables take specific values.
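As an illustration of these definitions, here is a minimal pure-Python sketch that counts \(M_{xx'|u}\) and \(T_{x|u}\) for one variable and one parent. It is not the library's implementation: the function name `count_stats` is hypothetical, and the sketch assumes that each event `(time, var, state)` means that `var` enters `state` at `time`, which may differ from the convention pyAgrum actually uses.

```python
from collections import defaultdict

def count_stats(events, target, parent, horizon):
    """Count transitions and sojourn times of `target` given the state of `parent`.

    Assumption (not taken from pyAgrum): an event (time, var, state) means that
    `var` enters `state` at `time`; observation stops at `horizon`.
    """
    M = defaultdict(int)    # (x, x_next, u) -> number of transitions x -> x_next
    T = defaultdict(float)  # (x, u) -> time spent in x while the parent is in u
    current = {}            # last known state of each variable
    last_time = None
    for time, var, state in sorted(events):
        if target in current and parent in current:
            # Both states were known since the previous event: count the interval.
            T[(current[target], current[parent])] += time - last_time
            if var == target and state != current[target]:
                M[(current[target], state, current[parent])] += 1
        current[var] = state
        last_time = time
    if target in current and parent in current:
        # Time between the last event and the end of the observation window.
        T[(current[target], current[parent])] += horizon - last_time
    return dict(M), dict(T)

events = [
    (0.0, "X", "a"), (0.0, "U", "off"),
    (1.0, "X", "b"), (1.5, "U", "on"),
    (3.0, "X", "a"), (4.0, "X", "b"),
]
M, T = count_stats(events, target="X", parent="U", horizon=5.0)
```

The time spent by X is split according to the parent's state at that moment, which is what "conditioned by an instance" means above.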
Learning parameters : learning the CIMs
Goal : finding the coefficients \(q_{i,j|u}\) (i.e. \(q_{x|u}\) and \(q_{x \rightarrow x'|u}\)).
Idea : \(q_{x|u} = \frac{M_{x|u}}{T_{x|u}}\) and \(P_X(x \rightarrow x') = \frac{M_{xx'|u}}{M_{x|u}} = \frac{q_{x \rightarrow x'|u}}{q_{x|u}}\). Then \(q_{x \rightarrow x'|u} = \frac{M_{xx'|u}}{T_{x|u}}\).
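The estimator can be sketched in a few lines of plain Python for a single parent instance u. The helper `cim_from_stats` and the toy counts are hypothetical and only illustrate the formulas; the sketch assumes every state has a strictly positive sojourn time and takes \(M_{x|u}\) as the total number of transitions out of x. The diagonal stores \(-q_{x|u}\), the usual convention for intensity matrices, so each row sums to zero.

```python
def cim_from_stats(M, T, states):
    """Estimate an intensity matrix from counts M[(x, x_next)] and times T[x].

    Hypothetical helper for one fixed parent instance u; assumes T[x] > 0.
    """
    Q = {x: {} for x in states}
    for x in states:
        # M_{x|u}: total number of transitions leaving x.
        leaving = sum(M.get((x, y), 0) for y in states if y != x)
        for y in states:
            if y == x:
                Q[x][y] = -leaving / T[x]          # diagonal: -q_{x|u}
            else:
                Q[x][y] = M.get((x, y), 0) / T[x]  # q_{x -> x'|u}
    return Q

states = ["a", "b", "c"]
M = {("a", "b"): 3, ("a", "c"): 1, ("b", "a"): 2, ("c", "a"): 1, ("c", "b"): 1}
T = {"a": 2.0, "b": 4.0, "c": 1.0}
Q = cim_from_stats(M, T, states)
```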
Learning the graph
To learn the graph of a CTBN (i.e. the dependencies between variables), we use the CTPC algorithm from Bregoli et al. [BSS20] (also drawing on Nodelman et al. [NSK02]). The independence test used is based on Fisher (F) and chi2 tests to compare exponential distributions.
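The overall shape of the CTPC search can be sketched as follows. This is a simplified reading of the algorithm, not the library's code: it starts from a complete directed graph and, for each variable, removes a parent as soon as some conditioning set of the current size makes the two independent. The names `ctpc_parents` and `oracle` are hypothetical, and the oracle used here (independent exactly when Y is not a true parent of X) is a toy stand-in for the statistical tests.

```python
from itertools import combinations

def ctpc_parents(variables, is_independent):
    """Return the learned parent list of each variable.

    `is_independent(x, y, s)` is any callable deciding whether x is
    independent of y given the list of variables s.
    """
    # Start from the complete directed graph: everyone is a parent of everyone.
    parents = {x: [y for y in variables if y != x] for x in variables}
    for x in variables:
        size = 0  # size of the conditioning sets currently tried
        while size < len(parents[x]):
            for y in list(parents[x]):
                others = [p for p in parents[x] if p != y]
                for subset in combinations(others, size):
                    if is_independent(x, y, list(subset)):
                        parents[x].remove(y)  # drop the arc y -> x
                        break
            size += 1
    return parents

# Toy ground truth used by the oracle (assumption for this example only).
true_parents = {"A": [], "B": ["A"], "C": ["A", "B"]}

def oracle(x, y, s):
    return y not in true_parents[x]

learned = {x: set(p) for x, p in ctpc_parents(["A", "B", "C"], oracle).items()}
```

With a perfect oracle the true parent sets are recovered; with statistical tests the result depends on the amount of data and on the significance levels.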
- class pyagrum.ctbn.Learner(source)
Class used to learn a CTBN (independence between variables and CIMs) using samples.
- Parameters:
source (str|Dict[int, List[Tuple[float, str, str]]]) – Path to the csv file containing the samples (trajectories), or the trajectories themselves as a Python dict.
- fitParameters(ctbn)
Learns the parameters of ctbn’s CIMs.
- Parameters:
ctbn (CTBN) – CTBN containing the CIMs to learn.
- learnCTBN(template=None)
Learns a CTBN using the CTPC (continuous-time PC) algorithm. Reference : A. Bregoli, M. Scutari, F. Stella, Constraint-Based Learning for Continuous-Time Bayesian Networks, arXiv:2007.03248, 2020.
- pyagrum.ctbn.readTrajectoryCSV(filename)
Reads trajectories from a csv file. Storing format : {IdSample, time, var, state}
- Parameters:
filename (str) – Path to the file.
- Returns:
The trajectories, one per sample index.
- Return type:
Dict[int, List[Tuple[float, str, str]]]
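To make the returned structure concrete, here is a small stdlib-only sketch that parses rows of the form {IdSample, time, var, state} into the same dict shape. It is not the library's reader: the function name `read_trajectories` is hypothetical, and the sketch assumes a comma-separated file whose header row uses exactly those four column names.

```python
import csv
import io
from collections import defaultdict

def read_trajectories(handle):
    """Group rows by sample id into lists of (time, var, state) tuples.

    Assumes a header row named IdSample,time,var,state (an assumption,
    the files written by pyAgrum may be laid out differently).
    """
    trajectories = defaultdict(list)
    for row in csv.DictReader(handle):
        event = (float(row["time"]), row["var"], row["state"])
        trajectories[int(row["IdSample"])].append(event)
    return dict(trajectories)

# In-memory file standing in for a csv file on disk.
sample = io.StringIO(
    "IdSample,time,var,state\n"
    "0,0.0,X,a\n"
    "0,1.2,X,b\n"
    "1,0.0,X,b\n"
)
trajectories = read_trajectories(sample)
```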
- pyagrum.ctbn.CTBNFromData(data)
Constructs a CTBN and adds the corresponding variables found in the trajectories.
Warning
If data is too short, some variables or state labels might be missed.
- Parameters:
data (Dict[int, List[Tuple[float, str, str]]]) – The trajectories used to look for variables.
- Returns:
The resulting CTBN.
- Return type:
pyagrum.ctbn.CTBN
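The warning can be illustrated with a short sketch of how variables and labels might be discovered from trajectories. The helper `discover_variables` is hypothetical and only collects names, it does not build a CTBN; it shows why a label that never occurs in the data cannot be found.

```python
def discover_variables(data):
    """Map each variable name to the sorted list of state labels seen in `data`."""
    labels = {}
    for trajectory in data.values():
        for _time, var, state in trajectory:
            labels.setdefault(var, set()).add(state)
    return {var: sorted(states) for var, states in labels.items()}

# Suppose X really has three states a, b and c, but c is never visited here.
short_data = {0: [(0.0, "X", "a"), (0.0, "Y", "up"), (0.7, "X", "b")]}
found = discover_variables(short_data)
```

Here `found` lists only the labels a and b for X, so a model built from it would miss the state c.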
- pyagrum.ctbn.computeCIMFromStats(X, M, T)
Computes a CIM (Conditional Intensity Matrix) using stats from a trajectory. Variables in the tensor are not copied but directly used in the result to avoid memory issues.
- Parameters:
X (str) – Name of the variable to compute CIM for.
M (pyagrum.Tensor) – Tensor containing the number of transitions for each pair of X’s states.
T (pyagrum.Tensor) – Tensor containing the time spent in each state of X.
- Returns:
The resulting tensor, X’s CIM.
- Return type:
pyagrum.Tensor
- class pyagrum.ctbn.Trajectory(source, ctbn=None)
Tools to extract useful information from a trajectory. It is used for parameter and graph learning. It can be created from a dict of trajectories or from a file that contains one.
- Parameters:
source (str|Dict[int, List[Tuple[float, str, str]]]) – The path to a csv file containing the samples or the dict of trajectories itself.
ctbn (CTBN) – Used to link the variable names in the trajectory to their pyAgrum variables. If not given, a new CTBN is created with the variables and labels found in the trajectory. (Warning : if the trajectory is short, some variables or labels may not be found.)
- data
The samples.
- Type:
Dict[int, List[Tuple[float, str, str]]]
- timeHorizon
The time length of the trajectory.
- Type:
float
- computeAllCIMs()
Computes the CIMs of the variables in self.ctbn. Conditioning is given by the graph of self.ctbn.
- computeStats(X, U)
Computes the time spent and the number of transitions of X and returns them as pyagrum.Tensor.
- Parameters:
X (str) – Name of the variable.
U (List[str]) – Names of the conditioning variables.
- Returns:
The resulting tensors.
- Return type:
Tuple[pyagrum.Tensor, pyagrum.Tensor]
- computeStatsForTests(X, Y, U)
Computes the time spent and the number of transitions of X when conditioned by Y and U and returns them as pyagrum.Tensor. Used for independence testing.
- Parameters:
X (str) – Name of the variable.
Y (str) – Name of a conditioning variable not in U.
U (List[str]) – Names of the conditioning variables.
- Returns:
The resulting tensors.
- Return type:
Tuple[pyagrum.Tensor, pyagrum.Tensor, pyagrum.Tensor]
- setStatValues(X, inst_u, Txu, Mxu)
Fills the given tensors.
- Parameters:
X (str) – Name of the variable.
inst_u (Dict[str, str]) – Instance of conditioning variables.
Txu (pyagrum.Tensor) – Tensor to fill. Contains the time spent in each state.
Mxu (pyagrum.Tensor) – Tensor to fill. Contains the number of transitions for each pair of states.
- setStatsForTests(X, Y, inst_u, Txu, Txyu, Mxyu)
Fills the given tensors, which are used for independence testing.
- Parameters:
X (str) – Name of the variable.
Y (str) – Name of a conditioning variable.
inst_u (Dict[str, str]) – Instance of conditioning variables.
Txu (pyagrum.Tensor) – Tensor to fill. Contains the time spent in each state, conditioned by the variables in inst_u.
Txyu (pyagrum.Tensor) – Tensor to fill. Contains the time spent in each state, conditioned by Y and the variables in inst_u.
Mxyu (pyagrum.Tensor) – Tensor to fill. Contains the number of transitions for each pair of states, conditioned by Y and the variables in inst_u.
- class pyagrum.ctbn.Stats(trajectory, X, Y, par)
Stores all tensors used for learning.
- Parameters:
trajectory (Trajectory) – Samples used to find stats.
X (str) – Name of the variable to study.
Y (str) – Name of the variable used for conditioning variable X.
par (List[str]) – List of conditioning variables of X.
- Mxy
Tensor containing the number of transitions of the variable X from each of its states, for every instance of its parents and of the variable Y.
- Type:
pyagrum.Tensor
- Mx
Tensor containing the number of transitions of the variable X from each of its states, for every instance of its parents.
- Type:
pyagrum.Tensor
- Tx
Tensor containing the time spent by X in each state before transitioning to another, for every instance of its parents.
- Type:
pyagrum.Tensor
- Txy
Tensor containing the time spent by X in each state before transitioning to another, for every instance of its parents and of Y.
- Type:
pyagrum.Tensor
- Qx
Conditional Intensity Matrix (CIM) of X.
- Type:
- QxY
Conditional Intensity Matrix (CIM) of X that includes the conditioning variable Y.
- Type:
- class pyagrum.ctbn.StatsIndepTest.FChi2Test(tr)
Bases:
IndepTest
This class uses 2 independence tests : the Fisher test (F-test) and the chi2 test. To test independence between 2 variables, we start from the hypothesis that they are independent. This hypothesis holds until one of the 2 tests (F or chi2) rejects it. If neither test rejects the hypothesis, the variables are considered independent.
- Parameters:
tr (Trajectory) – Samples used to extract stats.
- addVariables(X, Y, U)
Saves variables X and Y and the conditioning set U, and generates stats to be used in statistical tests.
- Parameters:
X (str) – Name of the variable.
Y (str) – Name of the variable to test independence from, not in U.
U (List[str]) – List of conditioning variables.
- computeChi2()
Computes the chi2-test value for every instance of the variables.
- Returns:
chi2-test value.
- Return type:
- computeF()
Computes the F-test value for every instance of the variables.
- Returns:
F-test value.
- Return type:
- getMxxGivenU(M, Y)
- Parameters:
M (pyagrum.Tensor) – A matrix M_{x, x’ | y, U}, for some instantiation U of the conditioning set and y of a specific parent.
Y (str) – A parent.
- Returns:
The tensor M_{x, x’ | U} by summing over all values of y.
- Return type:
pyagrum.Tensor
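The marginalisation performed here can be sketched with a plain dict instead of a tensor. The function name `sum_over_y` and the toy counts are hypothetical; the sketch only shows that \(M_{x,x'|U}\) is obtained by adding the counts \(M_{x,x'|y,U}\) over all values y.

```python
from collections import defaultdict

def sum_over_y(M_xxy):
    """Collapse counts keyed by (x, x_next, y) into counts keyed by (x, x_next)."""
    M_xx = defaultdict(int)
    for (x, x_next, _y), count in M_xxy.items():
        M_xx[(x, x_next)] += count
    return dict(M_xx)

# Toy counts for one fixed instantiation of the conditioning set U.
M_xxy = {
    ("a", "b", "on"): 2,
    ("a", "b", "off"): 3,
    ("b", "a", "on"): 1,
}
M_xx = sum_over_y(M_xxy)
```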
- nullStateToStateTransitionHypothesisChi2(X, Y, _)
Decides if the null state to state transition hypothesis is rejected using chi2-test.
- Parameters:
X (str) – A random variable.
Y (str) – A parent of X.
_ (List[str]) – A subset of the parents of X that does not contain Y.
- Returns:
False if X is not independent of Y given the conditioning set U.
- Return type:
bool
- nullTimeToTransitionHypothesisF(X, Y, _)
Decides if the null time to transition hypothesis is rejected using F-test.
- Parameters:
X (str) – A random variable.
Y (str) – A parent of X.
_ (List[str]) – A subset of the parents of X that does not contain Y.
- Returns:
False if X is not independent of Y given the conditioning set U.
- Return type:
bool
- testIndep(X, Y, U)
- Parameters:
X (str) – Name of the variable.
Y (str) – Name of the variable to test independence from, not in U.
U (List[str]) – List of conditioning variables.
- Returns:
True if X is independent of Y given U, False otherwise.
- Return type:
bool
- class pyagrum.ctbn.StatsIndepTest.IndepTest
Bases:
object
Base class used to test independence between 2 variables given some other parents.
- abstract testIndep(X, Y, U)
- Parameters:
X (str) – Head of the arc we want to test.
Y (str) – Tail of the arc we want to test.
U (List[str]) – Known parents.
- Return type:
bool
- class pyagrum.ctbn.StatsIndepTest.Oracle(ctbn)
Bases:
IndepTest
Oracle’s testing tools.
- Parameters:
ctbn (CTBN)
- testIndep(X, Y, U)
- Parameters:
X (str) – Head of the arc we want to test.
Y (str) – Tail of the arc we want to test.
U (List[str]) – Known parents.
- Returns:
False if there is an arc from Y to X knowing U, True otherwise.
- Return type:
bool
- pyagrum.ctbn.StatsIndepTest.sqrtTensor(tensor)
Applies sqrt function to all values inside the tensor.
- Parameters:
tensor (pyagrum.Tensor) – Tensor to apply sqrt to.
- Returns:
sqrt of tensor.
- Return type:
pyagrum.Tensor
Bibliography for CTBN
[BSS20] Alessandro Bregoli, Marco Scutari, and Fabio Stella. Constraint-based learning for continuous-time Bayesian networks. In Manfred Jaeger and Thomas Dyhre Nielsen, editors, Proceedings of the 10th International Conference on Probabilistic Graphical Models, volume 138 of Proceedings of Machine Learning Research, 41–52. PMLR, 23–25 Sep 2020. URL: https://proceedings.mlr.press/v138/bregoli20a.html.
[NSK02] Uri Nodelman, Christian R. Shelton, and Daphne Koller. Learning continuous time Bayesian networks. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, UAI'03, 451–458. San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc.