Learning a CTBN
One of the main features of this library is the ability to learn a CTBN from samples. More precisely, the following can be learned:
- The dependency graph of a CTBN
- The CIMs of a CTBN
- (The variables and their labels, from a sample)
Tools to extract data from samples are necessary. This is the role of the class pyAgrum.ctbn.Trajectory and of the function pyAgrum.ctbn.CTBNFromData().
Before introducing the algorithms, here are some definitions:
- \(M_{xx'|u}\) is the number of times a variable X goes from state x to state x', conditioned by an instance u of its parents. It is filled using samples.
- \(M_{x|u}\) is the number of times X leaves state x under the same conditioning, i.e. \(M_{x|u} = \sum_{x' \neq x} M_{xx'|u}\).
- \(T_{x|u}\) is the time spent in state x, conditioned by an instance u of its parents.
- \(M_{xx'|y,u}\) and \(T_{x|y,u}\) are the same quantities with an additional conditioning variable Y in state y.
These can be stored in pyAgrum.Potential objects.
Being conditioned by an instance means that the extracted data comes from time intervals where the conditioning variables take specific values.
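As an illustration of these definitions, here is a minimal pure-Python sketch (not pyAgrum's implementation) that accumulates the transition counts and the time spent for one variable, restricted to the intervals where its parents take given values. The function name `sufficient_stats`, the `horizon` argument and the reading of each event `(time, var, state)` as "`var` enters `state` at `time`" are assumptions made for this example; the convention actually used by the library may differ.

```python
from collections import defaultdict


def sufficient_stats(trajectory, X, inst_u, horizon):
    """Return (M, T) for variable X on intervals where the parents match inst_u.

    ASSUMPTION: an event (t, var, state) means that `var` enters `state` at time t.
    M[(x, x_next)] counts transitions x -> x_next, T[x] is the time spent in x.
    """
    M = defaultdict(int)
    T = defaultdict(float)
    current = {}      # current state of every variable seen so far
    last_time = 0.0

    def parents_match():
        return all(current.get(p) == s for p, s in inst_u.items())

    for t, var, state in sorted(trajectory, key=lambda e: e[0]):
        # Close the interval [last_time, t) using the state held before this event.
        if X in current and parents_match():
            T[current[X]] += t - last_time
            if var == X and state != current[X]:
                M[(current[X], state)] += 1
        current[var] = state
        last_time = t

    # Account for the time between the last event and the end of the trajectory.
    if X in current and parents_match():
        T[current[X]] += horizon - last_time
    return dict(M), dict(T)


trajectory = [
    (0.0, "A", "a0"), (0.0, "B", "b0"),
    (1.0, "A", "a1"),
    (1.5, "B", "b1"),
    (2.5, "A", "a0"),
    (3.0, "B", "b0"),
]
M, T = sufficient_stats(trajectory, "A", {"B": "b0"}, horizon=4.0)
```

Only the intervals where B is in state b0 contribute, which is what "conditioned by an instance" means here: the transition of A at time 2.5 is ignored because B is in state b1 at that moment.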
Learning parameters : learning the CIMs
Goal: find the coefficients \(q_{i,j|u}\), i.e. \(q_{x|u}\) and \(q_{x \rightarrow x'|u}\).
Idea: \(q_{x|u} = \frac{M_{x|u}}{T_{x|u}}\) and \(P_X(x \rightarrow x') = \frac{M_{xx'|u}}{M_{x|u}} = \frac{q_{x \rightarrow x'|u}}{q_{x|u}}\), hence \(q_{x \rightarrow x'|u} = \frac{M_{xx'|u}}{T_{x|u}}\).
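The formulas above can be sketched in a few lines of plain Python for a single parent instance u. This is only an illustration, not the code of pyAgrum.ctbn.computeCIMFromStats: the name `cim_from_stats`, the dict-based storage and the choice to leave the row of a never-visited state at zero are assumptions of the example. The diagonal is set to \(-q_{x|u}\) so that each row sums to zero.

```python
def cim_from_stats(states, M, T):
    """Build an intensity matrix Q (dict of dicts) from counts M and durations T.

    Off-diagonal: Q[x][y] = M[(x, y)] / T[x]; diagonal: Q[x][x] = -sum of the row.
    """
    Q = {x: {y: 0.0 for y in states} for x in states}
    for x in states:
        if T.get(x, 0.0) <= 0.0:
            continue  # state never observed: row left at zero (choice of this sketch)
        for y in states:
            if y != x:
                Q[x][y] = M.get((x, y), 0) / T[x]
        Q[x][x] = -sum(Q[x][y] for y in states if y != x)
    return Q


states = ["a0", "a1"]
M = {("a0", "a1"): 3, ("a1", "a0"): 2}   # transition counts
T = {"a0": 6.0, "a1": 4.0}               # time spent in each state
Q = cim_from_stats(states, M, T)
```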
Learning the graph
To learn the graph of a CTBN (i.e. the dependencies between variables), we use the CTPC algorithm from A. Bregoli, M. Scutari, F. Stella, Constraint-Based Learning for Continuous-Time Bayesian Networks, arXiv:2007.03248, 2020. The independence test relies on Fisher (F) and chi2 tests to compare exponential distributions.
- class pyAgrum.ctbn.Learner(source)
Class used to learn a CTBN (independence between variables and CIMs) using samples.
- Parameters:
source (str|Dict[int, List[Tuple[float, str, str]]]) – Path to the csv file containing the samples (trajectories), or the trajectories themselves as a Python dict.
- fitParameters(ctbn)
Learns the parameters of ctbn's CIMs.
- Parameters:
ctbn (CTBN) – CTBN containing the CIMs to learn.
- learnCTBN(template=None)
Learns a CTBN using the CTPC (continuous-time PC) algorithm. Reference: A. Bregoli, M. Scutari, F. Stella, Constraint-Based Learning for Continuous-Time Bayesian Networks, arXiv:2007.03248, 2020.
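A possible way to use this class is sketched below. Only the constructor and the two methods documented above are called. The helper names `learn_ctbn_from_csv` and `fit_parameters_only` are hypothetical, and the sketch assumes that pyAgrum is installed and that learnCTBN() returns the learned CTBN, which is not stated on this page.

```python
def learn_ctbn_from_csv(csv_path):
    """Learn the graph of a CTBN from the trajectories stored in csv_path."""
    import pyAgrum.ctbn as ct  # imported lazily: requires pyAgrum

    learner = ct.Learner(csv_path)
    # ASSUMPTION: learnCTBN() returns the learned CTBN.
    return learner.learnCTBN()


def fit_parameters_only(csv_path, ctbn):
    """Learn only the CIMs of a CTBN whose graph is already known."""
    import pyAgrum.ctbn as ct  # imported lazily: requires pyAgrum

    learner = ct.Learner(csv_path)
    learner.fitParameters(ctbn)  # fills the CIMs of ctbn from the samples
    return ctbn
```

The second helper fits the case where the dependency graph comes from expert knowledge and only the intensities have to be estimated.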
- pyAgrum.ctbn.readTrajectoryCSV(filename)
Reads trajectories from a csv file. Storage format: {IdSample, time, var, state}
- Parameters:
filename (str) – Path to the file.
- Returns:
The trajectories, a trajectory for every index.
- Return type:
Dict[int, List[Tuple[float, str, str]]]
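To make the storage format and the returned structure concrete, here is a small standard-library sketch of such a reader. It is not the library's code: the name `read_trajectories` is hypothetical, and the sketch assumes a header row with exactly the column names IdSample, time, var and state.

```python
import csv
import io


def read_trajectories(handle):
    """Group the rows of a csv stream by sample id.

    ASSUMPTION: the first row is the header IdSample,time,var,state.
    Returns Dict[int, List[Tuple[float, str, str]]].
    """
    trajectories = {}
    for row in csv.DictReader(handle):
        event = (float(row["time"]), row["var"], row["state"])
        trajectories.setdefault(int(row["IdSample"]), []).append(event)
    return trajectories


example = io.StringIO(
    "IdSample,time,var,state\n"
    "0,0.0,A,a0\n"
    "0,1.2,A,a1\n"
    "1,0.0,A,a1\n"
)
data = read_trajectories(example)
```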
- pyAgrum.ctbn.CTBNFromData(data)
Constructs a CTBN and adds the variables found in the trajectories.
Warning
If data is too short, some variables or state labels might be missed.
- Parameters:
data (Dict[int, List[Tuple[float, str, str]]]) – The trajectories used to look for variables.
- Returns:
The resulting CTBN.
- Return type:
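The warning above can be illustrated with a short pure-Python sketch that collects variable names and state labels from trajectories. The helper `variables_and_labels` is hypothetical and only mimics the discovery step; it does not build a CTBN.

```python
def variables_and_labels(data):
    """Collect, for each variable name, the sorted list of labels seen in the data."""
    found = {}
    for events in data.values():
        for _time, var, state in events:
            found.setdefault(var, set()).add(state)
    return {var: sorted(labels) for var, labels in found.items()}


# A very short trajectory in which B never leaves its initial state.
data = {0: [(0.0, "A", "a0"), (0.0, "B", "b0"), (0.7, "A", "a1")]}
domains = variables_and_labels(data)
```

Here B is found with a single label: any other state of B is missed because it never appears in the data.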
- pyAgrum.ctbn.computeCIMFromStats(X, M, T)
Computes a CIM (Conditional Intensity Matrix) using stats from a trajectory. Variables in the potential are not copied but directly used in the result to avoid memory issues.
- Parameters:
X (str) – Name of the variable to compute CIM for.
M (pyAgrum.Potential) – Potential containing the number of transitions for each pair of X's states.
T (pyAgrum.Potential) – Potential containing the time spent to transition from every state of X.
- Returns:
The resulting potential, X's CIM.
- Return type:
- class pyAgrum.ctbn.Trajectory(source, ctbn=None)
Tools to extract useful information from trajectories. It is used for parameter and graph learning. It can be created from a dict of trajectories or from a file that contains one.
- Parameters:
source (str|Dict[int, List[Tuple[float, str, str]]]) – The path to a csv file containing the samples or the dict of trajectories itself.
ctbn (CTBN) – Used to link the variable names in the trajectory to their pyAgrum variables. If not given, a new CTBN is created with the variables and labels found in the trajectory. (Warning: if the trajectory is short, some variables or labels may not be found.)
- data
The samples.
- Type:
Dict[int, List[Tuple[float, str, str]]]
- timeHorizon
The time length of the trajectory.
- Type:
float
- computeAllCIMs()
Computes the CIMs of the variables in self.ctbn. Conditioning is given by the graph of self.ctbn.
- computeStats(X, U)
Computes the time spent and the number of transitions of X and returns them as pyAgrum.Potential.
- Parameters:
X (str) – Name of the variable.
U (List[str]) – Names of the conditioning variables.
- Returns:
The resulting potentials.
- Return type:
Tuple[pyAgrum.Potential, pyAgrum.Potential]
- computeStatsForTests(X, Y, U)
Computes the time spent and the number of transitions of X when conditioned by Y and U, and returns them as pyAgrum.Potential. Used for independence testing.
- Parameters:
X (str) – Name of the variable.
Y (str) – Name of a conditioning variable not in U.
U (List[str]) – Names of the conditioning variables.
- Returns:
The resulting potentials.
- Return type:
Tuple[pyAgrum.Potential, pyAgrum.Potential, pyAgrum.Potential]
- setStatValues(X, inst_u, Txu, Mxu)
Fills the potentials given.
- Parameters:
X (str) – Name of the variable.
inst_u (Dict[str, str]) – Instance of conditioning variables.
Txu (pyAgrum.Potential) – Potential to fill. Contains the time spent in each state.
Mxu (pyAgrum.Potential) – Potential to fill. Contains the number of transitions from any pair of states.
- setStatsForTests(X, Y, inst_u, Txu, Txyu, Mxyu)
Fills the potentials given. They are used for independence testing.
- Parameters:
X (str) – Name of the variable.
Y (str) – Name of a conditioning variable.
inst_u (Dict[str, str]) – Instance of conditioning variables.
Txu (pyAgrum.Potential) – Potential to fill. Contains the time spent in each state. Conditioned by variables in inst_u.
Txyu (pyAgrum.Potential) – Potential to fill. Contains the time spent in each state. Conditioned by Y and variables in inst_u.
Mxyu (pyAgrum.Potential) – Potential to fill. Contains the number of transitions from any pair of states. Conditioned by Y and variables in inst_u.
- class pyAgrum.ctbn.Stats(trajectory, X, Y, par)
Stores all potentials used for learning.
- Parameters:
trajectory (Trajectory) – Samples used to find stats.
X (str) – Name of the variable to study.
Y (str) – Name of the variable used for conditioning variable X.
par (List[str]) – List of conditioning variables of X.
- Mxy
Potential containing the number of transitions the variable X makes from any of its states, for any instance of its parents and of variable Y.
- Type:
- Mx
Potential containing the number of transitions the variable X makes from any of its states, for any instance of its parents.
- Type:
- Tx
Potential containing the time spent by X to transition from one state to another, for any instance of its parents.
- Type:
- Txy
Potential containing the time spent by X to transition from one state to another, for any instance of its parents and of Y.
- Type:
- Qx
Conditional Intensity Matrix (CIM) of X.
- Type:
- QxY
Conditional Intensity Matrix (CIM) of X that includes the conditioning variable Y.
- Type:
- class pyAgrum.ctbn.StatsIndepTest.FChi2Test(tr)
Bases: IndepTest
This class uses two independence tests: the Fisher test (F-test) and the chi2 test. To test independence between two variables, they are first assumed to be independent. Independence holds until one of the two tests (F or chi2) contradicts the independence hypothesis. If the hypothesis is not rejected, the variables are considered independent.
- Parameters:
tr (Trajectory) – Samples used to extract stats.
- addVariables(X, Y, U)
Saves variables X and Y and the conditioning set U, and generates stats to be used in statistical tests.
- Parameters:
X (str) – Name of the variable.
Y (str) – Name of the variable to test independence from, not in U.
U (List[str]) – List of conditioning variables.
- computeChi2()
Computes the chi2-test value for every instance of the variables.
- Returns:
chi2-test value.
- Return type:
- computeF()
Computes the F-test value for every instance of the variables.
- Returns:
F-test value.
- Return type:
- getMxxGivenU(M, Y)
- Parameters:
M (pyAgrum.Potential) – A matrix M_{x, x’ | y, U}, for some instantiation U of the conditioning set and y of a specific parent.
Y (str) – A parent.
- Returns:
The potential M_{x, x’ | U} by summing over all values of y.
- Return type:
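The marginalization performed here, \(M_{xx'|u} = \sum_y M_{xx'|y,u}\), can be sketched with a plain dict standing in for the potential. This is an illustration only: the name `sum_out_y` and the `(x, x_next, y)` key layout are assumptions, and the library operates on pyAgrum.Potential objects instead.

```python
def sum_out_y(M_xxy):
    """Sum the counts M[(x, x_next, y)] over y to obtain M[(x, x_next)]."""
    M_xx = {}
    for (x, x_next, _y), count in M_xxy.items():
        M_xx[(x, x_next)] = M_xx.get((x, x_next), 0) + count
    return M_xx


# Counts for one fixed instance u of the conditioning set.
M_xxy = {
    ("a0", "a1", "b0"): 2,
    ("a0", "a1", "b1"): 5,
    ("a1", "a0", "b0"): 1,
}
M_xx = sum_out_y(M_xxy)
```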
- nullStateToStateTransitionHypothesisChi2(X, Y, _)
Decides if the null state to state transition hypothesis is rejected using chi2-test.
- Parameters:
X (str) – A random variable.
Y (str) – A parent of X.
_ (List[str]) – A subset of the parents of X that does not contain Y.
- Returns:
False if X is not independent of Y given the conditioning set U.
- Return type:
bool
- nullTimeToTransitionHypothesisF(X, Y, _)
Decides if the null time to transition hypothesis is rejected using F-test.
- Parameters:
X (str) – A random variable.
Y (str) – A parent of X.
_ (List[str]) – A subset of the parents of X that does not contain Y.
- Returns:
False if X is not independent of Y given the conditioning set U.
- Return type:
bool
- testIndep(X, Y, U)
- Parameters:
X (str) – Name of the variable.
Y (str) – Name of the variable to test independence from, not in U.
U (List[str]) – List of conditioning variables.
- Returns:
True if X is independent of Y given U, False otherwise.
- Return type:
bool
- class pyAgrum.ctbn.StatsIndepTest.IndepTest
Bases: object
Base class used to test independence between two variables given some other parents.
- abstract testIndep(X, Y, U)
- Parameters:
X (str) – Head of the arc we want to test.
Y (str) – Tail of the arc we want to test.
U (List[str]) – Known parents.
- Return type:
bool
- class pyAgrum.ctbn.StatsIndepTest.Oracle(ctbn)
Bases: IndepTest
Oracle independence test: answers are read from the graph of a given CTBN rather than estimated from samples.
- Parameters:
ctbn (CTBN)
- testIndep(X, Y, U)
- Parameters:
X (str) – Head of the arc we want to test.
Y (str) – Tail of the arc we want to test.
U (List[str]) – Known parents.
- Returns:
False if there is an arc from Y to X knowing U, True otherwise.
- Return type:
bool
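The relation between the abstract test and the oracle can be sketched in plain Python as follows. The class names `IndepTestSketch` and `OracleSketch` are hypothetical, and the oracle is built here from an explicit set of arcs instead of a CTBN object, which is a simplification of this example.

```python
from abc import ABC, abstractmethod


class IndepTestSketch(ABC):
    """Common interface: decide whether X is independent of Y given U."""

    @abstractmethod
    def testIndep(self, X, Y, U):
        ...


class OracleSketch(IndepTestSketch):
    """Answers from a known graph, given as a collection of (tail, head) arcs."""

    def __init__(self, arcs):
        self.arcs = set(arcs)

    def testIndep(self, X, Y, U):
        # False if there is an arc from Y to X, True otherwise.
        # U is ignored in this simplified sketch.
        return (Y, X) not in self.arcs


oracle = OracleSketch([("A", "B")])   # single arc A -> B
dependent = oracle.testIndep("B", "A", [])
independent = oracle.testIndep("A", "B", [])
```

An oracle of this kind is handy for checking a structure-learning procedure: with perfect answers, any error in the recovered graph comes from the search itself and not from the statistical tests.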
- pyAgrum.ctbn.StatsIndepTest.sqrtPotential(potential)
Applies sqrt function to all values inside the potential.
- Parameters:
potential (pyAgrum.Potential) – Potential to apply sqrt to.
- Returns:
sqrt of potential.
- Return type: