Generation of database
- class pyagrum.BNDatabaseGenerator(bn)
BNDatabaseGenerator is used to easily generate databases from a pyagrum.BayesNet.
- Parameters:
bn (pyagrum.BayesNet) – the Bayesian network used to generate data.
- bn()
Get the Bayesian network used to generate the samples
- Returns:
The Bayesian network
- Return type:
pyagrum.BayesNet
- drawSamples(*args)
Generate and store a database by sampling the Bayesian network.
If evs is specified, a sample is stored only if it is compatible with these observations.
Returns the log2-likelihood of this database.
- Parameters:
nbSamples (int) – the number of samples that will be generated
evs (pyagrum.Instantiation or Dict[int|str, int|str]) – (optional) The evidence that every stored sample must be compatible with.
timeout (int) – (optional) The maximum time in seconds to generate the samples (default 600)
- Return type:
float
Warning
nbSamples is the requested size of the database, not the number of samples drawn: samples that are incompatible with the evidence are discarded. If the evidence is very rare (or even impossible), the generation process may be very slow (it may even never finish). A timeout is therefore provided (default 600 seconds); when it expires, the database may contain fewer than nbSamples rows (possibly 0).
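The behavior described in this warning can be pictured as rejection sampling with a deadline. The sketch below is plain Python and is not pyAgrum's implementation; draw_compatible_samples, toy_sampler and the two-variable model are hypothetical names introduced only for illustration.

```python
import random
import time

def draw_compatible_samples(sample_once, evidence, nb_samples, timeout=600.0):
    """Draw until nb_samples compatible rows are stored or the timeout expires."""
    database = []
    deadline = time.monotonic() + timeout
    while len(database) < nb_samples and time.monotonic() < deadline:
        row = sample_once()
        # A row is stored only if it agrees with every observed value.
        if all(row[name] == value for name, value in evidence.items()):
            database.append(row)
    return database

rng = random.Random(0)

def toy_sampler():
    # Hypothetical model: A is a fair coin and B always copies A.
    a = rng.randint(0, 1)
    return {"A": a, "B": a}

# Frequent evidence: the requested size is reached quickly.
common = draw_compatible_samples(toy_sampler, {"A": 1}, nb_samples=5, timeout=1.0)
# Impossible evidence: every draw is rejected, so only the timeout stops the loop.
impossible = draw_compatible_samples(
    toy_sampler, {"A": 1, "B": 0}, nb_samples=5, timeout=0.05
)
print(len(common), len(impossible))
```

With impossible evidence the loop only ends because of the deadline, which is why the resulting database can be empty.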
Warning
For discretized variables, aGrum/pyAgrum defines 3 behaviors when generating samples with labels:
- RANDOM (default): the value is chosen randomly in the interval
- MEDIAN: the value is the median of the interval
- INTERVAL: the value is the interval itself (for instance « [0,1[ »)
The behavior can be set using setDiscretizedLabelMode{Random|Median|Interval}.
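As a rough illustration of the three modes, the following plain-Python sketch maps an interval index to an exported value. It is not pyAgrum code: discretized_value is a hypothetical helper, and the bracket convention (last interval closed on the right) is an assumption inferred from the example output in this section.

```python
import random

def discretized_value(ticks, index, mode, rng=random):
    """Return the exported value for interval number `index` of a discretized variable."""
    low, high = ticks[index], ticks[index + 1]
    if mode == "RANDOM":
        # Uniform draw inside the interval.
        return rng.uniform(low, high)
    if mode == "MEDIAN":
        # The median of a uniform distribution is the midpoint of the interval.
        return (low + high) / 2
    if mode == "INTERVAL":
        # Assumption: only the last interval is closed on the right.
        closing = "]" if index == len(ticks) - 2 else "["
        return f"[{low:g};{high:g}{closing}"
    raise ValueError(f"unknown mode: {mode}")

ticks = [1, 1.5, 3, 10.2]  # same ticks as variable F in the examples
for mode in ("RANDOM", "MEDIAN", "INTERVAL"):
    print(mode, discretized_value(ticks, 1, mode))
```

Only RANDOM is non-deterministic; MEDIAN and INTERVAL always give the same value for a given interval.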
Examples
>>> import pyagrum as gum
>>> bn=gum.fastBN('A->B{yes|maybe|no}<-C->D->E<-F[1,1.5,3,10.2]<-B')
>>> g=gum.BNDatabaseGenerator(bn)
>>> g.setRandomVarOrder()
>>> g.drawSamples(5,
...               {'B':'yes','E':'1'})
-122.98754206579288
>>> g.setDiscretizedLabelModeRandom() # By default
>>> g.to_pandas()
     B         F  A  C  E  D
0  yes  2.802302  0  0  1  0
1  yes  1.761605  0  0  1  0
2  yes  2.507535  0  0  1  1
3  yes  2.815282  0  1  1  0
4  yes  5.548571  1  0  1  1
>>> g.setDiscretizedLabelModeMedian()
>>> g.to_pandas()
     B         F  A  C  E  D
0  yes  2.250000  0  0  1  0
1  yes  2.250000  0  0  1  0
2  yes  2.250000  0  0  1  1
3  yes  2.250000  0  1  1  0
4  yes  6.600000  1  0  1  1
>>> g.setDiscretizedLabelModeInterval()
>>> g.to_pandas()
     B         F  A  C  E  D
0  yes   [1.5;3[  0  0  1  0
1  yes   [1.5;3[  0  0  1  0
2  yes   [1.5;3[  0  0  1  1
3  yes   [1.5;3[  0  1  1  0
4  yes  [3;10.2]  1  0  1  1
- log2likelihood()
Get the log2likelihood of the generated database
- Raises:
pyagrum.OperationNotAllowed – if nothing has been sampled yet (using pyagrum.BNDatabaseGenerator.drawSamples() for instance)
- Returns:
the log2likelihood
- Return type:
float
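To make the quantity concrete, here is a minimal sketch assuming the log2-likelihood of a database is the sum, over its rows, of the base-2 logarithm of the joint probability of each row. The two-variable network A -> B and its probabilities are hypothetical and written as plain dictionaries; no pyAgrum call is involved.

```python
import math

# Hypothetical network A -> B with binary variables.
p_a = {0: 0.5, 1: 0.5}
p_b_given_a = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.25, 1: 0.75}}

def log2_likelihood(rows):
    """Sum over rows of log2 P(A=a) + log2 P(B=b | A=a)."""
    total = 0.0
    for a, b in rows:
        total += math.log2(p_a[a]) + math.log2(p_b_given_a[a][b])
    return total

rows = [(0, 0), (1, 1), (0, 1)]
ll = log2_likelihood(rows)
print(ll)
```

Each row contributes a non-positive term, so the value becomes more negative as the database grows.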
- samplesAt(row, col)
Get the value stored in the database at (row, col)
- Parameters:
row (int) – the row
col (int) – the column (using the ordered list of variables)
- Returns:
the index of the modality of the variable in this position
- Return type:
int
- samplesLabelAt(row, col)
Get the label stored in the database at (row, col)
- Parameters:
row (int) – the row
col (int) – the column (using the ordered list of variables)
- Returns:
the label of the modality of the variable in this position
- Return type:
str
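The difference between an index and a label can be sketched with a small hand-written table. This is an illustration only: the data layout, samples_at and samples_label_at below are hypothetical stand-ins, not the way pyAgrum stores its samples.

```python
# Hypothetical database: one list of modality indices per row,
# with columns following the variable order below.
var_order = ["B", "A"]
labels = {"B": ["yes", "maybe", "no"], "A": ["0", "1"]}
samples = [
    [0, 1],  # B=yes, A=1
    [2, 0],  # B=no,  A=0
]

def samples_at(row, col):
    """Index of the modality stored at (row, col)."""
    return samples[row][col]

def samples_label_at(row, col):
    """Label of the modality stored at (row, col)."""
    variable = var_order[col]
    return labels[variable][samples[row][col]]

print(samples_at(1, 0), samples_label_at(1, 0))
```

The column number refers to the position in the current variable order, not to a node id.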
- samplesNbCols()
Return the number of columns in the samples.
- Return type:
int
- samplesNbRows()
Return the number of rows in the samples.
- Return type:
int
- setAntiTopologicalVarOrder()
Select an anti-topological order for the variables in the database.
- Return type:
None
- setDiscretizedLabelModeInterval()
Set the discretized label mode to INTERVAL: sampling a pyagrum.DiscretizedVariable gives a deterministic value, the string representation of the interval.
- Return type:
None
Examples
>>> import pyagrum as gum
>>> bn=gum.fastBN('A->B{yes|maybe|no}<-C->D->E<-F[1,1.5,3,10.2]<-B')
>>> g=gum.BNDatabaseGenerator(bn)
>>> g.setRandomVarOrder()
>>> g.drawSamples(5,
...               {'B':'yes','E':'1'})
-122.98754206579288
>>> g.setDiscretizedLabelModeInterval()
>>> g.to_pandas()
     B         F  A  C  E  D
0  yes   [1.5;3[  0  0  1  0
1  yes   [1.5;3[  0  0  1  0
2  yes   [1.5;3[  0  0  1  1
3  yes   [1.5;3[  0  1  1  0
4  yes  [3;10.2]  1  0  1  1
- setDiscretizedLabelModeMedian()
Set the discretized label mode to MEDIAN: sampling a pyagrum.DiscretizedVariable gives a deterministic value, the median of the uniform distribution on that interval.
- Return type:
None
Examples
>>> import pyagrum as gum
>>> bn=gum.fastBN('A->B{yes|maybe|no}<-C->D->E<-F[1,1.5,3,10.2]<-B')
>>> g=gum.BNDatabaseGenerator(bn)
>>> g.setRandomVarOrder()
>>> g.drawSamples(5,
...               {'B':'yes','E':'1'})
-122.98754206579288
>>> g.setDiscretizedLabelModeMedian()
>>> g.to_pandas()
     B         F  A  C  E  D
0  yes  2.250000  0  0  1  0
1  yes  2.250000  0  0  1  0
2  yes  2.250000  0  0  1  1
3  yes  2.250000  0  1  1  0
4  yes  6.600000  1  0  1  1
- setDiscretizedLabelModeRandom()
Set the discretized label mode to RANDOM (default mode): sampling a pyagrum.DiscretizedVariable gives a random value drawn from the uniform distribution on that interval.
- Return type:
None
Examples
>>> import pyagrum as gum
>>> bn=gum.fastBN('A->B{yes|maybe|no}<-C->D->E<-F[1,1.5,3,10.2]<-B')
>>> g=gum.BNDatabaseGenerator(bn)
>>> g.setRandomVarOrder()
>>> g.drawSamples(5,
...               {'B':'yes','E':'1'})
-122.98754206579288
>>> g.setDiscretizedLabelModeRandom() # By default
>>> g.to_pandas()
     B         F  A  C  E  D
0  yes  2.802302  0  0  1  0
1  yes  1.761605  0  0  1  0
2  yes  2.507535  0  0  1  1
3  yes  2.815282  0  1  1  0
4  yes  5.548571  1  0  1  1
- setRandomVarOrder()
Select a random order for the variables in the database.
- Return type:
None
- setTopologicalVarOrder()
Select a topological order for the variables in the database.
- Return type:
None
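For readers unfamiliar with the terms, a topological order lists every parent before its children, and an anti-topological order is its reverse. The sketch below computes both with the standard library for the arcs of the network used in the examples of this section; it only illustrates the ordering and makes no claim about which of the valid orders pyAgrum picks.

```python
from graphlib import TopologicalSorter

# Parents of each node in 'A->B<-C->D->E<-F<-B'.
parents = {
    "A": [],
    "C": [],
    "B": ["A", "C"],
    "D": ["C"],
    "F": ["B"],
    "E": ["D", "F"],
}

# static_order() yields each node after all of its predecessors.
topological = list(TopologicalSorter(parents).static_order())
anti_topological = list(reversed(topological))

print(topological)
print(anti_topological)
```

Several topological orders usually exist for the same graph, so the exact sequence may differ between tools.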
- setVarOrder(*args)
Set a specific order with a list of names
- Parameters:
vars (List[str]) – order specified by the list of variable names.
- Return type:
None
- setVarOrderFromCSV(*args)
Use the same variable order as in a CSV file
- Parameters:
filename (str) – the name of the CSV file
- Return type:
None
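A plausible reading of this method is that the order is taken from the header line of the CSV file. The sketch below shows that idea with the standard csv module; var_order_from_csv is a hypothetical helper, and the assumption that the first line holds the variable names is not confirmed by this page.

```python
import csv
import io

def var_order_from_csv(stream, separator=","):
    """Return the column names found on the first line of a CSV stream."""
    reader = csv.reader(stream, delimiter=separator)
    return next(reader)

# In-memory stand-in for a CSV file whose header names the variables.
content = io.StringIO("B,F,A,C,E,D\nyes,2.25,0,0,1,0\n")
order = var_order_from_csv(content)
print(order)
```

A list obtained this way has the same shape as the argument expected by setVarOrder.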
- toCSV(*args)
Generate a CSV file from the generated database.
- Parameters:
csvFilename (str) – the name of the csv file
useLabels (bool) – whether to write labels (true) or ids (false) in the csv file (default true)
append (bool) – whether to append to the file (true) or overwrite it (false) (default false)
csvSeparator (str) – separator in the csv file (default ‘,’)
- Return type:
None
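The effect of these parameters can be sketched with the standard csv module. This is not pyAgrum's writer: to_csv and its data layout are hypothetical, and writing the header only when the file is created (not when appending) is an assumption.

```python
import csv
import os
import tempfile

def to_csv(path, names, rows, labels, use_labels=True, append=False, separator=","):
    """Write rows of modality indices as labels or ids, creating or extending the file."""
    with open(path, "a" if append else "w", newline="") as f:
        writer = csv.writer(f, delimiter=separator, lineterminator="\n")
        if not append:
            writer.writerow(names)  # assumption: header written only on creation
        for row in rows:
            if use_labels:
                writer.writerow([labels[n][i] for n, i in zip(names, row)])
            else:
                writer.writerow(row)

names = ["B", "A"]
labels = {"B": ["yes", "maybe", "no"], "A": ["0", "1"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "samples.csv")
    to_csv(path, names, [[0, 1]], labels)               # create with labels
    to_csv(path, names, [[2, 0]], labels, append=True)  # extend the same file
    with open(path) as f:
        with_labels = f.read()
    to_csv(path, names, [[0, 1]], labels, use_labels=False, separator=";")
    with open(path) as f:
        with_ids = f.read()

print(with_labels)
print(with_ids)
```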
- to_pandas(with_labels=True)
Export the samples as a pandas.DataFrame.
- Parameters:
with_labels (bool) – whether the DataFrame contains the labels of the variables (True) or the indices of these labels (False)
- varOrder()
The current order of the variables (as a tuple of NodeId)
- Returns:
the tuple of NodeId
- Return type:
Tuple[int]
- varOrderNames()
The current order of the variables (as a tuple of names)
- Returns:
the tuple of names
- Return type:
Tuple[str]