Generation of database
- class pyAgrum.BNDatabaseGenerator(bn)
BNDatabaseGenerator is used to easily generate databases from a pyAgrum.BayesNet.
- Parameters:
bn (pyAgrum.BayesNet) – the Bayesian network used to generate data.
- bn()
Get the Bayesian network used to generate the samples
- Returns:
The Bayesian network
- Return type:
pyAgrum.BayesNet
- drawSamples(*args)
Generate and store a database by sampling the Bayesian network.
If evs is specified, a sample is stored only if it is compatible with these observations.
Returns the log2likelihood of this database.
- Parameters:
nbSamples (int) – the number of samples that will be generated
evs ("pyAgrum.Instantiation" or Dict[int|str,int|str]) – (optional) The evidence that will be observed by the resulting samples.
- Return type:
float
Warning
nbSamples is not the size of the database but the number of generated samples. The evidence may be very rare (or even impossible). In that case, the generated database may contain only a few samples (or may even be empty).
Examples
>>> import pyAgrum as gum
>>> bn=gum.fastBN('A->B{yes|maybe|no}<-C->D->E<-F<-B')
>>> g=gum.BNDatabaseGenerator(bn)
>>> g.setRandomVarOrder()
>>> g.drawSamples(100,{'B':'yes','E':'1'})
-233.16554130404904
>>> g.to_pandas()
    D  E  C    B  F  A
0   1  1  0  yes  1  1
1   1  1  0  yes  1  0
2   1  1  1  yes  0  1
3   1  1  0  yes  0  0
4   1  1  0  yes  0  1
5   1  1  0  yes  1  0
6   1  1  0  yes  0  0
7   0  1  1  yes  1  1
8   1  1  0  yes  0  1
9   0  1  0  yes  1  1
10  1  1  0  yes  1  1
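The warning above follows from how evidence filtering behaves: samples that contradict the evidence are discarded, so fewer than nbSamples rows may remain. The following is a minimal, library-independent sketch of that behaviour, assuming plain rejection sampling on a hypothetical two-variable network (A -> B). It is not pyAgrum's implementation, and the names `draw_one`, `draw_samples`, `P_A` and `P_B_GIVEN_A` as well as all probabilities are made up for illustration.

```python
import random

# Hypothetical network A -> B with made-up probabilities (illustration only).
P_A = {0: 0.9, 1: 0.1}
P_B_GIVEN_A = {0: {"yes": 0.05, "no": 0.95}, 1: {"yes": 0.6, "no": 0.4}}


def draw_one(rng):
    """Forward-sample one row: A first, then B given A."""
    a = rng.choices(list(P_A), weights=list(P_A.values()))[0]
    dist = P_B_GIVEN_A[a]
    b = rng.choices(list(dist), weights=list(dist.values()))[0]
    return {"A": a, "B": b}


def draw_samples(nb_samples, evs=None, seed=0):
    """Generate nb_samples rows and keep only those compatible with evs."""
    rng = random.Random(seed)
    evs = evs or {}
    kept = []
    for _ in range(nb_samples):
        row = draw_one(rng)
        # A row is stored only if it agrees with every observed value.
        if all(row[name] == value for name, value in evs.items()):
            kept.append(row)
    return kept


all_rows = draw_samples(1000)                         # no evidence: every row is kept
rare_rows = draw_samples(1000, {"B": "yes"})          # rare evidence: few rows are kept
impossible_rows = draw_samples(1000, {"B": "maybe"})  # impossible evidence: empty
print(len(all_rows), len(rare_rows), len(impossible_rows))
```

In this sketch, the number of generated samples is always 1000, but the size of the stored database depends on how likely the evidence is.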
- log2likelihood()
Get the log2likelihood of the generated database
- Raises:
pyAgrum.OperationNotAllowed – if nothing has been sampled yet (using pyAgrum.BNDatabaseGenerator.drawSamples() for instance)
- Returns:
the log2likelihood
- Return type:
float
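As a reading aid, the sketch below assumes the usual definition of the log2-likelihood of a database: the sum, over all rows, of the base-2 logarithm of the probability of the row under the network. The two-variable network, its probabilities, and the helper names `row_probability` and `log2_likelihood` are hypothetical and are not part of pyAgrum.

```python
import math

# Hypothetical network A -> B with made-up probabilities (illustration only).
P_A = {0: 0.5, 1: 0.5}
P_B_GIVEN_A = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.5, 1: 0.5}}


def row_probability(row):
    """Joint probability P(A=a, B=b) = P(A=a) * P(B=b | A=a)."""
    a, b = row
    return P_A[a] * P_B_GIVEN_A[a][b]


def log2_likelihood(rows):
    """Sum of log2 P(row) over all rows of the database."""
    return sum(math.log2(row_probability(row)) for row in rows)


database = [(0, 0), (1, 1), (0, 1)]
print(log2_likelihood(database))
```

Since every row probability is at most 1, the value is never positive, and it decreases as rows are added.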
- samplesAt(row, col)
Get the value of the database at (row, col)
- Parameters:
row (int) – the row
col (int) – the column (using the ordered list of variables)
- Returns:
the index of the modality of the variable in this position
- Return type:
int
- samplesLabelAt(row, col)
Get the label of the database at (row, col)
- Parameters:
row (int) – the row
col (int) – the column (using the ordered list of variables)
- Returns:
the label of the modality of the variable in this position
- Return type:
str
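The difference between samplesAt and samplesLabelAt can be pictured with a small, library-independent sketch. It assumes that each cell stores the index of the sampled modality and that the label is looked up in the label list of the variable at that column. The data and the helper names `samples_at` and `samples_label_at` are hypothetical.

```python
# Hypothetical column order and label lists (illustration only).
var_order = ["B", "A"]
labels = {"A": ["0", "1"], "B": ["yes", "maybe", "no"]}

# Each row stores, for each column, the index of the sampled modality.
samples = [[0, 1], [2, 0]]


def samples_at(row, col):
    """Index of the modality stored at (row, col)."""
    return samples[row][col]


def samples_label_at(row, col):
    """Label of the modality stored at (row, col)."""
    variable = var_order[col]  # the column refers to the ordered list of variables
    return labels[variable][samples[row][col]]


print(samples_at(1, 0), samples_label_at(1, 0))
```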
- samplesNbCols()
Return the number of columns in the samples
- Return type:
int
- samplesNbRows()
Return the number of rows in the samples
- Return type:
int
- setAntiTopologicalVarOrder()
Select an anti-topological order for the variables in the database.
- Return type:
None
- setRandomVarOrder()
Select a random order for the variables in the database.
- Return type:
None
- setTopologicalVarOrder()
Select a topological order for the variables in the database.
- Return type:
None
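To make the two structural orders concrete, the sketch below computes one topological order (every parent before its children) for the DAG of the earlier example, A->B<-C->D->E<-F<-B, using the standard library (Python 3.9+). It assumes that an anti-topological order is simply the reverse of a topological one. A DAG usually admits several valid orders, so the order chosen by pyAgrum may differ from the one printed here.

```python
from graphlib import TopologicalSorter

# Parents of each node in the DAG A->B<-C->D->E<-F<-B.
parents = {
    "A": set(),
    "C": set(),
    "B": {"A", "C"},
    "D": {"C"},
    "F": {"B"},
    "E": {"D", "F"},
}

# static_order() yields each node only after all of its parents.
topological = list(TopologicalSorter(parents).static_order())

# Assumption: anti-topological means children before parents.
anti_topological = list(reversed(topological))

print(topological)
print(anti_topological)
```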
- setVarOrder(*args)
Set a specific order with a list of names
- Parameters:
vars (List[str]) – order specified by the list of variable names.
- Return type:
None
- setVarOrderFromCSV(*args)
Set the same order as in a CSV file
- Parameters:
filename (str) – the name of the CSV file
- Return type:
None
- toCSV(*args)
Generate a CSV file representing the generated database.
- Parameters:
csvFilename (str) – the name of the csv file
useLabels (bool) – whether to write labels (True) or ids (False) in the csv file (default true)
append (bool) – append in the file or rewrite the file (default false)
csvSeparator (str) – separator in the csv file (default ‘,’)
- Return type:
None
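The effect of the toCSV parameters can be illustrated with a standard-library sketch. This is not pyAgrum's implementation: the helper `to_csv`, the sample data, and the choice to write the header row only when the file is rewritten are all assumptions made for the example.

```python
import csv
import os
import tempfile

# Hypothetical database: column order, label lists and index-encoded rows.
var_order = ["B", "A"]
labels = {"A": ["0", "1"], "B": ["yes", "maybe", "no"]}
samples = [[0, 1], [2, 0]]


def to_csv(filename, use_labels=True, append=False, separator=","):
    """Write the samples either as labels or as indices."""
    mode = "a" if append else "w"
    with open(filename, mode, newline="") as f:
        writer = csv.writer(f, delimiter=separator)
        if not append:
            # Assumption of this sketch: the header is written for a fresh file only.
            writer.writerow(var_order)
        for row in samples:
            if use_labels:
                writer.writerow([labels[v][i] for v, i in zip(var_order, row)])
            else:
                writer.writerow(row)


path = os.path.join(tempfile.mkdtemp(), "samples.csv")
to_csv(path)                                 # rewrite the file, with labels
to_csv(path, use_labels=False, append=True)  # append the same rows as indices
with open(path, newline="") as f:
    lines = f.read().splitlines()
print(lines)
```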
- to_pandas(with_labels=True)
Export the samples as a pandas.DataFrame.
- Parameters:
with_labels (bool) – whether the DataFrame contains the labels of the variables (True) or the indices of these labels (False)
- varOrder()
The current order of the variables (as a tuple of NodeId)
- Returns:
the tuple of NodeId
- Return type:
Tuple[int]
- varOrderNames()
The current order of the variables (as a tuple of names)
- Returns:
the tuple of names
- Return type:
Tuple[str]