Generation of database

class pyAgrum.BNDatabaseGenerator(bn)

BNDatabaseGenerator is used to easily generate databases from a pyAgrum.BayesNet.

Parameters

bn (pyAgrum.BayesNet) – the Bayesian network used to generate data.

bn()

Get the Bayesian network used to generate the samples

Returns

The Bayesian network

Return type

pyAgrum.BayesNet

drawSamples(*args)

Generate and store a database by sampling the Bayesian network.

If evs is specified, samples are stored only if they are compatible with these observations.

Returns the log2likelihood of this database.

Parameters
  • nbSamples (int) – the number of samples that will be generated

  • evs ("pyAgrum.Instantiation" or Dict[int|str, int|str]) – (optional) The evidence that will be observed by the resulting samples.

Return type

float

Warning

nbSamples is not the size of the database but the number of generated samples. The evidence may be very rare (or even impossible). In that case the generated database may contain only a few samples, or even be empty.
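The warning above can be illustrated with a toy rejection sampler. This is a minimal sketch in plain Python, not pyAgrum's implementation: the two-variable network A -> B, its probabilities, and the function name draw_compatible_samples are all made up for illustration.

```python
import random


def draw_compatible_samples(nb_samples, evidence, seed=0):
    """Toy rejection sampler for a hypothetical network A -> B."""
    rng = random.Random(seed)
    p_a1 = 0.3                        # P(A=1), made-up value
    p_b1_given_a = {0: 0.1, 1: 0.8}   # P(B=1 | A), made-up values
    kept = []
    for _ in range(nb_samples):
        # Sample the parent first, then the child given the parent.
        a = int(rng.random() < p_a1)
        b = int(rng.random() < p_b1_given_a[a])
        sample = {"A": a, "B": b}
        # Store the sample only if it agrees with every observed value.
        if all(sample[name] == value for name, value in evidence.items()):
            kept.append(sample)
    return kept


rows = draw_compatible_samples(100, {"B": 1})
print(len(rows))  # never more than 100; rarer evidence keeps fewer rows
```

All 100 samples are generated, but only those with B equal to 1 are kept, so the stored database is smaller than nbSamples. With impossible evidence (for example a value B never takes) the result is empty.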

Examples

>>> import pyAgrum as gum
>>> bn=gum.fastBN('A->B{yes|maybe|no}<-C->D->E<-F<-B')
>>> g=gum.BNDatabaseGenerator(bn)
>>> g.setRandomVarOrder()
>>> g.drawSamples(100,{'B':'yes','E':'1'})
-233.16554130404904
>>> g.to_pandas()
    D  E  C    B  F  A
0   1  1  0  yes  1  1
1   1  1  0  yes  1  0
2   1  1  1  yes  0  1
3   1  1  0  yes  0  0
4   1  1  0  yes  0  1
5   1  1  0  yes  1  0
6   1  1  0  yes  0  0
7   0  1  1  yes  1  1
8   1  1  0  yes  0  1
9   0  1  0  yes  1  1
10  1  1  0  yes  1  1

log2likelihood()

Get the log2likelihood of the generated database

Raises

pyAgrum.OperationNotAllowed – if nothing has been sampled yet (using gum.BNDatabaseGenerator.drawSamples() for instance)

Returns

the log2likelihood

Return type

float
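One way to read this value is as the sum, over all stored rows, of the base-2 logarithm of each row's joint probability under the network. That reading follows the usual definition of a log-likelihood and is an assumption here, since this page does not spell out the formula. The sketch below uses a hypothetical network A -> B with made-up probabilities; log2_likelihood is an illustrative helper, not a pyAgrum function.

```python
import math


def log2_likelihood(rows, p_a1, p_b1_given_a):
    """Sum of log2 P(A=a, B=b) over rows, for a toy network A -> B."""
    total = 0.0
    for a, b in rows:
        p_a = p_a1 if a == 1 else 1.0 - p_a1
        p_b = p_b1_given_a[a] if b == 1 else 1.0 - p_b1_given_a[a]
        # Joint probability factorises as P(A) * P(B | A).
        total += math.log2(p_a * p_b)
    return total


rows = [(1, 1), (0, 0), (0, 1)]
value = log2_likelihood(rows, 0.5, {0: 0.25, 1: 0.5})
print(value)
```

Because every probability is at most 1, each term is non-positive, which is consistent with the negative value shown in the drawSamples example.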

samplesAt(row, col)

Get the value of the database at (row, col)

Parameters
  • row (int) – the row

  • col (int) – the column (using the ordered list of variables)

Returns

the index of the modality of the variable in this position

Return type

int

samplesLabelAt(row, col)

Get the label of the database at (row, col)

Parameters
  • row (int) – the row

  • col (int) – the column (using the ordered list of variables)

Returns

the label of the modality of the variable in this position

Return type

str

samplesNbCols()

Return the number of columns in the samples

Return type

int

samplesNbRows()

Return the number of rows in the samples

Return type

int
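The cell accessors and the two size methods can be combined to read the whole database without pandas. The helper below is a sketch: samples_as_rows is a hypothetical name, and it relies only on the methods documented on this page (varOrderNames, samplesNbRows, samplesNbCols, samplesLabelAt), so it accepts any object that exposes them.

```python
def samples_as_rows(generator):
    """Return (column names, rows of labels), read cell by cell."""
    # Column names follow the generator's current variable order.
    names = list(generator.varOrderNames())
    nb_rows = generator.samplesNbRows()
    nb_cols = generator.samplesNbCols()
    rows = [
        [generator.samplesLabelAt(row, col) for col in range(nb_cols)]
        for row in range(nb_rows)
    ]
    return names, rows
```

Called on a BNDatabaseGenerator after drawSamples, this should yield the same labels as to_pandas(), as plain Python lists. Replacing samplesLabelAt with samplesAt would give the integer indices instead.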

setAntiTopologicalVarOrder()

Select an anti-topological order for the variables in the database.

Return type

None

setRandomVarOrder()

Select a random order for the variables in the database.

Return type

None

setTopologicalVarOrder()

Select a topological order for the variables in the database.

Return type

None
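For readers unfamiliar with the terms: a topological order lists every variable after all of its parents, and an anti-topological order is the reverse. The sketch below shows this on the graph from the drawSamples example using Python's standard graphlib module (Python 3.9+). It only illustrates the concept; which of the valid orders pyAgrum actually selects is not stated here.

```python
from graphlib import TopologicalSorter

# Parents of each node in 'A->B<-C->D->E<-F<-B'.
parents = {
    "A": [],
    "C": [],
    "B": ["A", "C"],
    "D": ["C"],
    "F": ["B"],
    "E": ["D", "F"],
}

# static_order() yields each node after all of its predecessors.
topological = list(TopologicalSorter(parents).static_order())
anti_topological = list(reversed(topological))

print(topological)
print(anti_topological)
```

E has no children and is the only such node, so it comes last in any topological order and first in the anti-topological one.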

setVarOrder(*args)

Set a specific order with a list of names

Parameters

vars (List[str]) – order specified by the list of variable names.

Return type

None

setVarOrderFromCSV(*args)

Use the same variable order as in a CSV file

Parameters

filename (str) – the name of the CSV file

Return type

None
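As a rough picture of what "the same order as in a CSV file" means, the sketch below reads the first row of a CSV file and returns it as a list of names. It assumes the file starts with a header row of variable names, which this page does not state explicitly; read_header_order is a hypothetical helper, not part of pyAgrum.

```python
import csv


def read_header_order(csv_filename, separator=","):
    """Return the first row of a CSV file as a list of column names."""
    with open(csv_filename, newline="") as handle:
        # Assumption: the first row is a header listing variable names.
        return next(csv.reader(handle, delimiter=separator))
```

Under that assumption, passing the returned list to setVarOrder would be a manual way to reproduce the column order of an existing file.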

toCSV(*args)

Generate a CSV file representing the generated database.

Parameters
  • csvFilename (str) – the name of the csv file

  • useLabels (bool) – whether to write labels (True) or ids (False) in the csv file (default True)

  • append (bool) – whether to append to the file (True) or overwrite it (False) (default False)

  • csvSeparator (str) – separator in the csv file (default ‘,’)

Return type

None

to_pandas(with_labels=True)

Export the samples as a pandas.DataFrame.

Parameters

with_labels (bool) – whether the DataFrame contains the labels of the variables (True) or the indices of those labels (False)

varOrder()

The current order of the variables (as a tuple of NodeId)

Returns

the tuple of NodeId

Return type

Tuple[int]

varOrderNames()

The current order of the variables (as a tuple of names)

Returns

the tuple of names

Return type

Tuple[str]