biogeme.database.container module
DataContainer: responsible for holding and safely manipulating the Biogeme dataset, stored as a pandas DataFrame.
Michel Bierlaire Wed Mar 26 19:30:57 2025
- class biogeme.database.container.Database(name, dataframe, use_jit=True)
Bases: object
Encapsulates a pandas DataFrame for Biogeme, providing safe access and basic operations such as checking for emptiness, scaling, and column manipulation.
- Parameters:
name (str)
dataframe (pd.DataFrame)
use_jit (bool)
- DefineVariable(name, expression)
Warning
This function is deprecated. Use define_variable() instead.
This method evaluates a Biogeme expression row by row on the database and creates a new column in the internal DataFrame with the results.
- Parameters:
name (str) – Name of the new column to be added.
expression (Expression) – Biogeme expression to evaluate for each row.
- Return type:
- add_column(column, values)
Adds a new column to the dataset.
- Parameters:
column (str) – name of the new column
values (Series) – a pandas Series of the same length as the data
- Raises:
ValueError – if column already exists or lengths mismatch
- Return type:
None
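A minimal sketch of the two error conditions documented for add_column, written against a plain pandas DataFrame so that it runs without Biogeme. The helper add_column_sketch and the example column names are hypothetical; this is not the Biogeme implementation.

```python
import pandas as pd


def add_column_sketch(df: pd.DataFrame, column: str, values: pd.Series) -> None:
    """Hypothetical helper mirroring the documented checks of add_column."""
    # Refuse to overwrite an existing column.
    if column in df.columns:
        raise ValueError(f"Column {column!r} already exists.")
    # Require exactly one value per row.
    if len(values) != len(df):
        raise ValueError(f"Got {len(values)} values for {len(df)} rows.")
    # pandas aligns the new Series on the DataFrame index.
    df[column] = values


df = pd.DataFrame({"TRAIN_TT": [60, 90, 120]})
add_column_sketch(df, "TRAIN_TT_HOURS", df["TRAIN_TT"] / 60)
print(df["TRAIN_TT_HOURS"].tolist())  # [1.0, 1.5, 2.0]
```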
- column_exists(column)
Checks whether a column exists in the data.
- Return type:
bool
- Parameters:
column (str)
- property data_jax: Array
Returns the data as a JAX array (a biogeme_jax object).
- property dataframe: DataFrame
Returns a reference to the internal DataFrame.
- define_variable(name, expression)
This method evaluates a Biogeme expression row by row on the database and creates a new column in the internal DataFrame with the results.
- Parameters:
name (str) – Name of the new column to be added.
expression (Expression) – Biogeme expression to evaluate for each row.
- Return type:
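define_variable works with a Biogeme Expression. The sketch below only imitates the row-by-row evaluation in plain pandas, with a Python callable standing in for an expression such as Variable('TRAIN_TT') + Variable('TRAIN_HE'). The helper define_variable_sketch and the data are hypothetical, and Biogeme may evaluate expressions quite differently internally.

```python
import pandas as pd


def define_variable_sketch(df: pd.DataFrame, name: str, row_function) -> None:
    """Hypothetical: evaluate `row_function` on every row and store it in `name`."""
    # axis=1 passes each row to the function as a Series indexed by column name.
    df[name] = df.apply(row_function, axis=1)


df = pd.DataFrame({"TRAIN_TT": [60, 90], "TRAIN_HE": [30, 60]})
define_variable_sketch(df, "TRAIN_TOTAL", lambda row: row["TRAIN_TT"] + row["TRAIN_HE"])
print(df["TRAIN_TOTAL"].tolist())  # [90, 150]
```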
- extract_rows(rows)
Extracts selected rows from the database.
- Parameters:
rows (list[int]) – list of rows to extract
- Return type:
- Returns:
the new database with the selected rows.
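In plain pandas, selecting rows by integer position corresponds to DataFrame.iloc. A sketch, assuming that rows holds 0-based positions and that the new dataset receives a fresh 0-based index (neither detail is stated above):

```python
import pandas as pd

df = pd.DataFrame({"ID": [10, 11, 12, 13], "CHOICE": [1, 2, 2, 3]})
rows = [0, 2]

# Select rows by position; the original DataFrame is left unchanged.
subset = df.iloc[rows].reset_index(drop=True)
print(subset["ID"].tolist())  # [10, 12]
```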
- extract_slice(indices)
Create a new Database instance containing only a subset of the data.
This is useful to maintain consistency across estimation and validation datasets by slicing the original draws array according to the provided indices.
- Parameters:
indices (Index) – The indices used to extract the subset of draws.
- Return type:
- Returns:
A new Database instance containing the sliced draws.
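extract_slice takes a pandas Index. The sketch below, in plain pandas and NumPy, illustrates the consistency idea: the same indices select both the data rows and the matching rows of a draws array, for estimation and validation alike. The split rule and the draws array are hypothetical, and this is not Biogeme's implementation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"CHOICE": [1, 2, 2, 3, 1, 3]})
# Hypothetical simulation draws: one row per observation, 4 draws each.
draws = np.random.default_rng(seed=0).random((len(df), 4))

# Hypothetical split rule: every third observation goes to validation.
validation_idx = df.index[df.index % 3 == 0]
estimation_idx = df.index.difference(validation_idx)

# Apply the same indices to the data and to the draws so rows stay aligned.
# With the default RangeIndex, index labels coincide with row positions.
estimation_data = df.loc[estimation_idx]
estimation_draws = draws[estimation_idx.to_numpy()]
validation_data = df.loc[validation_idx]
validation_draws = draws[validation_idx.to_numpy()]
```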
- generate_segmentation(variable, mapping=None, reference=None)
Generate a segmentation tuple for a variable.
- Parameters:
variable (Variable | str) – Variable object or name of the variable
mapping (dict[int, str] | None) – mapping associating values of the variable to names. If incomplete, default names are provided.
reference (str | None) – name of the reference category. If None, an arbitrary category is selected as reference.
- Return type:
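The completion of an incomplete mapping can be pictured with the plain-pandas sketch below. It returns an ordinary (variable, mapping, reference) tuple; the helper name, the default naming pattern f"{variable}_{value}" and the way the reference is picked are all assumptions, not Biogeme's actual rules.

```python
import pandas as pd


def segmentation_sketch(df, variable, mapping=None, reference=None):
    """Hypothetical helper mirroring the documented behaviour of generate_segmentation."""
    # Distinct values observed in the data, assumed to be integers.
    values = sorted(int(v) for v in df[variable].unique())
    full_mapping = dict(mapping or {})
    for value in values:
        # Fill in missing names; this naming pattern is an assumption.
        full_mapping.setdefault(value, f"{variable}_{value}")
    if reference is None:
        # Any category would do; this sketch picks the smallest value.
        reference = full_mapping[values[0]]
    return variable, full_mapping, reference


df = pd.DataFrame({"GA": [0, 1, 1, 0]})
print(segmentation_sketch(df, "GA", mapping={1: "GA_holder"}))
# ('GA', {1: 'GA_holder', 0: 'GA_0'}, 'GA_0')
```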
- get_column(column)
Returns the values of a column.
- Return type:
Series
- Parameters:
column (str)
- remove(exclude_condition)
Removes rows from the database that satisfy a given condition.
This method evaluates a Biogeme expression row by row on the database. All rows where the expression evaluates to a truthy value are removed.
- Parameters:
exclude_condition (Expression | float | int | bool) – A Biogeme expression that returns a boolean-like value for each row in the dataset. Rows where the result is True (nonzero) will be excluded.
- remove_rows(condition)
Removes all rows where the condition is True.
- Parameters:
condition (Series) – Boolean Series of the same length as the data
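Both removal methods come down to the same pandas operation: build a boolean mask and keep its complement. A sketch on a small hypothetical DataFrame, with a Python callable standing in for a Biogeme expression; whether Biogeme resets the index after removal is not stated above, so the sketch leaves it untouched.

```python
import pandas as pd

df = pd.DataFrame({"PURPOSE": [1, 1, 3, 1], "CHOICE": [0, 1, 2, 3]})

# Like remove(): evaluate a condition on each row; a nonzero result marks the row.
# The lambda stands in for a Biogeme expression such as
# (PURPOSE != 1) + (CHOICE == 0).
flags = df.apply(
    lambda row: int(row["PURPOSE"] != 1) + int(row["CHOICE"] == 0), axis=1
)
condition = flags != 0

# Like remove_rows(): drop every row where the boolean Series is True.
kept = df[~condition]
print(kept["CHOICE"].tolist())  # [1, 3]
```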
- scale_column(column, scale)
Scales all values in a given column.
- Parameters:
column (str) – name of the column to scale
scale (float) – scalar to multiply the column values by
- Raises:
BiogemeError – if the column is not found
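The documented behaviour of scale_column amounts to an in-place multiplication guarded by a column check. A plain-pandas sketch; the helper name and the local BiogemeError class (a stand-in so that the snippet runs without Biogeme installed) are hypothetical.

```python
import pandas as pd


class BiogemeError(Exception):
    """Local stand-in for Biogeme's exception class."""


def scale_column_sketch(df: pd.DataFrame, column: str, scale: float) -> None:
    """Hypothetical: multiply a column in place, failing on unknown names."""
    if column not in df.columns:
        raise BiogemeError(f"Unknown column: {column}")
    df[column] = df[column] * scale


df = pd.DataFrame({"TRAIN_COST": [48.0, 120.0]})
scale_column_sketch(df, "TRAIN_COST", 0.01)
print(df["TRAIN_COST"].tolist())
```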
- suggest_scaling(columns=None, report_all=False)
Suggest a scaling of the variables in the database.
For each column, \(\delta\) is the difference between the largest and the smallest value, or one if the difference is smaller than one. The order of magnitude of \(\delta\) is evaluated as a power of 10, and the suggested scale is the inverse of this value:
\[s = \frac{1}{10^{|\log_{10} \delta|}}\]
where \(|x|\) denotes the integer closest to \(x\).
- Parameters:
columns (list[str] | None) – list of columns to be considered. If None, all of them will be considered.
report_all (bool) – if False, remove entries where the suggested scale is 1, 0.1 or 10
- Return type:
DataFrame
- Returns:
A pandas DataFrame where each row contains the name of the variable and the suggested scale s. Ideally, the column should be multiplied by s.
- Raises:
BiogemeError – if a variable in columns is unknown.
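The scaling rule above can be sketched directly in pandas. The helper name and the output column names Column and Scale are hypothetical, and Python's round stands in for "the integer closest to x" (it rounds exact .5 ties to even, a detail the description does not cover).

```python
import math

import pandas as pd


def suggest_scaling_sketch(df, columns=None, report_all=False):
    """Hypothetical re-implementation of the rule s = 1 / 10^|log10(delta)|."""
    columns = list(df.columns) if columns is None else columns
    records = []
    for column in columns:
        # delta: range of the column, floored at one.
        delta = max(float(df[column].max() - df[column].min()), 1.0)
        # Integer closest to log10(delta); delta >= 1 makes it non-negative.
        exponent = round(math.log10(delta))
        scale = 1 / 10**exponent
        # Mirror report_all: skip scales of 1, 0.1 or 10 unless asked for all.
        if report_all or scale not in (1.0, 0.1, 10.0):
            records.append({"Column": column, "Scale": scale})
    return pd.DataFrame(records)


df = pd.DataFrame({"INCOME": [20000, 150000], "AGE": [18, 80], "GA": [0, 1]})
print(suggest_scaling_sketch(df))
```

With report_all=False, INCOME (\(\delta = 130000\)) gets the scale 1e-05, AGE (\(\delta = 62\)) gets 0.01, and GA (\(\delta = 1\), scale 1) is omitted.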
- verify_segmentation(segmentation)
Verifies whether the definition of the segmentation is consistent with the data.
- Parameters:
segmentation (DiscreteSegmentationTuple) – definition of the segmentation
- Raises:
BiogemeError – if the segmentation is not consistent with the data.
- Return type:
None
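One plausible reading of "consistent with the data" is that the segmentation names exactly the distinct values found in the column. The sketch below checks that reading in plain pandas; the helper, its arguments and the precise checks are assumptions rather than Biogeme's implementation, and BiogemeError is a local stand-in.

```python
import pandas as pd


class BiogemeError(Exception):
    """Local stand-in for Biogeme's exception class."""


def verify_segmentation_sketch(df: pd.DataFrame, variable: str, mapping: dict) -> None:
    """Hypothetical consistency check between a mapping and the data."""
    observed = set(df[variable].unique().tolist())
    declared = set(mapping)
    missing = observed - declared
    if missing:
        raise BiogemeError(
            f"Values in the data but not in the segmentation: {sorted(missing)}"
        )
    unused = declared - observed
    if unused:
        raise BiogemeError(
            f"Values in the segmentation but not in the data: {sorted(unused)}"
        )


df = pd.DataFrame({"GA": [0, 1, 1, 0]})
verify_segmentation_sketch(df, "GA", {0: "no_GA", 1: "GA"})  # passes silently
```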
- biogeme.database.container.logger = <Logger biogeme.database.container (DEBUG)>
Logger that controls the output of messages to the screen and log file.
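Since this is a standard library logging.Logger named biogeme.database.container, its verbosity can be adjusted with the usual logging calls. A minimal sketch using only the standard library; in a real script, the level would be set after importing Biogeme so that it is not overridden by any configuration Biogeme might apply at import time.

```python
import logging

# Send log records to stderr with the logger name in each line.
logging.basicConfig(format="%(name)s %(levelname)s: %(message)s")

# Same name as the module-level logger documented above.
container_logger = logging.getLogger("biogeme.database.container")

# Keep warnings and errors from this module, hide DEBUG and INFO.
container_logger.setLevel(logging.WARNING)
print(container_logger.isEnabledFor(logging.DEBUG))  # False
```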