biogeme.biogeme module

The core routines of Biogeme.

Implementation of the main Biogeme class

author:

Michel Bierlaire

date:

Tue Mar 26 16:45:15 2019

It combines the database and the model specification.

class biogeme.biogeme.BIOGEME(database, formulas, random_number_generators=None, user_notes=None, parameters=None, **kwargs)[source]

Bases: object

Main class that combines the database and the model specification.

It works in two modes: estimation and simulation.

The following attributes are imported from the parameter file.

Parameters:
  • database (Database)

  • formulas (Expression | dict[str, Expression])

  • random_number_generators (dict[str, RandomNumberGeneratorTuple] | None)

  • user_notes (str | None)

  • parameters (str | Parameters | None)

property algo_parameters: dict[str, bool | int | float | str]

Prepare the parameters for the optimization algorithm.

best_iteration

Store the best iteration found so far.

property bootstrap_samples
calculate_init_likelihood()[source]

Calculate the value of the log likelihood function.

The default values of the parameters are used.

Returns:

value of the log likelihood.

Return type:

float

calculate_null_loglikelihood(avail)[source]

Calculate the log likelihood of the null model, which predicts equal probability for each alternative.

Parameters:

avail (list of biogeme.expressions.Expression) – list of expressions to evaluate the availability conditions for each alternative. If 1 is provided for an alternative, that alternative is always available.

Returns:

value of the log likelihood

Return type:

float
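As an illustration of what the null model computes, here is a minimal standalone sketch. It does not use Biogeme: the function name `null_loglikelihood` and the representation of availabilities as rows of 0/1 flags are assumptions made for this example only.

```python
import math


def null_loglikelihood(availability_rows):
    """Log likelihood when each available alternative is equally likely.

    availability_rows holds one list per observation, with a truthy
    entry for each available alternative (hypothetical data layout).
    """
    total = 0.0
    for row in availability_rows:
        n_available = sum(1 for flag in row if flag)
        if n_available == 0:
            raise ValueError("each observation needs an available alternative")
        # The chosen alternative receives probability 1 / n_available.
        total += -math.log(n_available)
    return total


# Three observations; the second has only two available alternatives.
rows = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
print(null_loglikelihood(rows))
```

The sketch assumes that the chosen alternative is always among the available ones.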

property calculating_second_derivatives
change_init_values(betas)[source]

Modifies the initial values of the parameters in all formulas.

Parameters:

betas (dict(string:float)) – dictionary where the keys are the names of the parameters, and the values are the new values for the parameters.

Return type:

None

check_derivatives(verbose=False)[source]

Verifies the implementation of the derivatives.

It compares the analytical version with the finite differences approximation.

Parameters:

verbose (bool) – if True, the comparisons are reported. Default: False.

Return type:

CheckDerivativesResults

Returns:

f, g, h, gdiff, hdiff where

  • f is the value of the function,

  • g is the analytical gradient,

  • h is the analytical hessian,

  • gdiff is the difference between the analytical and the finite differences gradient,

  • hdiff is the difference between the analytical and the finite differences hessian.
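The comparison between analytical and finite-difference derivatives can be sketched without Biogeme. The example below is only an illustration of the idea for the gradient: the test function, the forward-difference scheme and the step size are all assumptions, not the library's actual implementation.

```python
def finite_difference_gradient(f, x, step=1e-6):
    """Approximate the gradient of f at x with forward differences."""
    base = f(x)
    gradient = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += step
        gradient.append((f(shifted) - base) / step)
    return gradient


def f(x):
    # Hypothetical test function of two variables.
    return x[0] ** 2 + 3.0 * x[0] * x[1]


def analytical_gradient(x):
    # Hand-derived gradient of the function above.
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]


point = [1.0, 2.0]
g = analytical_gradient(point)
g_fd = finite_difference_gradient(f, point)
# Analogue of gdiff: entries close to zero suggest a correct gradient.
gdiff = [a - b for a, b in zip(g, g_fd)]
print(gdiff)
```

A large entry in `gdiff` would point to an error in the analytical derivative of the corresponding variable.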

confidence_intervals(beta_values, interval_size=0.9)[source]

Calculate confidence intervals on the simulated quantities

Parameters:
  • beta_values (list(dict(str: float))) – array of parameter values to be used in the calculations. Typically, it is a sample drawn from a distribution.

  • interval_size (float) – size of the reported confidence interval, expressed as a fraction between 0 and 1. If it is denoted by s, the interval is bounded by the quantiles (1-s)/2 and (1+s)/2. The default (0.9) corresponds to the quantiles 0.05 and 0.95.

Returns:

two pandas data frames ‘left’ and ‘right’ with the same dimensions. Each row corresponds to a row in the database, and each column to a formula. ‘left’ contains the left value of the confidence interval, and ‘right’ the right value

Example:

# Read the estimation results from a file
results = EstimationResults.from_yaml_file(filename='my_model.yaml')
# Retrieve the names of the beta parameters that have been
# estimated
betas = biogeme.free_betas_names

# Draw 100 realizations of the distribution of the estimators
b = results.getBetasForSensitivityAnalysis(betas, size=100)

# Simulate the formulas using the nominal values
simulated_values = biogeme.simulate(results.get_beta_values())

# Calculate the confidence intervals for each formula
left, right = biogeme.confidence_intervals(b, 0.9)

Return type:

tuple of two Pandas dataframes.
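The quantile rule stated for interval_size can be illustrated with a small standalone sketch. The helper names and the linear-interpolation quantile are assumptions chosen for this example; the source does not say how Biogeme computes the quantiles internally.

```python
def interval_quantiles(interval_size):
    """Return the quantile levels (1-s)/2 and (1+s)/2 for size s."""
    return (1.0 - interval_size) / 2.0, (1.0 + interval_size) / 2.0


def empirical_quantile(values, q):
    """Quantile of a sample, with linear interpolation between points."""
    if not values:
        raise ValueError("the sample must not be empty")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Hypothetical simulated values of one formula for one observation,
# one value per draw of the parameters.
simulated = [float(i) for i in range(101)]
low_level, high_level = interval_quantiles(0.9)
left = empirical_quantile(simulated, low_level)
right = empirical_quantile(simulated, high_level)
print(left, right)
```

Applying the same computation to every observation and every formula would give two tables of identical shape, in the spirit of the 'left' and 'right' data frames described above.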

contains_log_likelihood()[source]
Return type:

bool

property dogleg
property enlarging_factor
estimate(starting_values=None, recycle=False, run_bootstrap=False, **kwargs)[source]

Estimate the parameters of the model(s).

Returns:

object containing the estimation results.

Parameters:
  • starting_values (dict[str, float] | None)

  • recycle (bool)

  • run_bootstrap (bool)

Example:

import biogeme.biogeme as bio

# Create an instance of biogeme
biogeme = bio.BIOGEME(database, logprob)

# Give a name to the model
biogeme.modelName = 'mymodel'

# Estimate the parameters
results = biogeme.estimate()

Raises:

BiogemeError – if no expression has been provided for the likelihood

Return type:

EstimationResults

estimate_catalog(selected_configurations=None, quick_estimate=False, recycle=False, run_bootstrap=False)[source]

Estimate all or selected versions of a model with Catalogs, corresponding to multiple specifications.

Parameters:
  • selected_configurations (set[Configuration]) – set of configurations. If None, all configurations are considered.

  • quick_estimate (bool) – if True, the final statistics are not calculated.

  • recycle (bool) – if True, the results are read from the pickle file, if it exists. If False, the estimation is performed.

  • run_bootstrap (bool) – if True, bootstrapping is applied.

Return type:

dict[str, EstimationResults]

Returns:

object containing the estimation results associated with the name of each specification, as well as a description of each configuration

property expressions_registry
property free_betas_names: list[str]

Returns the names of the parameters that must be estimated

Returns:

list of names of the parameters

Return type:

list(str)

classmethod from_configuration(config_id, multiple_expression, database, user_notes=None, parameters=None, **kwargs)[source]

Obtain the Biogeme object corresponding to the configuration of a multiple expression.

Parameters:
  • config_id (str) – identifier of the configuration

  • multiple_expression (Expression) – multiple expression containing the catalog.

  • database (Database) – database to be passed to the Biogeme object

  • user_notes (str | None) – these notes will be included in the report file.

  • parameters (str | Parameters | None) – object with the parameters

Return type:

BIOGEME

classmethod from_configuration_and_controller(config_id, central_controller, database, user_notes=None, parameters=None, **kwargs)[source]

Obtain the Biogeme object corresponding to the configuration of a multiple expression

Parameters:
  • config_id (str) – identifier of the configuration

  • central_controller (CentralController) – central controller for the multiple expression containing all the catalogs.

  • database (Database) – database to be passed to the Biogeme object

  • user_notes (str | None) – these notes will be included in the report file.

  • parameters (str | Parameters | None) – object with the parameters

Return type:

BIOGEME

property function_evaluator: CompiledFormulaEvaluator
property function_parameters: dict[str, bool | int | float | str]

Prepare the parameters for the function

property generate_html
property generate_pickle: bool
property generate_yaml
property identification_threshold
property infeasible_cg
init_loglikelihood

Initial value of the log likelihood function

property initial_radius
is_model_complex()[source]

Check if the model is potentially complex to estimate

Return type:

bool

property large_data_set
property largest_neighborhood
property log_like: Expression | None
log_like_name: str

Keyword used for the name of the log likelihood formula. Default: ‘log_like’

property loglike: Expression

For backward compatibility

property max_iterations
property max_number_parameters_to_report
property maximum_attempts
property maximum_number_catalog_expressions
property maximum_number_parameters
property missing_data
property modelName: str
property model_elements: ModelElements | None
model_name

Name of the model. Default: ‘biogemeModelDefaultName’

null_loglikelihood

Log likelihood of the null model

property number_of_draws
property number_of_jobs
property number_of_neighbors
property number_of_observations
property number_of_threads
number_unknown_parameters()[source]

Returns the number of parameters that must be estimated

Returns:

number of parameters

Return type:

int

property numerically_safe
property only_robust_stats
property optimization_algorithm
property optimization_parameters: dict[str, bool | int | float | str]
quick_estimate()[source]

Estimate the parameters of the model. Same as estimate, except that any extra calculation is skipped (init loglikelihood, t-statistics, etc.)

Returns:

object containing the estimation results.

Example:

import biogeme.biogeme as bio

# Create an instance of biogeme
biogeme = bio.BIOGEME(database, logprob)

# Give a name to the model
biogeme.modelName = 'mymodel'

# Estimate the parameters
results = biogeme.quick_estimate()

Raises:

BiogemeError – if no expression has been provided for the likelihood

Return type:

EstimationResults

report_array(array, with_names=True)[source]

Reports the entries of the array, up to the maximum number of parameters to report.

Parameters:
  • array (numpy.array) – array to report

  • with_names (bool) – if True, the names of the parameters are included

Returns:

string reporting the values

Return type:

str

retrieve_saved_estimates()[source]

Attempt to retrieve previously saved estimation results from a YAML file.

Return type:

EstimationResults | None

Returns:

An EstimationResults object if a saved result is found. If no file is found or loading fails, None is returned and a warning is logged.

Raises:

BiogemeError – Raised internally by _load_saved_estimates if loading fails, and is caught to log a warning instead.

property sample_size
property save_iterations
property save_validation_results
property second_derivatives
property seed
set_random_init_values(default_bound=100.0)[source]

Modifies the initial values of the parameters in all formulas, using randomly generated values. The value is drawn from a uniform distribution on the interval defined by the bounds.

Parameters:

default_bound (float) – If the upper bound is missing, it is replaced by this value. If the lower bound is missing, it is replaced by the opposite of this value. Default: 100.

Return type:

None
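The rule for missing bounds can be sketched as follows. This is a standalone illustration, not Biogeme's code: the function `draw_initial_value`, its `rng` argument and the use of None for a missing bound are assumptions made for the example.

```python
import random


def draw_initial_value(lower, upper, default_bound=100.0, rng=random):
    """Draw a value uniformly between the bounds of a parameter.

    A missing upper bound (None) is replaced by default_bound, and a
    missing lower bound by -default_bound.
    """
    low = -default_bound if lower is None else lower
    high = default_bound if upper is None else upper
    return rng.uniform(low, high)


# Hypothetical parameters described by (lower bound, upper bound).
bounds = {"beta_time": (None, 0.0), "beta_cost": (None, None)}
generator = random.Random(42)  # seeded for reproducibility
initial_values = {
    name: draw_initial_value(low, high, rng=generator)
    for name, (low, high) in bounds.items()
}
print(initial_values)
```

With these bounds, `beta_time` is drawn from [-100, 0] and `beta_cost` from [-100, 100].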

simulate(the_beta_values)[source]

Evaluate all simulation formulas on each row of the database using the specified parameter values.

Parameters:

the_beta_values (dict[str, float] | None) – Dictionary mapping parameter names to values. If None, an exception is raised. Use results.get_beta_values() after estimation or provide explicit values.

Return type:

DataFrame

Returns:

A pandas DataFrame where each row corresponds to an observation in the database, and each column corresponds to a simulation formula.

Raises:

BiogemeError – If the_beta_values is None or if the number of parameters is incorrect.

property steptol
property tolerance
property use_jit
user_notes

User notes

validate(estimation_results, slices, groups=None)[source]

Perform out-of-sample validation of the model.

The validation procedure operates by dividing the dataset into a number of slices. For each slice:

  • The slice is used as the validation set.

  • The remaining data forms the estimation set.

  • The model is re-estimated on the estimation set.

  • The model is applied to the validation set to compute the log likelihood.

Parameters:
  • estimation_results (EstimationResults) – Estimation results obtained from the full dataset.

  • slices (int) – Number of data splits to create for cross-validation.

  • groups (str | None) – Optional column name used to group data entries (e.g., panel data). If provided, splitting preserves groups.

Return type:

list[ValidationResult]

Returns:

List of validation results, one for each data slice.

Raises:

BiogemeError – If the dataset is structured as panel data and incompatible with validation.
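The slicing procedure described above can be sketched on row indices alone. The functions below are hypothetical and only show how each slice serves once as the validation set while the remaining rows form the estimation set. How Biogeme actually assigns rows to slices (for instance, whether it shuffles them) is not stated here, so the interleaved assignment is an assumption.

```python
def make_slices(n_rows, n_slices):
    """Split the row indices 0..n_rows-1 into n_slices disjoint parts."""
    indices = list(range(n_rows))
    # Interleaved assignment keeps the slice sizes within one row of each other.
    return [indices[i::n_slices] for i in range(n_slices)]


def validation_splits(n_rows, n_slices):
    """Yield (estimation rows, validation rows) for each slice in turn."""
    for validation in make_slices(n_rows, n_slices):
        held_out = set(validation)
        estimation = [i for i in range(n_rows) if i not in held_out]
        yield estimation, validation


for estimation, validation in validation_splits(10, 3):
    # A real procedure would re-estimate the model on the estimation rows
    # and compute the log likelihood on the validation rows.
    print(len(estimation), len(validation))
```

Each row appears in exactly one validation set, so the validation log likelihoods taken together cover the whole dataset.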

property version
property weight: Expression | None
weight_name

Keyword used for the name of the weight formula. Default: ‘weight’