biogeme.biogeme module

The core routines of Biogeme.

Implementation of the main Biogeme class

author:

Michel Bierlaire

date:

Tue Mar 26 16:45:15 2019

It combines the database and the model specification.

class biogeme.biogeme.BIOGEME(database, formulas, random_number_generators=None, user_notes=None, parameters=None, group_of_parameters=None, **kwargs)[source]

Bases: object

Main class that combines the database and the model specification.

It works in two modes: estimation and simulation.

The following attributes are imported from the parameter file.

Parameters:
property algo_parameters: dict[str, bool | int | float | str]

Prepare the parameters for the optimization algorithm.

property bayesian_draws
bayesian_estimation(starting_values=None)[source]
Return type:

BayesianResults

Parameters:

starting_values (dict[str, float] | None)

bayesian_estimation_non_panel(starting_values)[source]
Return type:

BayesianResults

Parameters:

starting_values (dict[str, float])

bayesian_estimation_or_load_summary(yaml_file_name=None, starting_values=None)[source]

Load a Bayesian summary from YAML, or estimate and build one.

If yaml_file_name is provided, that file is considered. Otherwise, the default YAML filename is generated from the model name. If the file exists, the summary is read from it and returned directly. If the file does not exist, Bayesian estimation is performed using bayesian_estimation(), and the resulting full Bayesian results are converted to a BayesianResultsSummary.

Parameters:
  • yaml_file_name (str | None) – YAML summary file to load. If None, the default filename generated from the model name is used.

  • starting_values (dict[str, float] | None) – Optional starting values forwarded to Bayesian estimation when estimation is required.

Return type:

BayesianResultsSummary

Returns:

Bayesian summary results.

bayesian_estimation_panel(starting_values)[source]
Return type:

BayesianResults

Parameters:

starting_values (dict[str, float])

best_iteration

Store the best iteration found so far.

property bootstrap_samples
calculate_init_likelihood()[source]

Calculate the value of the log likelihood function

The default values of the parameters are used.

Returns:

value of the log likelihood.

Return type:

float

property calculate_likelihood
property calculate_loo
calculate_null_loglikelihood(avail)[source]

Calculate the log likelihood of the null model that predicts equal probability for each alternative

Parameters:

avail (dict[int, Expression | float | int | bool]) – dictionary mapping the ID of each alternative to the expression that evaluates its availability condition. If 1 is provided, the alternative is always available.

Returns:

value of the log likelihood

Return type:

float
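As an illustration of what the null model computes, here is a minimal sketch in plain Python that does not call Biogeme. It assumes that each observation contributes the log of 1/n, where n is the number of alternatives available for that observation. The function name `null_loglikelihood` and the row layout (one dictionary of 0/1 flags per observation) are hypothetical.

```python
import math


def null_loglikelihood(availability_rows):
    """Log likelihood of a model giving equal probability to each available alternative.

    availability_rows: one dict per observation, mapping an alternative ID
    to a 0/1 availability flag (hypothetical layout, for illustration only).
    """
    total = 0.0
    for row in availability_rows:
        n_available = sum(1 for flag in row.values() if flag)
        if n_available == 0:
            raise ValueError("Each observation needs at least one available alternative")
        # The chosen alternative has probability 1/n, so it contributes -log(n).
        total -= math.log(n_available)
    return total


rows = [
    {1: 1, 2: 1, 3: 1},  # three alternatives available
    {1: 1, 2: 0, 3: 1},  # alternative 2 is unavailable
]
print(null_loglikelihood(rows))
```

With these two observations, the result equals -(log 3 + log 2).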

property calculate_waic
property calculating_second_derivatives
property chains
change_init_values(betas)[source]

Modifies the initial values of the parameters in all formulas.

Parameters:

betas (dict[str, float]) – dictionary where the keys are the names of the parameters, and the values are the new value for the parameters.

Return type:

None

check_derivatives(verbose=False)[source]

Verifies the implementation of the derivatives.

It compares the analytical version with the finite differences approximation.

Parameters:

verbose (bool) – if True, the comparisons are reported. Default: False.

Return type:

CheckDerivativesResults

Returns:

f, g, h, gdiff, hdiff where

  • f is the value of the function,

  • g is the analytical gradient,

  • h is the analytical hessian,

  • gdiff is the difference between the analytical and the finite differences gradient,

  • hdiff is the difference between the analytical and the finite differences hessian.
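The comparison described above can be sketched on a toy function, independently of Biogeme. The sketch assumes a forward finite-difference scheme with a fixed step; the scheme actually used by the library may differ. All function names here are hypothetical.

```python
def finite_difference_gradient(f, x, step=1e-6):
    """Forward finite-difference approximation of the gradient of f at x."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += step
        grad.append((f(shifted) - fx) / step)
    return grad


def f(x):
    # Toy function standing in for the log likelihood.
    return x[0] ** 2 + 3.0 * x[0] * x[1]


def analytical_gradient(x):
    # Hand-derived gradient of the toy function above.
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]


point = [1.0, 2.0]
g = analytical_gradient(point)
g_fd = finite_difference_gradient(f, point)
# Entries close to zero suggest that the analytical gradient is implemented correctly.
gdiff = [a - b for a, b in zip(g, g_fd)]
print(gdiff)
```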

confidence_intervals(beta_values, interval_size=0.9)[source]

Calculate confidence intervals on the simulated quantities

Parameters:
  • beta_values (list[dict[str, float]]) – array of parameters values to be used in the calculations. Typically, it is a sample drawn from a distribution.

  • interval_size (float) – size of the reported confidence interval, expressed as a fraction between 0 and 1. If it is denoted by s, the interval is calculated for the quantiles (1-s)/2 and (1+s)/2. The default (0.9) corresponds to the quantiles 0.05 and 0.95.

Returns:

two pandas data frames ‘left’ and ‘right’ with the same dimensions. Each row corresponds to a row in the database, and each column to a formula. ‘left’ contains the left value of the confidence interval, and ‘right’ the right value

Example:

# Read the estimation results from a file
results = EstimationResults.from_yaml_file(filename='my_model.yaml')
# Retrieve the names of the beta parameters that have been
# estimated
betas = biogeme.free_betas_names

# Draw 100 realizations of the distribution of the estimators
b = results.getBetasForSensitivityAnalysis(betas, size=100)

# Simulate the formulas using the nominal values
simulatedValues = biogeme.simulate(results.get_beta_values())

# Calculate the confidence intervals for each formula
left, right = biogeme.confidence_intervals(b, 0.9)

Return type:

tuple[DataFrame, DataFrame]
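To make the role of interval_size concrete, here is a minimal sketch of the quantile rule for a single simulated quantity, written in plain Python. It assumes linear interpolation between order statistics, which may not match the interpolation used by the library. The helper names `quantile` and `interval_bounds` are hypothetical.

```python
def quantile(sorted_values, q):
    """Quantile with linear interpolation between order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def interval_bounds(simulated_values, interval_size=0.9):
    """Return (left, right), the quantiles (1-s)/2 and (1+s)/2 of the sample."""
    ordered = sorted(simulated_values)
    left_q = (1.0 - interval_size) / 2.0
    right_q = (1.0 + interval_size) / 2.0
    return quantile(ordered, left_q), quantile(ordered, right_q)


# One simulated value per draw of the parameters (made-up numbers).
values = [float(v) for v in range(101)]
left, right = interval_bounds(values, 0.9)
print(left, right)
```

With interval_size=0.9, the bounds are the 5% and 95% quantiles of the simulated values.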

contains_log_likelihood()[source]
Return type:

bool

property dogleg
property enlarging_factor
estimate(starting_values=None, run_bootstrap=False)[source]

Estimate the parameters of the model(s).

Returns:

object containing the estimation results.

Return type:

EstimationResults

Parameters:
  • starting_values (dict[str, float] | None)

  • run_bootstrap (bool)

Example:

# Create an instance of biogeme
biogeme = bio.BIOGEME(database, logprob)

# Gives a name to the model
biogeme.modelName = 'mymodel'

# Estimate the parameters
results = biogeme.estimate()
Raises:

BiogemeError – if no expression has been provided for the likelihood

estimate_catalog(selected_configurations=None, quick_estimate=False, run_bootstrap=False)[source]

Estimate all or selected versions of a model containing Catalogs, which correspond to multiple specifications.

Parameters:
  • selected_configurations (set[Configuration]) – set of configurations. If None, all configurations are considered.

  • quick_estimate (bool) – if True, the final statistics are not calculated.

  • run_bootstrap (bool) – if True, bootstrapping is applied.

Return type:

dict[str, EstimationResults]

Returns:

object containing the estimation results associated with the name of each specification, as well as a description of each configuration

estimate_or_load(yaml_file_name=None, starting_values=None, run_bootstrap=False)[source]

Load estimation results from YAML, or estimate the model.

If yaml_file_name is provided, that file is considered. Otherwise, the default YAML filename is generated from the model name. If the file exists, estimation results are loaded from it and no estimation is performed. If the file does not exist, the model is estimated.

Parameters:
  • yaml_file_name (str | None) – YAML file to load. If None, the default filename generated from the model name is used.

  • starting_values (dict[str, float] | None) – Optional starting values forwarded to the estimation procedure when estimation is required.

  • run_bootstrap (bool) – If True, bootstrapping is performed when estimation is required.

Return type:

EstimationResults

Returns:

Estimation results, either loaded from YAML or freshly estimated.
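The load-or-estimate logic can be sketched with the standard library only. In this sketch JSON stands in for YAML, because the Python standard library has no YAML parser. The function `fake_estimate` is a hypothetical placeholder for the estimation, and writing the file after estimating is an assumption of the sketch, not something stated above.

```python
import json
import os
import tempfile


def estimate_or_load(file_name, estimate):
    """Return saved results if file_name exists; otherwise estimate and save."""
    if os.path.exists(file_name):
        with open(file_name, encoding="utf-8") as handle:
            return json.load(handle), "loaded"
    results = estimate()
    # Assumption: results are written to disk so that the next call can load them.
    with open(file_name, "w", encoding="utf-8") as handle:
        json.dump(results, handle)
    return results, "estimated"


def fake_estimate():
    # Placeholder for an expensive estimation; the values are made up.
    return {"beta_time": -0.5, "beta_cost": -1.25}


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_model.json")
    first, first_source = estimate_or_load(path, fake_estimate)
    second, second_source = estimate_or_load(path, fake_estimate)

print(first_source, second_source)
```

The first call finds no file and estimates; the second call finds the file and skips the estimation.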

property expressions_registry
property free_betas_names: list[str]

Returns the names of the parameters that must be estimated

Returns:

list of names of the parameters

Return type:

list[str]

classmethod from_configuration(config_id, multiple_expression, database, user_notes=None, parameters=None, group_of_parameters=None, **kwargs)[source]

Obtain the Biogeme object corresponding to the configuration of a multiple expression.

Parameters:
  • config_id (str) – identifier of the configuration

  • multiple_expression (Expression) – multiple expression containing the catalog.

  • database (Database) – database to be passed to the Biogeme object

  • user_notes (str | None) – these notes will be included in the report file.

  • parameters (str | Parameters | None) – object with the parameters

  • group_of_parameters (dict[str, list[str]] | None)

Return type:

BIOGEME

classmethod from_configuration_and_controller(config_id, central_controller, database, user_notes=None, parameters=None, group_of_parameters=None, **kwargs)[source]

Obtain the Biogeme object corresponding to the configuration of a multiple expression

Parameters:
  • config_id (str) – identifier of the configuration

  • central_controller (CentralController) – central controller for the multiple expression containing all the catalogs.

  • database (Database) – database to be passed to the Biogeme object

  • user_notes (str | None) – these notes will be included in the report file.

  • parameters (str | Parameters | None) – object with the parameters

  • group_of_parameters (dict[str, list[str]] | None)

Return type:

BIOGEME

property function_evaluator: CompiledFormulaEvaluator
property function_parameters: dict[str, bool | int | float | str]

Prepare the parameters for the function

property generate_html
property generate_netcdf
property generate_pickle: bool
property generate_yaml
group_of_parameters

Optional grouping of parameters used when generating reports.

property identification_threshold
property infeasible_cg
init_loglikelihood

Initial value of the log likelihood function

property initial_radius
is_model_complex()[source]

Check if the model is potentially complex to estimate

Return type:

bool

property large_data_set
property largest_neighborhood
property log_like: Expression | None
log_like_name: str

Keyword used for the name of the log likelihood formula. Default: ‘log_like’

property loglike: Expression

For backward compatibility

property max_iterations
property max_number_parameters_to_report
property maximum_attempts
property maximum_number_catalog_expressions
property maximum_number_parameters
property mcmc_sampling_strategy
property missing_data
property modelName: str
property model_elements: ModelElements | None
model_name

Name of the model. Default: ‘biogemeModelDefaultName’

null_loglikelihood

Log likelihood of the null model

property number_of_draws
property number_of_jobs
property number_of_neighbors
property number_of_observations
property number_of_threads
number_unknown_parameters()[source]

Returns the number of parameters that must be estimated

Returns:

number of parameters

Return type:

int

property numerically_safe
property only_robust_stats
property optimization_algorithm
property optimization_parameters: dict[str, bool | int | float | str]
quick_estimate()[source]

Estimate the parameters of the model. Same as estimate(), except that any extra calculation is skipped (initial log likelihood, t-statistics, etc.).

Returns:

object containing the estimation results.

Return type:

EstimationResults

Example:

# Create an instance of biogeme
biogeme = bio.BIOGEME(database, logprob)

# Gives a name to the model
biogeme.modelName = 'mymodel'

# Estimate the parameters
results = biogeme.quick_estimate()
Raises:

BiogemeError – if no expression has been provided for the likelihood

report_array(array, with_names=True)[source]

Reports the entries of the array up to the maximum number

Parameters:
  • array (ndarray) – array to report

  • with_names (bool) – if True, the names of the parameters are included

Returns:

string reporting the values

Return type:

str

retrieve_saved_estimates()[source]

Attempt to retrieve previously saved estimation results from a YAML file.

Return type:

EstimationResults | None

Returns:

An EstimationResults object if a saved result is found. If no file is found or loading fails, None is returned and a warning is logged.

Raises:

BiogemeError – Raised internally by _load_saved_estimates if loading fails, and is caught to log a warning instead.

property sample_from_prior
property sample_size
property save_iterations
property save_validation_results
property second_derivatives
property seed
set_random_init_values(default_bound=100.0)[source]

Modifies the initial values of the parameters in all formulas, using randomly generated values. The value is drawn from a uniform distribution on the interval defined by the bounds.

Parameters:

default_bound (float) – If the upper bound is missing, it is replaced by this value. If the lower bound is missing, it is replaced by the opposite of this value. Default: 100.

Return type:

None
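The handling of missing bounds can be sketched as follows for a single parameter. This is an illustration of the rule described above, not the library's code. The function name `random_init_value` and the use of None to mark a missing bound are assumptions.

```python
import random


def random_init_value(lower, upper, default_bound=100.0, rng=random):
    """Draw a starting value uniformly between the bounds.

    A missing (None) upper bound is replaced by default_bound, and a
    missing lower bound by -default_bound.
    """
    low = -default_bound if lower is None else lower
    high = default_bound if upper is None else upper
    return rng.uniform(low, high)


rng = random.Random(0)  # seeded generator, for reproducibility
# Parameter with no lower bound and an upper bound of 5.
draws = [random_init_value(None, 5.0, rng=rng) for _ in range(1000)]
print(min(draws), max(draws))
```

All draws fall between -100 and 5, since the missing lower bound is replaced by the opposite of default_bound.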

simulate(the_beta_values)[source]

Evaluate all simulation formulas on each row of the database using the specified parameter values.

Parameters:

the_beta_values (dict[str, float] | None) – Dictionary mapping parameter names to values. If None, an exception is raised. Use results.get_beta_values() after estimation or provide explicit values.

Return type:

DataFrame

Returns:

A pandas DataFrame where each row corresponds to an observation in the database, and each column corresponds to a simulation formula.

Raises:

BiogemeError – If the_beta_values is None or if the number of parameters is incorrect.

simulate_bayesian(bayesian_estimation_results, lower_quantile=0.025, upper_quantile=0.975, percentage_of_draws_to_use=10.0)[source]

Simulate all formulas in self.formulas over posterior draws and summarize them per observation.

For each observation and each simulation formula, this returns the mean, lower_quantile and upper_quantile across the selected posterior draws.

Return type:

DataFrame

Parameters:
  • bayesian_estimation_results (BayesianResults)

  • lower_quantile (float)

  • upper_quantile (float)

  • percentage_of_draws_to_use (float)

property steptol
property target_accept
property tolerance
property use_flatten_database: bool
property use_jit
user_notes

User notes

validate(estimation_results, slices, groups=None)[source]

Perform out-of-sample validation of the model.

The validation procedure operates by dividing the dataset into a number of slices. For each slice:

  • The slice is used as the validation set.

  • The remaining data forms the estimation set.

  • The model is re-estimated on the estimation set.

  • The model is applied to the validation set to compute the log likelihood.

Parameters:
  • estimation_results (EstimationResults) – Estimation results obtained from the full dataset.

  • slices (int) – Number of data splits to create for cross-validation.

  • groups (str | None) – Optional column name used to group data entries (e.g., panel data). If provided, splitting preserves groups.

Return type:

list[ValidationResult]

Returns:

List of validation results, one for each data slice.

Raises:

BiogemeError – If the dataset is structured as panel data and incompatible with validation.
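The slicing procedure above can be sketched on row indices alone. The sketch assumes an interleaved assignment of rows to slices; the library may shuffle the data or split it differently, and group handling is not covered. The function name `make_slices` is hypothetical.

```python
def make_slices(number_of_rows, slices):
    """Yield (estimation_indices, validation_indices), one pair per slice."""
    indices = list(range(number_of_rows))
    for k in range(slices):
        # Every `slices`-th row, starting at row k, forms the validation set.
        validation = indices[k::slices]
        validation_set = set(validation)
        # All remaining rows form the estimation set.
        estimation = [i for i in indices if i not in validation_set]
        yield estimation, validation


splits = list(make_slices(10, 5))
for estimation, validation in splits:
    # In a real run, the model would be re-estimated on `estimation` and
    # its log likelihood computed on `validation`.
    print(validation, estimation)
```

Each row appears in exactly one validation set, so every observation is used once for out-of-sample evaluation.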

property version
property warmup
property weight: Expression | None
weight_name

Keyword used for the name of the weight formula. Default: ‘weight’