biogeme.biogeme module

The core routines of Biogeme.

Implementation of the main Biogeme class. It combines the database and the model specification.

author:: Michel Bierlaire
date:: Tue Mar 26 16:45:15 2019

class biogeme.biogeme.BIOGEME(database, formulas, random_number_generators=None, user_notes=None, parameters=None, group_of_parameters=None, **kwargs)

Bases: object

Main class that combines the database and the model specification.

It works in two modes: estimation and simulation.

Many of the properties documented below (for example, max_iterations, number_of_draws or seed) are imported from the parameter file.

Parameters:

database (Database)
formulas (Expression | dict[str, Expression])
random_number_generators (dict[str, RandomNumberGeneratorTuple] | None)
user_notes (str | None)
parameters (str | Parameters | None)
group_of_parameters (dict[str, list[str]] | None)

property algo_parameters: dict[str, bool | int | float | str]: Prepare the parameters for the optimization algorithm.

property bayesian_draws

bayesian_estimation(starting_values=None)

Return type:: BayesianResults
Parameters:: starting_values (dict[str, float] | None)

bayesian_estimation_non_panel(starting_values)

Return type:: BayesianResults
Parameters:: starting_values (dict[str, float])

bayesian_estimation_or_load_summary(yaml_file_name=None, starting_values=None)

Load a Bayesian summary from YAML, or estimate and build one.

If yaml_file_name is provided, that file is considered. Otherwise, the default YAML filename is generated from the model name. If the file exists, the summary is read from it and returned directly. If the file does not exist, Bayesian estimation is performed using bayesian_estimation(), and the resulting full Bayesian results are converted to a BayesianResultsSummary.

Parameters:

yaml_file_name (str | None) – YAML summary file to load. If None, the default filename generated from the model name is used.
starting_values (dict[str, float] | None) – Optional starting values forwarded to Bayesian estimation when estimation is required.

Return type:

BayesianResultsSummary

Returns:

Bayesian summary results.

bayesian_estimation_panel(starting_values)

Return type:: BayesianResults
Parameters:: starting_values (dict[str, float])

best_iteration: Store the best iteration found so far.

property bootstrap_samples

calculate_init_likelihood()

Calculate the value of the log likelihood function

The default values of the parameters are used.

Returns:: value of the log likelihood.
Return type:: float

property calculate_likelihood

property calculate_loo

calculate_null_loglikelihood(avail)

Calculate the log likelihood of the null model, which predicts equal probability for each available alternative.

Parameters:: avail (dict[int, Expression | float | int | bool]) – dictionary mapping each alternative to an expression evaluating its availability condition. If 1 is provided, the alternative is always available.
Returns:: value of the log likelihood
Return type:: float
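As an illustration of the formula: under the null model, each observation n gives probability 1/J_n to the chosen alternative, where J_n is the number of alternatives available for n, so the null log likelihood is the sum over observations of -log(J_n). The following minimal sketch assumes that the availability conditions have already been evaluated into 0/1 flags per observation; the function name and data layout are hypothetical and are not part of Biogeme.

```python
import math


def null_loglikelihood(availability: dict[int, list[int]]) -> float:
    """Log likelihood of a model giving equal probability to each available alternative.

    availability maps each alternative id to a list of 0/1 flags,
    one per observation. Each observation needs at least one available
    alternative, otherwise log(0) is undefined and math.log raises.
    """
    columns = list(availability.values())
    total = 0.0
    for n in range(len(columns[0])):
        # J_n: number of alternatives available for observation n.
        available = sum(1 for column in columns if column[n])
        # The chosen alternative has probability 1 / J_n under the null model.
        total -= math.log(available)
    return total
```

For example, with three alternatives where the third one is unavailable for the second observation, the result is -log(3) - log(2).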

property calculate_waic

property calculating_second_derivatives

property chains

change_init_values(betas)

Modifies the initial values of the parameters in all formulas.

Parameters:: betas (dict[str, float]) – dictionary where the keys are the names of the parameters, and the values are the new value for the parameters.
Return type:: None

check_derivatives(verbose=False)

Verifies the implementation of the derivatives.

It compares the analytical version with the finite differences approximation.

Parameters:

verbose (bool) – if True, the comparisons are reported. Default: False.

Return type:

CheckDerivativesResults

Returns:

f, g, h, gdiff, hdiff where

f is the value of the function,
g is the analytical gradient,
h is the analytical hessian,
gdiff is the difference between the analytical and the finite differences gradient,
hdiff is the difference between the analytical and the finite differences hessian.
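The principle of the comparison can be sketched for the gradient in plain Python. This illustrates the technique under stated assumptions and is not Biogeme's implementation: it uses central differences with a fixed step, whereas Biogeme's own scheme and step size may differ. The function names are hypothetical. The Hessian check follows the same idea, differencing the analytical gradient instead of the function.

```python
import math
from typing import Callable

Vector = list[float]


def finite_difference_gradient(
    f: Callable[[Vector], float], x: Vector, step: float = 1e-6
) -> Vector:
    """Central finite-difference approximation of the gradient of f at x."""
    gradient = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += step
        backward[i] -= step
        gradient.append((f(forward) - f(backward)) / (2 * step))
    return gradient


def gradient_difference(
    f: Callable[[Vector], float],
    analytical_gradient: Callable[[Vector], Vector],
    x: Vector,
    step: float = 1e-6,
) -> Vector:
    """Counterpart of gdiff: analytical gradient minus finite-difference gradient."""
    approximation = finite_difference_gradient(f, x, step)
    return [g - a for g, a in zip(analytical_gradient(x), approximation)]


# Example function f(x) = x0^2 + 3 x0 x1 + sin(x1) and its exact gradient.
def example_f(x: Vector) -> float:
    return x[0] ** 2 + 3 * x[0] * x[1] + math.sin(x[1])


def example_gradient(x: Vector) -> Vector:
    return [2 * x[0] + 3 * x[1], 3 * x[0] + math.cos(x[1])]
```

Entries of gradient_difference(example_f, example_gradient, [1.0, 2.0]) close to zero indicate a correct analytical gradient; a large entry points at the faulty component.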

confidence_intervals(beta_values, interval_size=0.9)

Calculate confidence intervals on the simulated quantities

Parameters:

beta_values (list[dict[str, float]]) – list of parameter values to be used in the calculations. Typically, it is a sample drawn from the distribution of the estimators.
interval_size (float) – size of the reported confidence interval, as a fraction (for example, 0.9 for 90%). If it is denoted by s, the interval is calculated for the quantiles (1-s)/2 and (1+s)/2. The default (0.9) corresponds to quantiles for the confidence interval [0.05, 0.95].

Returns:

two pandas data frames ‘left’ and ‘right’ with the same dimensions. Each row corresponds to a row in the database, and each column to a formula. ‘left’ contains the left value of the confidence interval, and ‘right’ the right value.

Example:

from biogeme.results_processing import EstimationResults

# Read the estimation results from a file
results = EstimationResults.from_yaml_file(filename='my_model.yaml')

# the_biogeme is assumed to be a BIOGEME object built with the
# simulation formulas. Retrieve the names of the beta parameters
# that have been estimated
betas = the_biogeme.free_betas_names

# Draw 100 realizations of the distribution of the estimators
b = results.get_betas_for_sensitivity_analysis(betas, size=100)

# Simulate the formulas using the nominal values
simulated_values = the_biogeme.simulate(results.get_beta_values())

# Calculate the confidence intervals for each formula
left, right = the_biogeme.confidence_intervals(b, 0.9)

Return type:

tuple[DataFrame, DataFrame]
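The quantile rule stated for interval_size can be sketched for a single cell of the ‘left’ and ‘right’ data frames, that is, one formula evaluated on one row of the database for each draw in beta_values. The sketch uses linear interpolation between order statistics (the default rule of NumPy's quantile function); the quantile rule used internally by Biogeme is an assumption here and may differ. Function names are hypothetical.

```python
import math


def quantile(values: list[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics, q in [0, 1]."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def interval_bounds(
    simulated: list[float], interval_size: float = 0.9
) -> tuple[float, float]:
    """Left and right bounds for one formula on one row of the database.

    simulated holds the value of the formula obtained with each draw
    of the parameters.
    """
    left_q = (1 - interval_size) / 2   # about 0.05 for the default 0.9
    right_q = (1 + interval_size) / 2  # about 0.95 for the default 0.9
    return quantile(simulated, left_q), quantile(simulated, right_q)
```

Applying interval_bounds to every (row, formula) pair produces the two tables returned as ‘left’ and ‘right’.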

contains_log_likelihood()

Return type:: bool

property dogleg

property enlarging_factor

estimate(starting_values=None, run_bootstrap=False)

Estimate the parameters of the model(s).

Returns:

object containing the estimation results.

Return type:

EstimationResults

Parameters:

starting_values (dict[str, float] | None)
run_bootstrap (bool)

Example:

import biogeme.biogeme as bio

# Create an instance of biogeme (database and logprob defined beforehand)
the_biogeme = bio.BIOGEME(database, logprob)

# Give a name to the model
the_biogeme.model_name = 'mymodel'

# Estimate the parameters
results = the_biogeme.estimate()

Raises:

BiogemeError – if no expression has been provided for the likelihood

estimate_catalog(selected_configurations=None, quick_estimate=False, run_bootstrap=False)

Estimate all or selected versions of a model containing Catalog expressions, each version corresponding to a different specification.

Parameters:

selected_configurations (set[Configuration] | None) – set of configurations. If None, all configurations are considered.
quick_estimate (bool) – if True, the final statistics are not calculated.
run_bootstrap (bool) – if True, bootstrapping is applied.

Return type:

dict[str, EstimationResults]

Returns:

dictionary associating the name of each specification with its estimation results, as well as a description of each configuration.

estimate_or_load(yaml_file_name=None, starting_values=None, run_bootstrap=False)

Load estimation results from YAML, or estimate the model.

If yaml_file_name is provided, that file is considered. Otherwise, the default YAML filename is generated from the model name. If the file exists, estimation results are loaded from it and no estimation is performed. If the file does not exist, the model is estimated.

Parameters:

yaml_file_name (str | None) – YAML file to load. If None, the default filename generated from the model name is used.
starting_values (dict[str, float] | None) – Optional starting values forwarded to the estimation procedure when estimation is required.
run_bootstrap (bool) – If True, bootstrapping is performed when estimation is required.

Return type:

EstimationResults

Returns:

Estimation results, either loaded from YAML or freshly estimated.
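The load-or-estimate logic can be sketched as follows. To stay within the standard library, JSON stands in for YAML, the estimation routine is passed as a callable, and the default file name rule is hypothetical. The sketch writes the file itself so that a second call finds it; in Biogeme, the YAML file is expected to come from a previous estimation run.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def estimate_or_load_sketch(
    estimate: Callable[[], dict],
    model_name: str,
    file_name: Optional[str] = None,
) -> dict:
    """Return saved results if the file exists, otherwise estimate and save them."""
    # Hypothetical default: derive the file name from the model name.
    path = Path(file_name) if file_name is not None else Path(f"{model_name}.json")
    if path.exists():
        # The file exists: load the results and skip estimation entirely.
        return json.loads(path.read_text())
    # No file: estimate, then save so that the next call can load the results.
    results = estimate()
    path.write_text(json.dumps(results))
    return results
```

The same pattern underlies bayesian_estimation_or_load_summary, with a Bayesian summary in place of the estimation results.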

property expressions_registry

property free_betas_names: list[str]

Returns the names of the parameters that must be estimated

Returns:: list of names of the parameters
Return type:: list[str]

classmethod from_configuration(config_id, multiple_expression, database, user_notes=None, parameters=None, group_of_parameters=None, **kwargs)

Obtain the Biogeme object corresponding to the configuration of a multiple expression

Parameters:

config_id (str) – identifier of the configuration
multiple_expression (Expression) – multiple expression containing the catalog.
database (Database) – database to be passed to the Biogeme object
user_notes (str | None) – these notes will be included in the report file.
parameters (str | Parameters | None) – object with the parameters
group_of_parameters (dict[str, list[str]] | None)

Return type:

BIOGEME

classmethod from_configuration_and_controller(config_id, central_controller, database, user_notes=None, parameters=None, group_of_parameters=None, **kwargs)

Obtain the Biogeme object corresponding to the configuration of a multiple expression

Parameters:

config_id (str) – identifier of the configuration
central_controller (CentralController) – central controller for the multiple expression containing all the catalogs.
database (Database) – database to be passed to the Biogeme object
user_notes (str | None) – these notes will be included in the report file.
parameters (str | Parameters | None) – object with the parameters
group_of_parameters (dict[str, list[str]] | None)

Return type:

BIOGEME

property function_evaluator: CompiledFormulaEvaluator

property function_parameters: dict[str, bool | int | float | str]: Prepare the parameters for the function

property generate_html

property generate_netcdf

property generate_pickle: bool

property generate_yaml

group_of_parameters: Optional grouping of parameters used when generating reports.

property identification_threshold

property infeasible_cg

init_loglikelihood: Initial value of the log likelihood function

property initial_radius

is_model_complex()

Check if the model is potentially complex to estimate

Return type:: bool

property large_data_set

property largest_neighborhood

property log_like: Expression | None

log_like_name: str: Keyword used for the name of the loglikelihood formula. Default: ‘log_like’

property loglike: Expression: For backward compatibility

property max_iterations

property max_number_parameters_to_report

property maximum_attempts

property maximum_number_catalog_expressions

property maximum_number_parameters

property mcmc_sampling_strategy

property missing_data

property modelName: str

property model_elements: ModelElements | None

model_name: Name of the model. Default: ‘biogemeModelDefaultName’

null_loglikelihood: Log likelihood of the null model

property number_of_draws

property number_of_jobs

property number_of_neighbors

property number_of_observations

property number_of_threads

number_unknown_parameters()

Returns the number of parameters that must be estimated

Returns:: number of parameters
Return type:: int

property numerically_safe

property only_robust_stats

property optimization_algorithm

property optimization_parameters: dict[str, bool | int | float | str]

quick_estimate()

Estimate the parameters of the model. Same as estimate, except that extra calculations are skipped (initial log likelihood, t-statistics, etc.).

Returns:: object containing the estimation results.
Return type:: EstimationResults

Example:

import biogeme.biogeme as bio

# Create an instance of biogeme (database and logprob defined beforehand)
the_biogeme = bio.BIOGEME(database, logprob)

# Give a name to the model
the_biogeme.model_name = 'mymodel'

# Estimate the parameters, skipping the extra statistics
results = the_biogeme.quick_estimate()

Raises:: BiogemeError – if no expression has been provided for the likelihood

report_array(array, with_names=True)

Reports the entries of the array up to the maximum number

Parameters:

array (ndarray) – array to report
with_names (bool) – if True, the names of the parameters are included

Returns:

string reporting the values

Return type:

str

retrieve_saved_estimates()

Attempt to retrieve previously saved estimation results from a YAML file.

Return type:: EstimationResults | None
Returns:: An EstimationResults object if a saved result is found. If no file is found or loading fails, None is returned and a warning is logged.
Raises:: BiogemeError – Raised internally by _load_saved_estimates if loading fails, and is caught to log a warning instead.

property sample_from_prior

property sample_size

property save_iterations

property save_validation_results

property second_derivatives

property seed

set_random_init_values(default_bound=100.0)

Modifies the initial values of the parameters in all formulas, using randomly generated values. The value is drawn from a uniform distribution on the interval defined by the bounds.

Parameters:: default_bound (float) – If the upper bound is missing, it is replaced by this value. If the lower bound is missing, it is replaced by the opposite of this value. Default: 100.
Return type:: None
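The drawing rule can be sketched as follows, assuming each parameter's bounds are stored as a (lower, upper) pair with None for a missing bound. The function name and data layout are hypothetical; in Biogeme, the parameters and their bounds come from the formulas themselves.

```python
import random
from typing import Optional

Bounds = tuple[Optional[float], Optional[float]]


def random_init_values(
    bounds: dict[str, Bounds],
    default_bound: float = 100.0,
    rng: Optional[random.Random] = None,
) -> dict[str, float]:
    """Draw each initial value from a uniform distribution between its bounds.

    A missing upper bound is replaced by default_bound, and a missing
    lower bound by -default_bound.
    """
    generator = rng if rng is not None else random.Random()
    values = {}
    for name, (lower, upper) in bounds.items():
        low = -default_bound if lower is None else lower
        high = default_bound if upper is None else upper
        values[name] = generator.uniform(low, high)
    return values
```

Passing a seeded random.Random makes the draws reproducible, which is convenient when comparing several starting points.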

simulate(the_beta_values)

Evaluate all simulation formulas on each row of the database using the specified parameter values.

Parameters:: the_beta_values (dict[str, float] | None) – Dictionary mapping parameter names to values. If None, an exception is raised. Use results.get_beta_values() after estimation or provide explicit values.
Return type:: DataFrame
Returns:: A pandas DataFrame where each row corresponds to an observation in the database, and each column corresponds to a simulation formula.
Raises:: BiogemeError – If the_beta_values is None or if the number of parameters is incorrect.

simulate_bayesian(bayesian_estimation_results, lower_quantile=0.025, upper_quantile=0.975, percentage_of_draws_to_use=10.0)

Simulate all formulas in self.formulas over posterior draws and summarize them per observation.

For each observation and each simulation formula, this returns the mean and the lower_quantile and upper_quantile quantiles across the selected posterior draws.

Return type:

DataFrame

Parameters:

bayesian_estimation_results (BayesianResults)
lower_quantile (float)
upper_quantile (float)
percentage_of_draws_to_use (float)
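The per-observation summary can be sketched for one formula, assuming its values have already been evaluated for every posterior draw and observation. How Biogeme selects the draws from percentage_of_draws_to_use is not documented here; the sketch assumes evenly spaced draws (thinning) and linear interpolation for the quantiles. All names are hypothetical.

```python
import math
import statistics


def _quantile(values: list[float], q: float) -> float:
    """Linear-interpolation quantile of values, for q in [0, 1]."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def summarize_draws(
    simulated: list[list[float]],
    lower_quantile: float = 0.025,
    upper_quantile: float = 0.975,
    percentage_of_draws_to_use: float = 10.0,
) -> list[tuple[float, float, float]]:
    """Return (mean, lower, upper) for each observation.

    simulated[d][n] is the value of one formula for posterior draw d
    and observation n.
    """
    number_of_draws = len(simulated)
    kept = math.ceil(number_of_draws * percentage_of_draws_to_use / 100)
    kept = min(number_of_draws, max(1, kept))
    # Assumption: keep evenly spaced draws (thinning).
    stride = number_of_draws / kept
    selected = [simulated[int(i * stride)] for i in range(kept)]

    summary = []
    for n in range(len(selected[0])):
        column = [draw[n] for draw in selected]
        summary.append(
            (
                statistics.fmean(column),
                _quantile(column, lower_quantile),
                _quantile(column, upper_quantile),
            )
        )
    return summary
```

Using only a fraction of the draws trades some precision in the quantiles for a proportional reduction in simulation time.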

property steptol

property target_accept

property tolerance

property use_flatten_database: bool

property use_jit

user_notes: User notes

validate(estimation_results, slices, groups=None)

Perform out-of-sample validation of the model.

The validation procedure operates by dividing the dataset into a number of slices. For each slice:

The slice is used as the validation set.
The remaining data forms the estimation set.
The model is re-estimated on the estimation set.
The model is applied to the validation set to compute the log likelihood.

Parameters:

estimation_results (EstimationResults) – Estimation results obtained from the full dataset.
slices (int) – Number of data splits to create for cross-validation.
groups (str | None) – Optional column name used to group data entries (e.g., panel data). If provided, splitting preserves groups.

Return type:

list[ValidationResult]

Returns:

List of validation results, one for each data slice.

Raises:

BiogemeError – If the dataset is structured as panel data and incompatible with validation.
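The slicing step can be sketched as follows, with the database represented as a list of row dictionaries. The assignment rule (a seeded shuffle followed by round-robin assignment of rows, or of whole groups when groups is given) is an assumption for illustration and may differ from Biogeme's; the re-estimation and the log likelihood evaluation on each validation set are not shown.

```python
import random
from collections.abc import Hashable
from typing import Optional


def make_slices(
    rows: list[dict],
    slices: int,
    groups: Optional[str] = None,
    seed: int = 0,
) -> list[tuple[list[dict], list[dict]]]:
    """Return one (estimation_set, validation_set) pair per slice.

    If groups names a column, all rows sharing a value of that column
    are placed in the same slice, so a group is never split.
    """

    def key_of(index: int, row: dict) -> Hashable:
        # Grouping key: the group value if a column is given, else the row index.
        return row[groups] if groups is not None else index

    keys = sorted({key_of(i, row) for i, row in enumerate(rows)})
    random.Random(seed).shuffle(keys)
    # Round-robin assignment of the shuffled keys to the slices.
    slice_of = {key: position % slices for position, key in enumerate(keys)}

    pairs = []
    for current in range(slices):
        estimation, validation = [], []
        for i, row in enumerate(rows):
            target = validation if slice_of[key_of(i, row)] == current else estimation
            target.append(row)
        pairs.append((estimation, validation))
    return pairs
```

Each row appears in exactly one validation set, so the validation log likelihoods of all slices together cover the whole database.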

property version

property warmup

property weight: Expression | None

weight_name: Keyword used for the name of the weight formula. Default: ‘weight’