biogeme.biogeme module

The core routines of Biogeme.

Implementation of the main Biogeme class

author:: Michel Bierlaire
date:: Tue Mar 26 16:45:15 2019

It combines the database and the model specification.

class biogeme.biogeme.BIOGEME(database, formulas, random_number_generators=None, user_notes=None, parameters=None, **kwargs)

Bases: object

Main class that combines the database and the model specification.

It works in two modes: estimation and simulation.

The following attributes are imported from the parameter file.

Parameters:

database (Database)
formulas (Expression | dict[str, Expression])
random_number_generators (dict[str, RandomNumberGeneratorTuple] | None)
user_notes (str | None)
parameters (str | Parameters | None)

property algo_parameters: dict[str, bool | int | float | str]

Prepare the parameters for the optimization algorithm.

bayesian_estimation(starting_values=None)

Return type:: BayesianResults
Parameters:: starting_values (dict[str, float] | None)

bayesian_estimation_non_panel(starting_values)

Return type:: BayesianResults
Parameters:: starting_values (dict[str, float])

bayesian_estimation_panel(starting_values)

Return type:: BayesianResults
Parameters:: starting_values (dict[str, float])

best_iteration

Store the best iteration found so far.

bootstrap_samples: int

calculate_init_likelihood()

Calculate the value of the log likelihood function.

The default values of the parameters are used.

Returns:: value of the log likelihood.
Return type:: float

calculate_null_loglikelihood(avail)

Calculate the log likelihood of the null model, which predicts equal probability for each available alternative.

Parameters:: avail (list of biogeme.expressions.Expression) – list of expressions to evaluate the availability conditions for each alternative. If 1 is provided, the alternative is always available.
Returns:: value of the log likelihood
Return type:: float
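As a worked illustration, assume the null model gives probability 1/J_n to each of the J_n alternatives available to observation n. Its log likelihood is then the sum over observations of -ln(J_n). The sketch below computes that sum from plain 0/1 availability lists. The helper `null_log_likelihood` is hypothetical and does not evaluate Biogeme expressions.

```python
from __future__ import annotations

import math


def null_log_likelihood(availabilities: list[list[int]]) -> float:
    """Log likelihood of a model giving equal probability to each available alternative.

    Hypothetical helper: each inner list holds the 0/1 availability of every
    alternative for one observation (1 means available).
    """
    total = 0.0
    for row in availabilities:
        n_available = sum(row)
        if n_available == 0:
            raise ValueError("each observation needs at least one available alternative")
        # The chosen alternative has probability 1 / n_available.
        total += -math.log(n_available)
    return total


# Three alternatives available for the first observation, two for the second.
print(null_log_likelihood([[1, 1, 1], [1, 0, 1]]))  # approximately -ln(6) = -1.7918
```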

change_init_values(betas)

Modifies the initial values of the parameters in all formulas.

Parameters:: betas (dict(string:float)) – dictionary where the keys are the names of the parameters, and the values are the new values of the parameters.
Return type:: None

check_derivatives(verbose=False)

Verifies the implementation of the derivatives.

It compares the analytical version with the finite differences approximation.

Parameters:

verbose (bool) – if True, the comparisons are reported. Default: False.

Return type:

CheckDerivativesResults

Returns:

f, g, h, gdiff, hdiff, where

f is the value of the function,
g is the analytical gradient,
h is the analytical hessian,
gdiff is the difference between the analytical and the finite differences gradient,
hdiff is the difference between the analytical and the finite differences hessian.
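The idea behind gdiff can be sketched in plain Python. Each partial derivative is approximated by finite differences and then subtracted from the analytical gradient. The sketch uses central differences with a fixed step. That choice is an assumption, because the scheme and step Biogeme uses internally are not documented here. The helper names and the toy function are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def finite_difference_gradient(
    f: Callable[[list[float]], float], x: list[float], step: float = 1e-6
) -> list[float]:
    """Central finite-difference approximation of the gradient of f at x."""
    gradient = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += step
        x_minus[i] -= step
        gradient.append((f(x_plus) - f(x_minus)) / (2 * step))
    return gradient


def gradient_difference(
    f: Callable[[list[float]], float],
    analytical_gradient: Callable[[list[float]], list[float]],
    x: list[float],
) -> list[float]:
    """Analytical minus finite-difference gradient, entry by entry."""
    approximation = finite_difference_gradient(f, x)
    return [a - b for a, b in zip(analytical_gradient(x), approximation)]


def toy_function(x: list[float]) -> float:
    # f(x) = x0^2 + 3 * x1
    return x[0] ** 2 + 3 * x[1]


def toy_gradient(x: list[float]) -> list[float]:
    # Exact gradient of toy_function.
    return [2 * x[0], 3.0]


# Entries close to zero indicate a correct analytical gradient.
print(gradient_difference(toy_function, toy_gradient, [1.5, -2.0]))
```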

confidence_intervals(beta_values, interval_size=0.9)

Calculate confidence intervals on the simulated quantities.

Parameters:

beta_values (list(dict(str: float))) – list of parameter values to be used in the calculations. Typically, it is a sample drawn from a distribution.
interval_size (float) – size of the reported confidence interval, expressed as a fraction. If it is denoted by s, the interval is calculated for the quantiles (1-s)/2 and (1+s)/2. The default (0.9) corresponds to quantiles for the confidence interval [0.05, 0.95].

Returns:

two pandas data frames ‘left’ and ‘right’ with the same dimensions. Each row corresponds to a row in the database, and each column to a formula. ‘left’ contains the left value of the confidence interval, and ‘right’ the right value.

Example:

# Read the estimation results from a file
results = EstimationResults.from_yaml_file(filename='my_model.yaml')
# Retrieve the names of the beta parameters that have been
# estimated (biogeme is the BIOGEME object containing the
# formulas to simulate)
betas = biogeme.free_betas_names

# Draw 100 realizations of the distribution of the estimators
b = results.getBetasForSensitivityAnalysis(betas, size=100)

# Simulate the formulas using the nominal values
simulated_values = biogeme.simulate(results.get_beta_values())

# Calculate the confidence intervals for each formula
left, right = biogeme.confidence_intervals(b, 0.9)

Return type:

tuple of two Pandas dataframes.
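The quantile rule described for interval_size can be illustrated without Biogeme. For each database row, the sketch takes the values of one formula simulated under each parameter draw and returns the (1-s)/2 and (1+s)/2 quantiles. It interpolates linearly between order statistics, which matches NumPy's default quantile method. Whether Biogeme uses the same interpolation is an assumption, and the function names are hypothetical.

```python
from __future__ import annotations

import math


def quantile(values: list[float], q: float) -> float:
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def interval_bounds(
    simulated: list[list[float]], interval_size: float = 0.9
) -> tuple[list[float], list[float]]:
    """Left and right interval bounds for each database row.

    simulated[r][d] is the value of one formula for row r under parameter
    draw d (hypothetical layout).
    """
    lower_q = (1 - interval_size) / 2
    upper_q = (1 + interval_size) / 2
    left = [quantile(row, lower_q) for row in simulated]
    right = [quantile(row, upper_q) for row in simulated]
    return left, right


# One row, 101 draws with values 0, 1, ..., 100.
left, right = interval_bounds([[float(v) for v in range(101)]], interval_size=0.9)
print(left, right)  # approximately [5.0] and [95.0]
```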

contains_log_likelihood()

Return type:: bool

dogleg: bool

enlarging_factor: float

estimate(starting_values=None, recycle=False, run_bootstrap=False, **kwargs)

Estimate the parameters of the model(s).

Returns:

object containing the estimation results.

Example:

import biogeme.biogeme as bio

# Create an instance of biogeme (database and logprob are
# assumed to be defined beforehand)
biogeme = bio.BIOGEME(database, logprob)

# Give a name to the model
biogeme.model_name = 'mymodel'

# Estimate the parameters
results = biogeme.estimate()

Raises:

BiogemeError – if no expression has been provided for the likelihood

Parameters:

starting_values (dict[str, float] | None)
recycle (bool)
run_bootstrap (bool)

Return type:

EstimationResults

estimate_catalog(selected_configurations=None, quick_estimate=False, recycle=False, run_bootstrap=False)

Estimate all or selected versions of a model with Catalogs, corresponding to multiple specifications.

Parameters:

selected_configurations (set[Configuration] | None) – set of configurations. If None, all configurations are considered.
quick_estimate (bool) – if True, the final statistics are not calculated.
recycle (bool) – if True, the results are read from the pickle file, if it exists. If False, the estimation is performed.
run_bootstrap (bool) – if True, bootstrapping is applied.

Return type:

dict[str, EstimationResults]

Returns:

object containing the estimation results associated with the name of each specification, as well as a description of each configuration.

property expressions_registry

property free_betas_names: list[str]

Returns the names of the parameters that must be estimated.

Returns:: list of names of the parameters
Return type:: list(str)

classmethod from_configuration(config_id, multiple_expression, database, user_notes=None, parameters=None, **kwargs)

Obtain the Biogeme object corresponding to the configuration of a multiple expression.

Parameters:

config_id (str) – identifier of the configuration
multiple_expression (Expression) – multiple expression containing the catalog.
database (Database) – database to be passed to the Biogeme object
user_notes (str | None) – these notes will be included in the report file.
parameters (str | Parameters | None) – object with the parameters

Return type:

BIOGEME

classmethod from_configuration_and_controller(config_id, central_controller, database, user_notes=None, parameters=None, **kwargs)

Obtain the Biogeme object corresponding to the configuration of a multiple expression.

Parameters:

config_id (str) – identifier of the configuration
central_controller (CentralController) – central controller for the multiple expression containing all the catalogs.
database (Database) – database to be passed to the Biogeme object
user_notes (str | None) – these notes will be included in the report file.
parameters (str | Parameters | None) – object with the parameters

Return type:

BIOGEME

property function_evaluator: CompiledFormulaEvaluator

property function_parameters: dict[str, bool | int | float | str]

Prepare the parameters for the function.

generate_html: bool

generate_netcdf: bool

property generate_pickle: bool

generate_yaml: bool

infeasible_cg: bool

init_loglikelihood

Initial value of the log likelihood function.

initial_radius: float

is_model_complex()

Check if the model is potentially complex to estimate.

Return type:: bool

property log_like: Expression | None

log_like_name: str

Keyword used for the name of the log likelihood formula. Default: ‘log_like’

property loglike: Expression

Kept for backward compatibility.

max_iterations: int

max_number_parameters_to_report: int

maximum_number_catalog_expressions: int

property modelName: str

property model_elements: ModelElements | None

model_name

Name of the model. Default: ‘biogemeModelDefaultName’

null_loglikelihood

Log likelihood of the null model

property number_of_observations

number_of_threads: int

number_unknown_parameters()

Returns the number of parameters that must be estimated.

Returns:: number of parameters
Return type:: int

optimization_algorithm: str

property optimization_parameters: dict[str, bool | int | float | str]

quick_estimate()

Estimate the parameters of the model. Same as estimate, except that any extra calculation is skipped (initial log likelihood, t-statistics, etc.).

Returns:: object containing the estimation results.

Example:

import biogeme.biogeme as bio

# Create an instance of biogeme (database and logprob are
# assumed to be defined beforehand)
biogeme = bio.BIOGEME(database, logprob)

# Give a name to the model
biogeme.model_name = 'mymodel'

# Estimate the parameters
results = biogeme.quick_estimate()

Raises:: BiogemeError – if no expression has been provided for the likelihood
Return type:: EstimationResults

report_array(array, with_names=True)

Reports the entries of the array up to the maximum number

Parameters:

array (numpy.array) – array to report
with_names (bool) – if True, the names of the parameters are included

Returns:

string reporting the values

Return type:

str

retrieve_saved_estimates()

Attempt to retrieve previously saved estimation results from a YAML file.

Return type:: EstimationResults | None
Returns:: An EstimationResults object if a saved result is found. If no file is found or loading fails, None is returned and a warning is logged.
Raises:: BiogemeError – Raised internally by _load_saved_estimates if loading fails, and is caught to log a warning instead.

property sample_size

save_iterations: bool

second_derivatives: int

seed: int

set_random_init_values(default_bound=100.0)

Modifies the initial values of the parameters in all formulas, using randomly generated values. The value is drawn from a uniform distribution on the interval defined by the bounds.

Parameters:: default_bound (float) – If the upper bound is missing, it is replaced by this value. If the lower bound is missing, it is replaced by the opposite of this value. Default: 100.
Return type:: None
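The drawing rule described above can be sketched in plain Python. The sketch assumes that bounds are given as a dictionary of (lower, upper) pairs, with None marking a missing bound. The function `random_init_values` and the parameter names in the usage line are hypothetical, and this is not Biogeme's internal code.

```python
from __future__ import annotations

import random


def random_init_values(
    bounds: dict[str, tuple[float | None, float | None]],
    default_bound: float = 100.0,
    rng: random.Random | None = None,
) -> dict[str, float]:
    """Draw each initial value uniformly between its bounds.

    A missing upper bound is replaced by default_bound and a missing lower
    bound by -default_bound.
    """
    rng = rng or random.Random()
    values = {}
    for name, (lower, upper) in bounds.items():
        low = -default_bound if lower is None else lower
        high = default_bound if upper is None else upper
        values[name] = rng.uniform(low, high)
    return values


# Hypothetical parameters: one unbounded, one in [0, 1], one bounded below only.
print(random_init_values(
    {"ASC_CAR": (None, None), "MU": (0.0, 1.0), "B_TIME": (-5.0, None)},
    rng=random.Random(42),
))
```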

simulate(the_beta_values)

Evaluate all simulation formulas on each row of the database using the specified parameter values.

Parameters:: the_beta_values (dict[str, float] | None) – Dictionary mapping parameter names to values. If None, an exception is raised. Use results.get_beta_values() after estimation or provide explicit values.
Return type:: DataFrame
Returns:: A pandas DataFrame where each row corresponds to an observation in the database, and each column corresponds to a simulation formula.
Raises:: BiogemeError – If the_beta_values is None or if the number of parameters is incorrect.
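The output layout has one row per observation and one column per formula. The sketch below illustrates it by evaluating a dictionary of plain Python callables on each row. It is a hypothetical stand-in: formulas are Python functions rather than Biogeme expressions, and the output is a list of dictionaries rather than a pandas DataFrame. A missing parameter dictionary also raises ValueError, whereas Biogeme raises BiogemeError.

```python
from __future__ import annotations

from typing import Callable

Row = dict[str, float]
Betas = dict[str, float]
Formula = Callable[[Row, Betas], float]


def simulate_rows(
    formulas: dict[str, Formula],
    rows: list[Row],
    beta_values: Betas | None,
) -> list[dict[str, float]]:
    """Evaluate every formula on every row: one output dict per row."""
    if beta_values is None:
        # Parameter values are mandatory, as in the documented behaviour.
        raise ValueError("beta_values must be provided")
    return [
        {name: formula(row, beta_values) for name, formula in formulas.items()}
        for row in rows
    ]


# Hypothetical utility V = ASC + B_TIME * TIME, evaluated on two rows.
formulas = {"utility": lambda row, beta: beta["ASC"] + beta["B_TIME"] * row["TIME"]}
rows = [{"TIME": 10.0}, {"TIME": 20.0}]
print(simulate_rows(formulas, rows, {"ASC": 1.0, "B_TIME": -0.1}))
```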

simulate_bayesian(bayesian_estimation_results, lower_quantile=0.025, upper_quantile=0.975, percentage_of_draws_to_use=10.0)

Simulate all formulas in self.formulas over posterior draws and summarize them per observation.

For each observation and each simulation formula, this returns the mean, lower_quantile and upper_quantile across the selected posterior draws.

Return type:

DataFrame

Parameters:

bayesian_estimation_results (BayesianResults)
lower_quantile (float)
upper_quantile (float)
percentage_of_draws_to_use (float)
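The per-observation summary can be sketched as follows. The text above does not say how Biogeme selects the percentage_of_draws_to_use subset. The sketch assumes evenly spaced draws (thinning) and uses linear interpolation for the quantiles. The helper names and the data layout are hypothetical.

```python
from __future__ import annotations

import math
import statistics


def quantile(values: list[float], q: float) -> float:
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def summarize_draws(
    draws_per_observation: list[list[float]],
    lower_quantile: float = 0.025,
    upper_quantile: float = 0.975,
    percentage_of_draws_to_use: float = 10.0,
) -> list[dict[str, float]]:
    """Mean and quantiles of a formula across posterior draws, per observation.

    draws_per_observation[n][d] is the simulated value for observation n
    under posterior draw d (hypothetical layout).
    """
    if not 0 < percentage_of_draws_to_use <= 100:
        raise ValueError("percentage_of_draws_to_use must be in (0, 100]")
    summaries = []
    for draws in draws_per_observation:
        # Assumption: keep evenly spaced draws (thinning).
        n_used = max(1, round(len(draws) * percentage_of_draws_to_use / 100))
        step = len(draws) / n_used
        selected = [draws[int(i * step)] for i in range(n_used)]
        summaries.append(
            {
                "mean": statistics.fmean(selected),
                "lower": quantile(selected, lower_quantile),
                "upper": quantile(selected, upper_quantile),
            }
        )
    return summaries


print(summarize_draws([[float(v) for v in range(1000)]]))
```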

steptol: float

tolerance: float

property use_flatten_database: bool

user_notes

User notes, included in the report file.

validate(estimation_results, slices, groups=None)

Perform out-of-sample validation of the model.

The validation procedure operates by dividing the dataset into a number of slices. For each slice:

The slice is used as the validation set.

The remaining data forms the estimation set.

The model is re-estimated on the estimation set.

The model is applied to the validation set to compute the log likelihood.

Parameters:

estimation_results (EstimationResults) – Estimation results obtained from the full dataset.
slices (int) – Number of data splits to create for cross-validation.
groups (str | None) – Optional column name used to group data entries (e.g., panel data). If provided, splitting preserves groups.

Return type:

list[ValidationResult]

Returns:

List of validation results, one for each data slice.

Raises:

BiogemeError – If the dataset is structured as panel data and incompatible with validation.
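The splitting step of the procedure above can be sketched as follows, leaving out re-estimation and evaluation on the validation set. The sketch assigns rows to slices in round-robin order for readability, or whole groups when groups is given. This deterministic assignment is an assumption, because Biogeme may shuffle the data before splitting. The function `make_slices` is hypothetical.

```python
from __future__ import annotations


def make_slices(
    n_rows: int, slices: int, groups: list[str] | None = None
) -> list[tuple[list[int], list[int]]]:
    """Return one (estimation_rows, validation_rows) pair per slice.

    When groups holds one label per row, all rows sharing a label fall in
    the same validation slice, so panel individuals are never split.
    """
    if slices < 2:
        raise ValueError("at least two slices are needed")
    if groups is None:
        units = [[i] for i in range(n_rows)]
    else:
        by_group: dict[str, list[int]] = {}
        for index, label in enumerate(groups):
            by_group.setdefault(label, []).append(index)
        units = list(by_group.values())
    # Round-robin assignment of rows (or groups) to slices.
    assignment: list[list[int]] = [[] for _ in range(slices)]
    for position, unit in enumerate(units):
        assignment[position % slices].extend(unit)
    splits = []
    for validation in assignment:
        held_out = set(validation)
        estimation = [i for i in range(n_rows) if i not in held_out]
        splits.append((estimation, sorted(validation)))
    return splits


for estimation, validation in make_slices(6, 3):
    print("estimate on", estimation, "validate on", validation)
```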

property weight: Expression | None

weight_name

Keyword used for the name of the weight formula. Default: ‘weight’