biogeme.biogeme module¶
The core routines of Biogeme.
Implementation of the main Biogeme class
- author:
Michel Bierlaire
- date:
Tue Mar 26 16:45:15 2019
It combines the database and the model specification.
- class biogeme.biogeme.BIOGEME(database, formulas, random_number_generators=None, user_notes=None, parameters=None, group_of_parameters=None, **kwargs)[source]¶
Bases:
object
Main class that combines the database and the model specification.
It works in two modes: estimation and simulation.
The following attributes are imported from the parameter file.
- Parameters:
database (Database)
formulas (Expression | dict[str, Expression])
random_number_generators (dict[str, RandomNumberGeneratorTuple] | None)
user_notes (str | None)
parameters (str | Parameters | None)
group_of_parameters (dict[str, list[str]] | None)
- property algo_parameters: dict[str, bool | int | float | str]¶
Prepare the parameters for the optimization algorithm.
- property bayesian_draws¶
- bayesian_estimation(starting_values=None)[source]¶
- Return type:
- Parameters:
starting_values (dict[str, float] | None)
- bayesian_estimation_non_panel(starting_values)[source]¶
- Return type:
- Parameters:
starting_values (dict[str, float])
- bayesian_estimation_or_load_summary(yaml_file_name=None, starting_values=None)[source]¶
Load a Bayesian summary from YAML, or estimate and build one.
If yaml_file_name is provided, that file is considered. Otherwise, the default YAML filename is generated from the model name. If the file exists, the summary is read from it and returned directly. If the file does not exist, Bayesian estimation is performed using bayesian_estimation(), and the resulting full Bayesian results are converted to a BayesianResultsSummary.
- Parameters:
yaml_file_name (str | None) – YAML summary file to load. If None, the default filename generated from the model name is used.
starting_values (dict[str, float] | None) – Optional starting values forwarded to Bayesian estimation when estimation is required.
- Return type:
BayesianResultsSummary
- Returns:
Bayesian summary results.
- bayesian_estimation_panel(starting_values)[source]¶
- Return type:
- Parameters:
starting_values (dict[str, float])
- best_iteration¶
Store the best iteration found so far.
- property bootstrap_samples¶
- calculate_init_likelihood()[source]¶
Calculate the value of the log likelihood function.
The default values of the parameters are used.
- Returns:
value of the log likelihood.
- Return type:
float
- property calculate_likelihood¶
- property calculate_loo¶
- calculate_null_loglikelihood(avail)[source]¶
Calculate the log likelihood of the null model, which predicts equal probability for each alternative.
- Parameters:
avail (dict[int, Expression | float | int | bool]) – dictionary mapping each alternative to the expression that evaluates its availability condition. If 1 is provided, the alternative is always available.
- Returns:
value of the log likelihood
- Return type:
float
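As an illustration of what the null model means, here is a minimal standalone sketch that does not call Biogeme. It assumes that the chosen alternative is always among the available ones, so that each observation contributes the log of one over the number of available alternatives. The function name and the data layout are hypothetical.

```python
import math


def null_loglikelihood(availability_rows):
    """Log likelihood of a model giving equal probability to each available alternative.

    availability_rows: one dict per observation, mapping an alternative id
    to a truthy value when the alternative is available (hypothetical layout).
    """
    total = 0.0
    for row in availability_rows:
        n_available = sum(1 for available in row.values() if available)
        if n_available == 0:
            raise ValueError("Each observation needs at least one available alternative.")
        # Assumption: the chosen alternative is available, so its probability is 1 / n_available.
        total += -math.log(n_available)
    return total


rows = [
    {1: 1, 2: 1, 3: 1},  # three alternatives available
    {1: 1, 2: 0, 3: 1},  # alternative 2 is not available
    {1: 1, 2: 1, 3: 1},
]
null_ll = null_loglikelihood(rows)
```

With these three observations, the result is -(2 log 3 + log 2).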
- property calculate_waic¶
- property calculating_second_derivatives¶
- property chains¶
- change_init_values(betas)[source]¶
Modifies the initial values of the parameters in all formulas.
- Parameters:
betas (dict[str, float]) – dictionary where the keys are the names of the parameters, and the values are the new values for the parameters.
- Return type:
None
- check_derivatives(verbose=False)[source]¶
Verifies the implementation of the derivatives.
It compares the analytical version with the finite differences approximation.
- Parameters:
verbose (bool) – if True, the comparisons are reported. Default: False.
- Return type:
- Returns:
f, g, h, gdiff, hdiff where
f is the value of the function,
g is the analytical gradient,
h is the analytical hessian,
gdiff is the difference between the analytical and the finite differences gradient,
hdiff is the difference between the analytical and the finite differences hessian,
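The comparison between analytical and finite-difference derivatives can be sketched on a small function, independently of Biogeme. The example below uses central differences on the gradient only; the differencing scheme and step size actually used by check_derivatives are not stated here, so both are assumptions.

```python
def function(x):
    """A simple quadratic test function of two variables."""
    return x[0] ** 2 + 3.0 * x[0] * x[1] + 2.0 * x[1] ** 2


def analytical_gradient(x):
    """Gradient of the test function, derived by hand."""
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0] + 4.0 * x[1]]


def finite_difference_gradient(func, x, step=1e-6):
    """Central finite-difference approximation of the gradient (assumed scheme)."""
    gradient = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += step
        backward[i] -= step
        gradient.append((func(forward) - func(backward)) / (2.0 * step))
    return gradient


point = [1.0, -2.0]
g = analytical_gradient(point)
g_fd = finite_difference_gradient(function, point)
# Analogue of gdiff: entrywise difference between the two gradients.
gdiff = [a - b for a, b in zip(g, g_fd)]
```

Entries of gdiff close to zero suggest that the analytical gradient is implemented correctly; a large entry points to the parameter whose derivative is wrong.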
- confidence_intervals(beta_values, interval_size=0.9)[source]¶
Calculate confidence intervals on the simulated quantities
- Parameters:
beta_values (list[dict[str, float]]) – list of parameter values to be used in the calculations. Typically, it is a sample drawn from a distribution.
interval_size (float) – size of the reported confidence interval, as a fraction between 0 and 1. If it is denoted by s, the interval is calculated for the quantiles (1-s)/2 and (1+s)/2. The default (0.9) corresponds to quantiles for the confidence interval [0.05, 0.95].
- Returns:
two pandas data frames ‘left’ and ‘right’ with the same dimensions. Each row corresponds to a row in the database, and each column to a formula. ‘left’ contains the left value of the confidence interval, and ‘right’ the right value
Example:
# Read the estimation results from a file
results = EstimationResults.from_yaml_file(filename='my_model.yaml')
# Retrieve the names of the beta parameters that have been estimated
betas = biogeme.free_betas_names
# Draw 100 realizations of the distribution of the estimators
b = results.getBetasForSensitivityAnalysis(betas, size=100)
# Simulate the formulas using the nominal values
simulated_values = biogeme.simulate(results.get_beta_values())
# Calculate the confidence intervals for each formula
left, right = biogeme.confidence_intervals(b, 0.9)
- Return type:
tuple[DataFrame,DataFrame]
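The mapping from interval_size to quantile levels can be illustrated with a standalone sketch for a single simulated quantity. The helper names and the linear-interpolation quantile rule are assumptions made for the illustration, not the way Biogeme computes its quantiles.

```python
def quantile(sorted_values, level):
    """Quantile of an already sorted list, using linear interpolation (assumed rule)."""
    position = level * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def interval_bounds(simulated_values, interval_size=0.9):
    """Return the (left, right) bounds for one quantity simulated over many draws."""
    lower_level = (1.0 - interval_size) / 2.0  # 0.05 when interval_size is 0.9
    upper_level = (1.0 + interval_size) / 2.0  # 0.95 when interval_size is 0.9
    ordered = sorted(simulated_values)
    return quantile(ordered, lower_level), quantile(ordered, upper_level)


# One quantity simulated for 101 hypothetical draws of the parameters.
values = [float(i) for i in range(101)]
left_bound, right_bound = interval_bounds(values, 0.9)
```

In confidence_intervals, the same idea applies to every row of the database and every formula, which is why 'left' and 'right' have the same dimensions as the simulation output.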
- property dogleg¶
- property enlarging_factor¶
- estimate(starting_values=None, run_bootstrap=False)[source]¶
Estimate the parameters of the model(s).
- Returns:
object containing the estimation results.
- Return type:
EstimationResults
- Parameters:
starting_values (dict[str, float] | None)
run_bootstrap (bool)
Example:
# Create an instance of biogeme
biogeme = bio.BIOGEME(database, logprob)
# Give a name to the model
biogeme.modelName = 'mymodel'
# Estimate the parameters
results = biogeme.estimate()
- Raises:
BiogemeError – if no expression has been provided for the likelihood
- estimate_catalog(selected_configurations=None, quick_estimate=False, run_bootstrap=False)[source]¶
Estimate all or selected versions of a model containing Catalogs, corresponding to multiple specifications.
- Parameters:
selected_configurations (set[Configuration]) – set of configurations. If None, all configurations are considered.
quick_estimate (bool) – if True, the final statistics are not calculated.
run_bootstrap (bool) – if True, bootstrapping is applied.
- Return type:
dict[str, EstimationResults]
- Returns:
object containing the estimation results associated with the name of each specification, as well as a description of each configuration
- estimate_or_load(yaml_file_name=None, starting_values=None, run_bootstrap=False)[source]¶
Load estimation results from YAML, or estimate the model.
If yaml_file_name is provided, that file is considered. Otherwise, the default YAML filename is generated from the model name. If the file exists, estimation results are loaded from it and no estimation is performed. If the file does not exist, the model is estimated.
- Parameters:
yaml_file_name (str | None) – YAML file to load. If None, the default filename generated from the model name is used.
starting_values (dict[str, float] | None) – Optional starting values forwarded to the estimation procedure when estimation is required.
run_bootstrap (bool) – If True, bootstrapping is performed when estimation is required.
- Return type:
EstimationResults
- Returns:
Estimation results, either loaded from YAML or freshly estimated.
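The load-or-estimate logic follows a common caching pattern, sketched below without Biogeme. To stay within the standard library, the sketch stores results as JSON instead of YAML, and it writes the file after estimating, which is an assumption about the workflow. All names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def estimate_or_load(file_path, estimate):
    """Return (results, was_estimated): load saved results if present, else estimate."""
    path = Path(file_path)
    if path.exists():
        # Results are available: no estimation is performed.
        return json.loads(path.read_text()), False
    results = estimate()
    # Assumption: freshly estimated results are saved for the next call.
    path.write_text(json.dumps(results))
    return results, True


calls = []


def fake_estimate():
    """Stand-in for an expensive estimation; records each call."""
    calls.append(1)
    return {"beta_time": -0.5, "beta_cost": -1.2}


with tempfile.TemporaryDirectory() as directory:
    results_file = Path(directory) / "my_model.json"
    first, first_was_estimated = estimate_or_load(results_file, fake_estimate)
    second, second_was_estimated = estimate_or_load(results_file, fake_estimate)
```

The second call returns the same values without invoking the estimation again, which is the behavior that makes this method convenient in scripts that are run repeatedly.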
- property expressions_registry¶
- property free_betas_names: list[str]¶
Returns the names of the parameters that must be estimated
- Returns:
list of names of the parameters
- Return type:
list[str]
- classmethod from_configuration(config_id, multiple_expression, database, user_notes=None, parameters=None, group_of_parameters=None, **kwargs)[source]¶
Obtain the Biogeme object corresponding to the configuration of a multiple expression.
- Parameters:
config_id (str) – identifier of the configuration
multiple_expression (Expression) – multiple expression containing the catalog.
database (Database) – database to be passed to the Biogeme object
user_notes (str | None) – these notes will be included in the report file.
parameters (str | Parameters | None) – object with the parameters
group_of_parameters (dict[str, list[str]] | None)
- Return type:
BIOGEME
- classmethod from_configuration_and_controller(config_id, central_controller, database, user_notes=None, parameters=None, group_of_parameters=None, **kwargs)[source]¶
Obtain the Biogeme object corresponding to the configuration of a multiple expression
- Parameters:
config_id (str) – identifier of the configuration
central_controller (CentralController) – central controller for the multiple expression containing all the catalogs.
database (Database) – database to be passed to the Biogeme object
user_notes (str | None) – these notes will be included in the report file.
parameters (str | Parameters | None) – object with the parameters
group_of_parameters (dict[str, list[str]] | None)
- Return type:
BIOGEME
- property function_evaluator: CompiledFormulaEvaluator¶
- property function_parameters: dict[str, bool | int | float | str]¶
Prepare the parameters for the function
- property generate_html¶
- property generate_netcdf¶
- property generate_pickle: bool¶
- property generate_yaml¶
- group_of_parameters¶
Optional grouping of parameters used when generating reports.
- property identification_threshold¶
- property infeasible_cg¶
- init_loglikelihood¶
Initial value of the log likelihood function
- property initial_radius¶
- property large_data_set¶
- property largest_neighborhood¶
- property log_like: Expression | None¶
- log_like_name: str¶
Keyword used for the name of the log likelihood formula. Default: ‘log_like’
- property loglike: Expression¶
For backward compatibility
- property max_iterations¶
- property max_number_parameters_to_report¶
- property maximum_attempts¶
- property maximum_number_catalog_expressions¶
- property maximum_number_parameters¶
- property mcmc_sampling_strategy¶
- property missing_data¶
- property modelName: str¶
- property model_elements: ModelElements | None¶
- model_name¶
Name of the model. Default: ‘biogemeModelDefaultName’
- null_loglikelihood¶
Log likelihood of the null model
- property number_of_draws¶
- property number_of_jobs¶
- property number_of_neighbors¶
- property number_of_observations¶
- property number_of_threads¶
- number_unknown_parameters()[source]¶
Returns the number of parameters that must be estimated
- Returns:
number of parameters
- Return type:
int
- property numerically_safe¶
- property only_robust_stats¶
- property optimization_algorithm¶
- property optimization_parameters: dict[str, bool | int | float | str]¶
- quick_estimate()[source]¶
Estimate the parameters of the model. Same as estimate, except that any extra calculation is skipped (initial log likelihood, t-statistics, etc.)
- Returns:
object containing the estimation results.
- Return type:
EstimationResults
Example:
# Create an instance of biogeme
biogeme = bio.BIOGEME(database, logprob)
# Give a name to the model
biogeme.modelName = 'mymodel'
# Estimate the parameters, skipping the extra calculations
results = biogeme.quick_estimate()
- Raises:
BiogemeError – if no expression has been provided for the likelihood
- report_array(array, with_names=True)[source]¶
Reports the entries of the array up to the maximum number
- Parameters:
array (ndarray) – array to report
with_names (bool) – if True, the names of the parameters are included
- Returns:
string reporting the values
- Return type:
str
- retrieve_saved_estimates()[source]¶
Attempt to retrieve previously saved estimation results from a YAML file.
- Return type:
EstimationResults | None
- Returns:
An EstimationResults object if a saved result is found. If no file is found or loading fails, None is returned and a warning is logged.
- Raises:
BiogemeError – Raised internally by _load_saved_estimates if loading fails, and is caught to log a warning instead.
- property sample_from_prior¶
- property sample_size¶
- property save_iterations¶
- property save_validation_results¶
- property second_derivatives¶
- property seed¶
- set_random_init_values(default_bound=100.0)[source]¶
Modifies the initial values of the parameters in all formulas, using randomly generated values. The value is drawn from a uniform distribution on the interval defined by the bounds.
- Parameters:
default_bound (float) – If the upper bound is missing, it is replaced by this value. If the lower bound is missing, it is replaced by the opposite of this value. Default: 100.
- Return type:
None
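The role of default_bound can be illustrated with a standalone sketch that does not use Biogeme. Each parameter is described by a hypothetical (lower, upper) pair, where None stands for a missing bound, and the seed argument is added only to make the illustration reproducible.

```python
import random


def random_init_values(bounds, default_bound=100.0, seed=None):
    """Draw one value per parameter, uniformly between its (possibly defaulted) bounds."""
    rng = random.Random(seed)
    values = {}
    for name, (lower, upper) in bounds.items():
        # A missing lower bound becomes -default_bound, a missing upper bound +default_bound.
        low = -default_bound if lower is None else lower
        high = default_bound if upper is None else upper
        values[name] = rng.uniform(low, high)
    return values


bounds = {
    "beta_time": (None, 0.0),   # only an upper bound
    "beta_cost": (-10.0, 10.0), # both bounds
    "asc_car": (None, None),    # no bounds at all
}
init_values = random_init_values(bounds, default_bound=100.0, seed=42)
```

Here beta_time is drawn from [-100, 0], beta_cost from [-10, 10], and asc_car from [-100, 100].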
- simulate(the_beta_values)[source]¶
Evaluate all simulation formulas on each row of the database using the specified parameter values.
- Parameters:
the_beta_values (dict[str, float] | None) – Dictionary mapping parameter names to values. If None, an exception is raised. Use results.get_beta_values() after estimation or provide explicit values.
- Return type:
DataFrame
- Returns:
A pandas DataFrame where each row corresponds to an observation in the database, and each column corresponds to a simulation formula.
- Raises:
BiogemeError – If the_beta_values is None or if the number of parameters is incorrect.
- simulate_bayesian(bayesian_estimation_results, lower_quantile=0.025, upper_quantile=0.975, percentage_of_draws_to_use=10.0)[source]¶
Simulate all formulas in self.formulas over posterior draws and summarize them per observation.
For each observation and each simulation formula, this returns the mean, lower_quantile and upper_quantile across the selected posterior draws.
- Return type:
DataFrame
- Parameters:
bayesian_estimation_results (BayesianResults)
lower_quantile (float)
upper_quantile (float)
percentage_of_draws_to_use (float)
- property steptol¶
- property target_accept¶
- property tolerance¶
- property use_flatten_database: bool¶
- property use_jit¶
- user_notes¶
User notes
- validate(estimation_results, slices, groups=None)[source]¶
Perform out-of-sample validation of the model.
The validation procedure operates by dividing the dataset into a number of slices. For each slice:
1. The slice is used as the validation set.
2. The remaining data forms the estimation set.
3. The model is re-estimated on the estimation set.
4. The model is applied to the validation set to compute the log likelihood.
- Parameters:
estimation_results (EstimationResults) – Estimation results obtained from the full dataset.
slices (int) – Number of data splits to create for cross-validation.
groups (str | None) – Optional column name used to group data entries (e.g., panel data). If provided, splitting preserves groups.
- Return type:
list[ValidationResult]
- Returns:
List of validation results, one for each data slice.
- Raises:
BiogemeError – If the dataset is structured as panel data and incompatible with validation.
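The slicing procedure described above can be sketched on row indices alone. This is a minimal illustration that ignores the groups argument and assigns rows to slices in a fixed round-robin order; how Biogeme actually assigns rows to slices (for instance, whether it shuffles them) is not stated here.

```python
def split_into_slices(n_rows, slices):
    """Yield (estimation_rows, validation_rows) index lists, one pair per slice."""
    indices = list(range(n_rows))
    for k in range(slices):
        # Assumption: round-robin assignment, every `slices`-th row goes to slice k.
        validation = indices[k::slices]
        validation_set = set(validation)
        estimation = [i for i in indices if i not in validation_set]
        yield estimation, validation


folds = list(split_into_slices(n_rows=10, slices=5))
for estimation_rows, validation_rows in folds:
    # In a real validation, the model would be re-estimated on estimation_rows
    # and its log likelihood computed on validation_rows.
    pass
```

Each row appears in exactly one validation set, so every observation is predicted once by a model that was estimated without it.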
- property version¶
- property warmup¶
- property weight: Expression | None¶
- weight_name¶
Keyword used for the name of the weight formula. Default: ‘weight’