biogeme.biogeme module¶
The core routines of Biogeme.
Implementation of the main Biogeme class
- author:
Michel Bierlaire
- date:
Tue Mar 26 16:45:15 2019
It combines the database and the model specification.
- class biogeme.biogeme.BIOGEME(database, formulas, random_number_generators=None, user_notes=None, parameters=None, **kwargs)[source]¶
Bases:
object
- Main class that combines the database and the model
specification.
It works in two modes: estimation and simulation.
The following attributes are imported from the parameter file.
- Parameters:
database (Database)
formulas (Expression | dict[str, Expression])
random_number_generators (dict[str:RandomNumberGeneratorTuple] | None)
user_notes (str | None)
parameters (str | Parameters | None)
- property algo_parameters: dict[str, bool | int | float | str]¶
Prepare the parameters for the optimization algorithm.
- best_iteration¶
Store the best iteration found so far.
- property bootstrap_samples¶
- calculate_init_likelihood()[source]¶
Calculate the value of the log likelihood function.
The default values of the parameters are used.
- Returns:
value of the log likelihood.
- Return type:
float.
- calculate_null_loglikelihood(avail)[source]¶
Calculate the log likelihood of the null model, which predicts equal probability for each alternative.
- Parameters:
avail (list of biogeme.expressions.Expression) – list of expressions to evaluate the availability conditions for each alternative. If 1 is provided, the alternative is always available.
- Returns:
value of the log likelihood
- Return type:
float
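The null model can be illustrated with a small standalone sketch. It assumes that equal probability is spread over the alternatives available to each observation, so each observation contributes minus the logarithm of the number of available alternatives. The function name `null_loglikelihood` and the list-of-flags layout are hypothetical and are not part of Biogeme.

```python
import math


def null_loglikelihood(availability_rows):
    """Log likelihood of a model giving equal probability to each available alternative.

    availability_rows: one list of 0/1 availability flags per observation
    (hypothetical layout, chosen only for this illustration).
    """
    total = 0.0
    for flags in availability_rows:
        n_available = sum(1 for flag in flags if flag)
        if n_available == 0:
            raise ValueError("each observation needs at least one available alternative")
        # The chosen alternative has probability 1/n, so the contribution is -log(n).
        total += -math.log(n_available)
    return total


# Three observations, three alternatives; the second alternative is
# unavailable for the second observation.
rows = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
null_ll = null_loglikelihood(rows)
```

With these flags, the result equals -(2 log 3 + log 2).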
- property calculating_second_derivatives¶
- change_init_values(betas)[source]¶
Modifies the initial values of the parameters in all formulas.
- Parameters:
betas (dict(string:float)) – dictionary where the keys are the names of the parameters, and the values are the new value for the parameters.
- Return type:
None
- check_derivatives(verbose=False)[source]¶
Verifies the implementation of the derivatives.
It compares the analytical version with the finite differences approximation.
- Parameters:
verbose (bool) – if True, the comparisons are reported. Default: False.
- Return type:
- Returns:
f, g, h, gdiff, hdiff where
f is the value of the function,
g is the analytical gradient,
h is the analytical hessian,
gdiff is the difference between the analytical and the finite differences gradient,
hdiff is the difference between the analytical and the finite differences hessian.
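The comparison between analytical and finite-difference derivatives can be sketched on a toy function. This is only an illustration of the idea for the gradient, using forward differences; the helper names below are hypothetical, and the scheme Biogeme actually uses may differ.

```python
def finite_difference_gradient(f, x, eps=1e-6):
    """Forward-difference approximation of the gradient of f at x."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += eps
        grad.append((f(shifted) - fx) / eps)
    return grad


def f(x):
    # Toy function standing in for a log likelihood.
    return x[0] ** 2 + 3.0 * x[0] * x[1]


def analytical_gradient(x):
    # Hand-derived gradient of the toy function above.
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]


point = [1.0, 2.0]
g = analytical_gradient(point)
# Difference between the analytical and the finite-difference gradient.
gdiff = [a - b for a, b in zip(g, finite_difference_gradient(f, point))]
```

Entries of `gdiff` close to zero suggest that the analytical gradient is implemented correctly; a large entry points to the parameter whose derivative is wrong.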
- confidence_intervals(beta_values, interval_size=0.9)[source]¶
Calculate confidence intervals on the simulated quantities
- Parameters:
beta_values (list(dict(str: float))) – array of parameters values to be used in the calculations. Typically, it is a sample drawn from a distribution.
interval_size (float) – size of the reported confidence interval, expressed as a fraction between 0 and 1. If it is denoted by s, the interval is bounded by the quantiles (1-s)/2 and (1+s)/2. The default (0.9) corresponds to the quantiles 0.05 and 0.95.
- Returns:
two pandas data frames ‘left’ and ‘right’ with the same dimensions. Each row corresponds to a row in the database, and each column to a formula. ‘left’ contains the left value of the confidence interval, and ‘right’ the right value
Example:
    # Read the estimation results from a file
    results = EstimationResults.from_yaml_file(filename='my_model.yaml')
    # Retrieve the names of the beta parameters that have been estimated
    betas = biogeme.free_betas_names
    # Draw 100 realizations of the distribution of the estimators
    b = results.getBetasForSensitivityAnalysis(betas, size=100)
    # Simulate the formulas using the nominal values
    simulated_values = biogeme.simulate(results.get_beta_values())
    # Calculate the confidence intervals for each formula
    left, right = biogeme.confidence_intervals(b, 0.9)
- Return type:
tuple of two Pandas dataframes.
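The quantile rule for interval_size can be checked with a standalone sketch for a single simulated quantity. It assumes linear interpolation between order statistics; the helper names are hypothetical, and Biogeme's own quantile computation may use a different convention.

```python
def quantile(sorted_values, q):
    """Quantile of pre-sorted values, with linear interpolation."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def interval_bounds(simulated_values, interval_size=0.9):
    """Left and right bounds at the quantiles (1-s)/2 and (1+s)/2."""
    ordered = sorted(simulated_values)
    left_q = (1.0 - interval_size) / 2.0
    right_q = (1.0 + interval_size) / 2.0
    return quantile(ordered, left_q), quantile(ordered, right_q)


# One simulated quantity evaluated for 101 hypothetical draws of the parameters.
draws = [float(i) for i in range(101)]
left, right = interval_bounds(draws, 0.9)
```

For these evenly spaced draws the bounds fall at approximately 5 and 95, that is, the 0.05 and 0.95 quantiles.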
- property dogleg¶
- property enlarging_factor¶
- estimate(starting_values=None, recycle=False, run_bootstrap=False, **kwargs)[source]¶
Estimate the parameters of the model(s).
- Returns:
object containing the estimation results.
- Return type:
biogeme.bioResults
- Parameters:
starting_values (dict[str, float] | None)
recycle (bool)
run_bootstrap (bool)
Example:
    import biogeme.biogeme as bio

    # Create an instance of biogeme
    biogeme = bio.BIOGEME(database, logprob)
    # Give a name to the model
    biogeme.modelName = 'mymodel'
    # Estimate the parameters
    results = biogeme.estimate()
- Raises:
BiogemeError – if no expression has been provided for the likelihood
- estimate_catalog(selected_configurations=None, quick_estimate=False, recycle=False, run_bootstrap=False)[source]¶
Estimate all or selected versions of a model with Catalogs, corresponding to multiple specifications.
- Parameters:
selected_configurations (set[Configuration]) – set of configurations. If None, all configurations are considered.
quick_estimate (bool) – if True, the final statistics are not calculated.
recycle (bool) – if True, the results are read from the pickle file, if it exists. If False, the estimation is performed.
run_bootstrap (bool) – if True, bootstrapping is applied.
- Return type:
dict[str, EstimationResults]
- Returns:
object containing the estimation results associated with the name of each specification, as well as a description of each configuration
- property expressions_registry¶
- property free_betas_names: list[str]¶
Returns the names of the parameters that must be estimated
- Returns:
list of names of the parameters
- Return type:
list(str)
- classmethod from_configuration(config_id, multiple_expression, database, user_notes=None, parameters=None, **kwargs)[source]¶
Obtain the Biogeme object corresponding to the configuration of a multiple expression
- Parameters:
config_id (str) – identifier of the configuration
multiple_expression (Expression) – multiple expression containing the catalog.
database (Database) – database to be passed to the Biogeme object
user_notes (str | None) – these notes will be included in the report file.
parameters (str | Parameters | None) – object with the parameters
- Return type:
- classmethod from_configuration_and_controller(config_id, central_controller, database, user_notes=None, parameters=None, **kwargs)[source]¶
Obtain the Biogeme object corresponding to the configuration of a multiple expression
- Parameters:
config_id (str) – identifier of the configuration
central_controller (CentralController) – central controller for the multiple expression containing all the catalogs.
database (Database) – database to be passed to the Biogeme object
user_notes (str | None) – these notes will be included in the report file.
parameters (str | Parameters | None) – object with the parameters
- Return type:
- property function_evaluator: CompiledFormulaEvaluator¶
- property function_parameters: dict[str, bool | int | float | str]¶
Prepare the parameters for the function
- property generate_html¶
- property generate_pickle: bool¶
- property generate_yaml¶
- property identification_threshold¶
- property infeasible_cg¶
- init_loglikelihood¶
Initial value of the log likelihood function
- property initial_radius¶
- property large_data_set¶
- property largest_neighborhood¶
- property log_like: Expression | None¶
- log_like_name: str¶
Keyword used for the name of the log likelihood formula. Default: ‘log_like’
- property loglike: Expression¶
For backward compatibility
- property max_iterations¶
- property max_number_parameters_to_report¶
- property maximum_attempts¶
- property maximum_number_catalog_expressions¶
- property maximum_number_parameters¶
- property missing_data¶
- property modelName: str¶
- property model_elements: ModelElements | None¶
- model_name¶
Name of the model. Default: ‘biogemeModelDefaultName’
- null_loglikelihood¶
Log likelihood of the null model
- property number_of_draws¶
- property number_of_jobs¶
- property number_of_neighbors¶
- property number_of_observations¶
- property number_of_threads¶
- number_unknown_parameters()[source]¶
Returns the number of parameters that must be estimated
- Returns:
number of parameters
- Return type:
int
- property numerically_safe¶
- property only_robust_stats¶
- property optimization_algorithm¶
- property optimization_parameters: dict[str, bool | int | float | str]¶
- quick_estimate()[source]¶
- Estimate the parameters of the model. Same as estimate, except that any extra calculation is skipped (initial log likelihood, t-statistics, etc.)
- Returns:
object containing the estimation results.
- Return type:
Example:
    import biogeme.biogeme as bio

    # Create an instance of biogeme
    biogeme = bio.BIOGEME(database, logprob)
    # Give a name to the model
    biogeme.modelName = 'mymodel'
    # Estimate the parameters
    results = biogeme.quick_estimate()
- Raises:
BiogemeError – if no expression has been provided for the likelihood
- report_array(array, with_names=True)[source]¶
Reports the entries of the array up to the maximum number
- Parameters:
array (numpy.array) – array to report
with_names (bool) – if True, the names of the parameters are included
- Returns:
string reporting the values
- Return type:
str
- retrieve_saved_estimates()[source]¶
Attempt to retrieve previously saved estimation results from a YAML file.
- Return type:
EstimationResults | None
- Returns:
An EstimationResults object if a saved result is found. If no file is found or loading fails, None is returned and a warning is logged.
- Raises:
BiogemeError – Raised internally by _load_saved_estimates if loading fails, and is caught to log a warning instead.
- property sample_size¶
- property save_iterations¶
- property save_validation_results¶
- property second_derivatives¶
- property seed¶
- set_random_init_values(default_bound=100.0)[source]¶
Modifies the initial values of the parameters in all formulas, using randomly generated values. The value is drawn from a uniform distribution on the interval defined by the bounds.
- Parameters:
default_bound (float) – If the upper bound is missing, it is replaced by this value. If the lower bound is missing, it is replaced by the opposite of this value. Default: 100.
- Return type:
None
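The rule for missing bounds can be sketched in isolation. The function below is a hypothetical illustration and not Biogeme's implementation: it assumes that each parameter comes with a (lower, upper) pair in which either bound may be None.

```python
import random


def draw_initial_values(bounds, default_bound=100.0, seed=None):
    """Draw one uniform value per parameter.

    bounds maps a parameter name to (lower, upper); either may be None.
    """
    rng = random.Random(seed)
    values = {}
    for name, (lower, upper) in bounds.items():
        # A missing lower bound becomes -default_bound,
        # a missing upper bound becomes +default_bound.
        low = -default_bound if lower is None else lower
        high = default_bound if upper is None else upper
        values[name] = rng.uniform(low, high)
    return values


# Hypothetical parameter names and bounds.
bounds = {"b_time": (None, 0.0), "b_cost": (None, None), "asc_car": (-5.0, 5.0)}
initial_values = draw_initial_values(bounds, default_bound=100.0, seed=42)
```

Here `b_time` is drawn from [-100, 0], `b_cost` from [-100, 100], and `asc_car` from [-5, 5].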
- simulate(the_beta_values)[source]¶
Evaluate all simulation formulas on each row of the database using the specified parameter values.
- Parameters:
the_beta_values (dict[str, float] | None) – Dictionary mapping parameter names to values. If None, an exception is raised. Use results.get_beta_values() after estimation or provide explicit values.
- Return type:
DataFrame
- Returns:
A pandas DataFrame where each row corresponds to an observation in the database, and each column corresponds to a simulation formula.
- Raises:
BiogemeError – If the_beta_values is None or if the number of parameters is incorrect.
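The shape of the simulation output (one row per observation, one column per formula) can be illustrated without Biogeme. In this sketch, plain dictionaries stand in for the database rows and Python callables stand in for Biogeme expressions; all names are hypothetical.

```python
def simulate_rows(rows, formulas, beta_values):
    """Evaluate every formula on every row with the given parameter values."""
    if beta_values is None:
        raise ValueError("parameter values must be provided")
    return [
        {name: formula(row, beta_values) for name, formula in formulas.items()}
        for row in rows
    ]


# Two observations and one simulation formula (a linear utility).
rows = [{"time": 10.0, "cost": 2.0}, {"time": 20.0, "cost": 1.0}]
formulas = {
    "utility": lambda row, b: b["b_time"] * row["time"] + b["b_cost"] * row["cost"],
}
betas = {"b_time": -0.1, "b_cost": -0.5}
simulated = simulate_rows(rows, formulas, betas)
```

Each element of `simulated` plays the role of one row of the returned DataFrame, keyed by formula name.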
- property steptol¶
- property tolerance¶
- property use_jit¶
- user_notes¶
User notes
- validate(estimation_results, slices, groups=None)[source]¶
Perform out-of-sample validation of the model.
The validation procedure operates by dividing the dataset into a number of slices. For each slice:
The slice is used as the validation set.
The remaining data forms the estimation set.
The model is re-estimated on the estimation set.
The model is applied to the validation set to compute the log likelihood.
- Parameters:
estimation_results (EstimationResults) – Estimation results obtained from the full dataset.
slices (int) – Number of data splits to create for cross-validation.
groups (str | None) – Optional column name used to group data entries (e.g., panel data). If provided, splitting preserves groups.
- Return type:
list[ValidationResult]
- Returns:
List of validation results, one for each data slice.
- Raises:
BiogemeError – If the dataset is structured as panel data and incompatible with validation.
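The slicing procedure can be sketched with row indices only. This is a minimal illustration under simplifying assumptions: rows are assigned to slices in round-robin order without shuffling or grouping, and the function names are hypothetical. Biogeme's actual splitting may differ.

```python
def make_slices(n_rows, n_slices):
    """Partition row indices 0..n_rows-1 into n_slices disjoint slices."""
    indices = list(range(n_rows))
    return [indices[i::n_slices] for i in range(n_slices)]


def estimation_validation_splits(n_rows, n_slices):
    """Yield (estimation_rows, validation_rows) for each slice in turn."""
    for validation in make_slices(n_rows, n_slices):
        held_out = set(validation)
        # Every row outside the current slice is used for re-estimation.
        estimation = [i for i in range(n_rows) if i not in held_out]
        yield estimation, validation


splits = list(estimation_validation_splits(10, 5))
```

Each row appears in exactly one validation set, so every observation is used once for out-of-sample evaluation.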
- property version¶
- property weight: Expression | None¶
- weight_name¶
Keyword used for the name of the weight formula. Default: ‘weight’