biogeme.biogeme module
The core routines of Biogeme.
Implementation of the main Biogeme class
- author:
- Michel Bierlaire 
- date:
- Tue Mar 26 16:45:15 2019 
It combines the database and the model specification.
- class biogeme.biogeme.BIOGEME(database, formulas, random_number_generators=None, user_notes=None, parameters=None, **kwargs)
- Bases: object
- Main class that combines the database and the model specification.
- It works in two modes: estimation and simulation.
- The following attributes are imported from the parameter file.
- Parameters:
- database (Database) 
- formulas (Expression | dict[str, Expression]) 
- random_number_generators (dict[str, RandomNumberGeneratorTuple] | None) 
- user_notes (str | None) 
- parameters (str | Parameters | None) 
 
 - property algo_parameters: dict[str, bool | int | float | str]
- Prepare the parameters for the optimization algorithm. 
 - best_iteration
- Store the best iteration found so far. 
 - property bootstrap_samples
 - calculate_init_likelihood()
- Calculate the value of the log likelihood function. The default values of the parameters are used.
- Returns:
- value of the log likelihood. 
- Return type:
- float
 
 - calculate_null_loglikelihood(avail)
- Calculate the log likelihood of the null model, which predicts equal probability for each alternative.
- Parameters:
- avail (list of biogeme.expressions.Expression) – list of expressions to evaluate the availability conditions for each alternative. If 1 is provided for an alternative, that alternative is always available.
- Returns:
- value of the log likelihood 
- Return type:
- float 
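
As an illustration of what "equal probability for each alternative" implies, here is a minimal sketch in plain Python. It is not Biogeme's implementation: the function `null_loglikelihood` and its list-of-lists input are hypothetical, and it assumes that availabilities are already evaluated to 0/1 values and that observations are unweighted. Each observation with J available alternatives contributes log(1/J).

```python
import math


def null_loglikelihood(availability):
    """Log likelihood of a model giving equal probability to each available alternative.

    availability: one list per observation, holding 1 (available) or
    0 (unavailable) for each alternative. Hypothetical input format.
    """
    total = 0.0
    for row in availability:
        n_available = sum(1 for a in row if a)
        if n_available == 0:
            raise ValueError("each observation needs at least one available alternative")
        # Equal probability 1 / n_available for the chosen alternative.
        total += -math.log(n_available)
    return total


# Three observations; the second one has only two available alternatives.
avail = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
print(null_loglikelihood(avail))
```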
 
 - property calculating_second_derivatives
 - change_init_values(betas)
- Modifies the initial values of the parameters in all formulas.
- Parameters:
- betas (dict[str, float]) – dictionary where the keys are the names of the parameters, and the values are the new values for the parameters. 
- Return type:
- None
 
 - check_derivatives(verbose=False)
- Verifies the implementation of the derivatives. It compares the analytical version with the finite differences approximation.
- Parameters:
- verbose (bool) – if True, the comparisons are reported. Default: False.
- Returns:
- f, g, h, gdiff, hdiff, where
- f is the value of the function, 
- g is the analytical gradient, 
- h is the analytical hessian, 
- gdiff is the difference between the analytical and the finite differences gradient, 
- hdiff is the difference between the analytical and the finite differences hessian.
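
The comparison between analytical and finite-difference derivatives can be sketched in plain Python as follows. This is only an illustration of the idea, not the library's code: the helper names, the test function `f`, and the choice of central differences with step `eps` are all assumptions, and only the gradient (not the hessian) is checked.

```python
def finite_difference_gradient(f, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    gradient = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += eps
        backward[i] -= eps
        gradient.append((f(forward) - f(backward)) / (2 * eps))
    return gradient


def check_gradient(f, analytical_gradient, x):
    """Return f(x), the analytical gradient g, and gdiff = g - finite differences."""
    g = analytical_gradient(x)
    g_fd = finite_difference_gradient(f, x)
    gdiff = [a - b for a, b in zip(g, g_fd)]
    return f(x), g, gdiff


# Hypothetical test function with a known gradient.
def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]


def grad_f(x):
    return [2 * x[0] + 3 * x[1], 3 * x[0]]


value, g, gdiff = check_gradient(f, grad_f, [1.0, 2.0])
print(value, g, gdiff)
```

Entries of `gdiff` close to zero suggest that the analytical gradient is consistent with the function; a large entry points to the parameter whose derivative is likely wrong.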
 
 
 - confidence_intervals(beta_values, interval_size=0.9)
- Calculate confidence intervals on the simulated quantities.
- Parameters:
- beta_values (list[dict[str, float]]) – list of parameter values to be used in the calculations. Typically, it is a sample drawn from a distribution. 
- interval_size (float) – size of the reported confidence interval, expressed as a fraction between 0 and 1. If it is denoted by s, the interval is calculated for the quantiles (1-s)/2 and (1+s)/2. The default (0.9) corresponds to the quantiles 0.05 and 0.95. 
 
- Returns:
- two pandas data frames ‘left’ and ‘right’ with the same dimensions. Each row corresponds to a row in the database, and each column to a formula. ‘left’ contains the left value of the confidence interval, and ‘right’ the right value.
- Example:
    # Read the estimation results from a file
    results = EstimationResults.from_yaml_file(filename='my_model.yaml')
    # Retrieve the names of the beta parameters that have been estimated
    betas = biogeme.free_betas_names
    # Draw 100 realizations of the distribution of the estimators
    b = results.getBetasForSensitivityAnalysis(betas, size=100)
    # Simulate the formulas using the nominal values
    simulated_values = biogeme.simulate(results.get_beta_values())
    # Calculate the confidence intervals for each formula
    left, right = biogeme.confidence_intervals(b, 0.9)
- Return type:
- tuple of two Pandas dataframes. 
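
To make the role of interval_size concrete, the sketch below computes the quantile levels (1-s)/2 and (1+s)/2 and applies them to a list of simulated values for one formula and one observation. It is a plain-Python illustration under assumptions: the function names are hypothetical, and the linear-interpolation quantile rule is one common convention that may differ from the one Biogeme uses.

```python
def quantile_levels(interval_size):
    """Quantile levels (1-s)/2 and (1+s)/2 for an interval of size s."""
    return (1 - interval_size) / 2, (1 + interval_size) / 2


def empirical_quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)  # floor, since position is non-negative
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def confidence_interval(simulated_values, interval_size=0.9):
    """Left and right bounds of the interval for one simulated quantity."""
    left_level, right_level = quantile_levels(interval_size)
    return (
        empirical_quantile(simulated_values, left_level),
        empirical_quantile(simulated_values, right_level),
    )


# Hypothetical simulated values: one per draw of the parameters.
draws = [float(i) for i in range(101)]
print(confidence_interval(draws, 0.9))
```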
 
 - property dogleg
 - property enlarging_factor
 - estimate(starting_values=None, recycle=False, run_bootstrap=False, **kwargs)
- Estimate the parameters of the model(s).
- Parameters:
- starting_values (dict[str, float] | None) 
- recycle (bool) 
- run_bootstrap (bool) 
- Returns:
- object containing the estimation results. 
- Return type:
- biogeme.bioResults 
- Raises:
- BiogemeError – if no expression has been provided for the likelihood 
- Example:
    # Create an instance of biogeme
    biogeme = bio.BIOGEME(database, logprob)
    # Give a name to the model
    biogeme.modelName = 'mymodel'
    # Estimate the parameters
    results = biogeme.estimate()
 
 - estimate_catalog(selected_configurations=None, quick_estimate=False, recycle=False, run_bootstrap=False)
- Estimate all or selected versions of a model with Catalogs, corresponding to multiple specifications.
- Parameters:
- selected_configurations (set[Configuration]) – set of configurations. If None, all configurations are considered.
- quick_estimate (bool) – if True, the final statistics are not calculated.
- recycle (bool) – if True, the results are read from the pickle file, if it exists. If False, the estimation is performed.
- run_bootstrap (bool) – if True, bootstrapping is applied.
 
- Return type:
- dict[str, EstimationResults]
- Returns:
- object containing the estimation results associated with the name of each specification, as well as a description of each configuration 
 
 - property expressions_registry
 - property free_betas_names: list[str]
- Returns the names of the parameters that must be estimated.
- Returns:
- list of names of the parameters 
- Return type:
- list[str] 
 
 - classmethod from_configuration(config_id, multiple_expression, database, user_notes=None, parameters=None, **kwargs)
- Obtain the Biogeme object corresponding to the configuration of a multiple expression.
- Parameters:
- config_id (str) – identifier of the configuration
- multiple_expression (Expression) – multiple expression containing the catalog.
- database (Database) – database to be passed to the Biogeme object
- user_notes (str | None) – these notes will be included in the report file.
- parameters (str | Parameters | None) – object with the parameters
 
- Return type:
- BIOGEME
 - classmethod from_configuration_and_controller(config_id, central_controller, database, user_notes=None, parameters=None, **kwargs)
- Obtain the Biogeme object corresponding to the configuration of a multiple expression.
- Parameters:
- config_id (str) – identifier of the configuration
- central_controller (CentralController) – central controller for the multiple expression containing all the catalogs.
- database (Database) – database to be passed to the Biogeme object
- user_notes (str | None) – these notes will be included in the report file.
- parameters (str | Parameters | None) – object with the parameters
 
- Return type:
- BIOGEME
 - property function_evaluator: CompiledFormulaEvaluator
 - property function_parameters: dict[str, bool | int | float | str]
- Prepare the parameters for the function.
 - property generate_html
 - property generate_pickle: bool
 - property generate_yaml
 - property identification_threshold
 - property infeasible_cg
 - init_loglikelihood
- Initial value of the likelihood function.
 - property initial_radius
 - property large_data_set
 - property largest_neighborhood
 - property log_like: Expression | None
 - log_like_name: str
- Keyword used for the name of the log likelihood formula. Default: ‘log_like’ 
 - property loglike: Expression
- For backward compatibility.
 - property max_iterations
 - property max_number_parameters_to_report
 - property maximum_attempts
 - property maximum_number_catalog_expressions
 - property maximum_number_parameters
 - property missing_data
 - property modelName: str
 - property model_elements: ModelElements | None
 - model_name
- Name of the model. Default: ‘biogemeModelDefaultName’ 
 - null_loglikelihood
- Log likelihood of the null model 
 - property number_of_draws
 - property number_of_jobs
 - property number_of_neighbors
 - property number_of_observations
 - property number_of_threads
 - number_unknown_parameters()
- Returns the number of parameters that must be estimated.
- Returns:
- number of parameters 
- Return type:
- int 
 
 - property numerically_safe
 - property only_robust_stats
 - property optimization_algorithm
 - property optimization_parameters: dict[str, bool | int | float | str]
 - quick_estimate()
- Estimate the parameters of the model. Same as estimate, except that any extra calculation is skipped (initial log likelihood, t-statistics, etc.).
- Returns:
- object containing the estimation results. 
- Raises:
- BiogemeError – if no expression has been provided for the likelihood 
- Example:
    # Create an instance of biogeme
    biogeme = bio.BIOGEME(database, logprob)
    # Give a name to the model
    biogeme.modelName = 'mymodel'
    # Estimate the parameters
    results = biogeme.quick_estimate()
 
 - report_array(array, with_names=True)
- Reports the entries of the array up to the maximum number.
- Parameters:
- array (numpy.array) – array to report 
- with_names (bool) – if True, the names of the parameters are included 
 
- Returns:
- string reporting the values 
- Return type:
- str 
 
 - retrieve_saved_estimates()
- Attempt to retrieve previously saved estimation results from a YAML file.
- Return type:
- EstimationResults | None
- Returns:
- An EstimationResults object if a saved result is found. If no file is found or loading fails, None is returned and a warning is logged. 
- Raises:
- BiogemeError – raised internally by _load_saved_estimates if loading fails; it is caught and a warning is logged instead. 
 
 - property sample_size
 - property save_iterations
 - property save_validation_results
 - property second_derivatives
 - property seed
 - set_random_init_values(default_bound=100.0)
- Modifies the initial values of the parameters in all formulas, using randomly generated values. The value is drawn from a uniform distribution on the interval defined by the bounds.
- Parameters:
- default_bound (float) – If the upper bound is missing, it is replaced by this value. If the lower bound is missing, it is replaced by the opposite of this value. Default: 100. 
- Return type:
- None
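
The rule for missing bounds can be illustrated with a short plain-Python sketch. It is not the library's code: the function `random_initial_value`, the parameter names, and the representation of a missing bound as None are assumptions made for the example.

```python
import random


def random_initial_value(lower_bound, upper_bound, default_bound=100.0, rng=random):
    """Draw a value uniformly between the bounds.

    A missing (None) upper bound is replaced by default_bound, and a missing
    lower bound by -default_bound.
    """
    lower = -default_bound if lower_bound is None else lower_bound
    upper = default_bound if upper_bound is None else upper_bound
    return rng.uniform(lower, upper)


# Hypothetical parameters with (lower, upper) bounds; None means "no bound".
bounds = {"asc_car": (None, None), "b_time": (None, 0.0), "mu": (1.0, 10.0)}
rng = random.Random(0)  # fixed seed so that the draws are reproducible
init_values = {
    name: random_initial_value(low, up, rng=rng) for name, (low, up) in bounds.items()
}
print(init_values)
```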
 
 - simulate(the_beta_values)
- Evaluate all simulation formulas on each row of the database using the specified parameter values.
- Parameters:
- the_beta_values (dict[str, float] | None) – Dictionary mapping parameter names to values. If None, an exception is raised. Use results.get_beta_values() after estimation or provide explicit values.
- Return type:
- DataFrame
- Returns:
- A pandas DataFrame where each row corresponds to an observation in the database, and each column corresponds to a simulation formula. 
- Raises:
- BiogemeError – If the_beta_values is None or if the number of parameters is incorrect. 
 
 - property steptol
 - property tolerance
 - property use_jit
 - user_notes
- User notes 
 - validate(estimation_results, slices, groups=None)
- Perform out-of-sample validation of the model.
- The validation procedure operates by dividing the dataset into a number of slices. For each slice:
- The slice is used as the validation set. 
- The remaining data forms the estimation set. 
- The model is re-estimated on the estimation set. 
- The model is applied to the validation set to compute the log likelihood. 
 - Parameters:
- estimation_results (EstimationResults) – Estimation results obtained from the full dataset.
- slices (int) – Number of data splits to create for cross-validation.
- groups (str | None) – Optional column name used to group data entries (e.g., panel data). If provided, splitting preserves groups.
 
- Return type:
- list[ValidationResult]
- Returns:
- List of validation results, one for each data slice. 
- Raises:
- BiogemeError – If the dataset is structured as panel data and incompatible with validation. 
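
The slicing step of the procedure described above can be sketched in plain Python. This illustrates the idea only and is not Biogeme's implementation: the function `make_slices`, the shuffling with a fixed seed, and the round-robin assignment are assumptions. When group labels are given, all rows sharing a label end up in the same slice.

```python
import random


def make_slices(n_rows, slices, groups=None, seed=0):
    """Return a list of (estimation_rows, validation_rows) pairs of row indices.

    groups: optional list with one label per row; rows sharing a label are
    kept together in the same slice.
    """
    rng = random.Random(seed)
    if groups is None:
        units = [[i] for i in range(n_rows)]
    else:
        by_group = {}
        for i, label in enumerate(groups):
            by_group.setdefault(label, []).append(i)
        units = list(by_group.values())
    rng.shuffle(units)
    # Distribute the units (single rows or whole groups) over the slices.
    folds = [[] for _ in range(slices)]
    for k, unit in enumerate(units):
        folds[k % slices].extend(unit)
    result = []
    for k in range(slices):
        validation = sorted(folds[k])
        estimation = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        result.append((estimation, validation))
    return result


# Six rows belonging to three hypothetical individuals, split into three slices.
labels = ["a", "a", "b", "b", "c", "c"]
for estimation, validation in make_slices(len(labels), 3, groups=labels):
    print("estimate on", estimation, "validate on", validation)
```

In an actual validation run, the model would be re-estimated on each estimation set and its log likelihood computed on the matching validation set.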
 
 - property version
 - property weight: Expression | None
 - weight_name
- Keyword used for the name of the weight formula. Default: ‘weight’