biogeme.biogeme module

The core routines of Biogeme.

Implementation of the main Biogeme class

author:

Michel Bierlaire

date:

Tue Mar 26 16:45:15 2019

It combines the database and the model specification.

class biogeme.biogeme.BIOGEME(database, formulas, user_notes=None, parameters=None, skip_audit=False, **kwargs)[source]

Bases: object

Main class that combines the database and the model specification.

It works in two modes: estimation and simulation.

static argument_warning(old_new_tuple)[source]

Displays a deprecation warning when parameters are provided as arguments.

Parameters:

old_new_tuple (OldNewParamTuple)

bestIteration

Store the best iteration found so far.

beta_values_dict_to_list(beta_dict=None)[source]
Transforms a dict with the names of the betas associated with their values into a list consistent with the numbering of the ids.

Parameters:

beta_dict (dict(str: float)) – dict with the values of the parameters

Return type:

list[float]
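As a rough illustration of this conversion, the sketch below orders the values of a dict according to a list of parameter names. The function `dict_to_list` and the list `free_names` are hypothetical stand-ins for Biogeme's internal numbering of the ids, not its actual implementation.

```python
def dict_to_list(beta_dict, free_names):
    """Return the values of beta_dict in the order given by free_names.

    free_names is a hypothetical stand-in for the internal numbering of the ids.
    """
    missing = [name for name in free_names if name not in beta_dict]
    if missing:
        raise KeyError(f"No value provided for: {missing}")
    return [float(beta_dict[name]) for name in free_names]


# Hypothetical parameter names, listed in the order of their ids.
free_names = ["asc_car", "b_cost", "b_time"]
values = dict_to_list({"b_time": -1.2, "asc_car": 0.5, "b_cost": -0.3}, free_names)
print(values)
```

The point of the sketch is only that the order of the resulting list is driven by the numbering, not by the order of the keys in the dict.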

bootstrap_results

Results of the bootstrap calculation.

property bootstrap_samples
bootstrap_time

Time needed to calculate the bootstrap standard errors

calculateInitLikelihood()[source]

Warning

This function is deprecated. Use calculate_init_likelihood() instead.

Return type:

float

calculateLikelihood(x, scaled, batch=None)[source]

Warning

This function is deprecated. Use calculate_likelihood() instead.

Parameters:
  • x (ndarray)

  • scaled (bool)

  • batch (float | None)

calculateLikelihoodAndDerivatives(x, scaled, hessian=False, bhhh=False, batch=None)[source]

Warning

This function is deprecated. Use calculate_likelihood_and_derivatives() instead.

Return type:

FunctionOutput

Parameters:
  • x (ndarray)

  • scaled (bool)

  • hessian (bool)

  • bhhh (bool)

  • batch (float | None)

calculateNullLoglikelihood(avail)[source]

Warning

This function is deprecated. Use calculate_null_loglikelihood() instead.

Parameters:

avail (dict[int, Expression | float | int | bool])

calculate_init_likelihood()[source]

Calculate the value of the log likelihood function

The default values of the parameters are used.

Returns:

value of the log likelihood.

Return type:

float.

calculate_likelihood(x, scaled, batch=None)[source]

Calculates the value of the log likelihood function

Parameters:
  • x (list(float)) – vector of values for the parameters.

  • scaled (bool) – if True, the value is divided by the number of observations used to calculate it. In this case, the values with different sample sizes are comparable. Default: True

  • batch (float) – if not None, calculates the likelihood on a random sample of the data. The value of the parameter must be strictly between 0 and 1, and represents the share of the data that will be used. Default: None

Returns:

the calculated value of the log likelihood

Return type:

float.

Raises:
  • ValueError – if the length of the list x is incorrect.

  • BiogemeError – if calculation with batch is requested
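The effect of the scaled argument can be sketched on a toy binary logit model, independently of Biogeme. Everything below (the function `log_likelihood`, the two parameters and the data) is an assumption made for illustration only.

```python
import math


def log_likelihood(betas, observations, scaled):
    """Log likelihood of a toy binary logit model.

    Each observation is a pair (time_difference, chosen), with chosen in {0, 1}.
    """
    if len(betas) != 2:
        raise ValueError(f"Expected 2 parameter values, got {len(betas)}")
    asc, b_time = betas
    total = 0.0
    for time_difference, chosen in observations:
        utility = asc + b_time * time_difference
        prob_one = 1.0 / (1.0 + math.exp(-utility))
        total += math.log(prob_one if chosen == 1 else 1.0 - prob_one)
    # Scaling divides by the sample size, so that values obtained
    # from samples of different sizes are comparable.
    return total / len(observations) if scaled else total


observations = [(-5.0, 1), (3.0, 0), (1.0, 1), (-2.0, 0)]
raw = log_likelihood([0.1, -0.2], observations, scaled=False)
per_observation = log_likelihood([0.1, -0.2], observations, scaled=True)
print(raw, per_observation)
```

The unscaled value is the scaled value multiplied by the number of observations.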

calculate_likelihood_and_derivatives(x, scaled, hessian=False, bhhh=False, batch=None)[source]

Calculate the value of the log likelihood function and its derivatives.

Parameters:
  • x (list(float)) – vector of values for the parameters.

  • scaled (bool) – if True, the results are divided by the number of observations.

  • hessian (bool) – if True, the hessian is calculated. Default: False.

  • bhhh (bool) – if True, the BHHH matrix is calculated. Default: False.

  • batch (float) – if not None, calculates the likelihood on a random sample of the data. The value of the parameter must be strictly between 0 and 1, and represents the share of the data that will be used. Default: None

Returns:

f, g, h, bh where

  • f is the value of the function (float)

  • g is the gradient (numpy.array)

  • h is the hessian (numpy.array)

  • bh is the BHHH matrix (numpy.array)

Return type:

tuple(float, numpy.array, numpy.array, numpy.array)

Raises:
  • ValueError – if the length of the list x is incorrect

  • BiogemeError – if the norm of the gradient is not finite.

  • BiogemeError – if calculation with batch is requested
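For readers unfamiliar with the BHHH matrix, it is commonly defined as the sum, over the observations, of the outer product of the gradient of each observation's log likelihood with itself. The sketch below illustrates that definition on made-up per-observation gradients. The function `bhhh_matrix` is hypothetical and is not taken from Biogeme.

```python
def bhhh_matrix(gradients):
    """Sum of the outer products g g^T of per-observation gradients."""
    size = len(gradients[0])
    matrix = [[0.0] * size for _ in range(size)]
    for gradient in gradients:
        for i in range(size):
            for j in range(size):
                matrix[i][j] += gradient[i] * gradient[j]
    return matrix


# Made-up gradients for two observations and two parameters.
per_observation_gradients = [[1.0, 2.0], [0.5, -1.0]]
bh = bhhh_matrix(per_observation_gradients)
print(bh)
```

By construction the matrix is symmetric and requires first derivatives only.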

calculate_null_loglikelihood(avail)[source]

Calculate the log likelihood of the null model that predicts equal probability for each alternative

Parameters:

avail (list of biogeme.expressions.Expression) – list of expressions to evaluate the availability conditions for each alternative. If 1 is provided, it is always available.

Returns:

value of the log likelihood

Return type:

float
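Since the null model assigns equal probability to each available alternative, the contribution of an observation with n available alternatives is log(1/n). A minimal sketch of that calculation, assuming the availability conditions have already been evaluated to 0/1 values for each row (the function `null_log_likelihood` and the data are hypothetical):

```python
import math


def null_log_likelihood(availability_rows):
    """Sum of log(1 / n) over the rows, n being the number of available alternatives."""
    total = 0.0
    for row in availability_rows:
        n_available = sum(1 for flag in row.values() if flag)
        if n_available == 0:
            raise ValueError("At least one alternative must be available")
        total -= math.log(n_available)
    return total


# Keys are alternative ids, values are evaluated availability conditions.
rows = [{1: 1, 2: 1, 3: 1}, {1: 1, 2: 0, 3: 1}]
value = null_log_likelihood(rows)
print(value)
```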

change_init_values(betas)[source]

Modifies the initial values of the parameters in all formulas

Parameters:

betas (dict(string:float)) – dictionary where the keys are the names of the parameters, and the values are the new value for the parameters.

Return type:

None

checkDerivatives(beta, verbose=False)[source]

Warning

This function is deprecated. Use check_derivatives() instead.

Parameters:
  • beta (ndarray | list[float])

  • verbose (bool)

check_derivatives(beta, verbose=False)[source]

Verifies the implementation of the derivatives.

It compares the analytical version with the finite differences approximation.

Parameters:
  • beta (list(float)) – vector of values for the parameters.

  • verbose (bool) – if True, the comparisons are reported. Default: False.

Return type:

tuple.

Returns:

f, g, h, gdiff, hdiff where

  • f is the value of the function,

  • g is the analytical gradient,

  • h is the analytical hessian,

  • gdiff is the difference between the analytical and the finite differences gradient,

  • hdiff is the difference between the analytical and the finite differences hessian.
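The kind of comparison described here can be sketched on a small function whose gradient is known in closed form. This is only an illustration: the test function, the use of central differences and the step size are assumptions, and Biogeme's own finite differences scheme may differ.

```python
def f(x):
    """Toy function standing in for the log likelihood."""
    return x[0] ** 2 + 3.0 * x[0] * x[1]


def analytical_gradient(x):
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]


def finite_difference_gradient(func, x, step=1e-6):
    """Central finite differences approximation of the gradient."""
    gradient = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += step
        backward[i] -= step
        gradient.append((func(forward) - func(backward)) / (2.0 * step))
    return gradient


point = [1.0, 2.0]
g = analytical_gradient(point)
# Difference between the analytical and the finite differences gradient.
gdiff = [a - n for a, n in zip(g, finite_difference_gradient(f, point))]
print(g, gdiff)
```

Entries of gdiff that are large compared with the step size usually point to an error in the analytical derivatives.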

confidenceIntervals(beta_values, interval_size=0.9)[source]

Warning

This function is deprecated. Use confidence_intervals() instead.

Return type:

tuple[DataFrame, DataFrame]

Parameters:
  • beta_values (list[dict[str, float]])

  • interval_size (float)

confidence_intervals(beta_values, interval_size=0.9)[source]

Calculate confidence intervals on the simulated quantities

Parameters:
  • beta_values (list(dict(str: float))) – array of parameters values to be used in the calculations. Typically, it is a sample drawn from a distribution.

  • interval_size (float) – size of the reported confidence interval, in percentage. If it is denoted by s, the interval is calculated for the quantiles (1-s)/2 and (1+s)/2. The default (0.9) corresponds to quantiles for the confidence interval [0.05, 0.95].

Returns:

two pandas data frames ‘left’ and ‘right’ with the same dimensions. Each row corresponds to a row in the database, and each column to a formula. ‘left’ contains the left value of the confidence interval, and ‘right’ the right value

Example:

import biogeme.results as res

# Read the estimation results from a file
results = res.bioResults(pickle_file='myModel.pickle')
# Retrieve the names of the beta parameters that have been
# estimated
betas = biogeme.free_beta_names

# Draw 100 realizations of the distribution of the estimators
b = results.getBetasForSensitivityAnalysis(betas, size=100)

# Simulate the formulas using the nominal values
# (assuming the results object exposes getBetaValues())
beta_values = results.getBetaValues()
simulatedValues = biogeme.simulate(beta_values)

# Calculate the confidence intervals for each formula
left, right = biogeme.confidence_intervals(b, 0.9)

Return type:

tuple of two Pandas dataframes.
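The role of interval_size can be illustrated with a stand-alone sketch that computes the quantiles (1-s)/2 and (1+s)/2 of a list of simulated values for a single row and a single formula. The functions `quantile` and `interval`, as well as the linear interpolation rule, are assumptions made for the illustration and not Biogeme's implementation.

```python
def quantile(sorted_values, q):
    """Quantile of an already sorted list, with linear interpolation."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1.0 - weight) + sorted_values[upper] * weight


def interval(simulated_values, interval_size=0.9):
    """Left and right bounds for the quantiles (1-s)/2 and (1+s)/2."""
    ordered = sorted(simulated_values)
    left_bound = quantile(ordered, (1.0 - interval_size) / 2.0)
    right_bound = quantile(ordered, (1.0 + interval_size) / 2.0)
    return left_bound, right_bound


# Made-up simulated values of one formula for one row: 0, 1, ..., 100.
draws = [float(v) for v in range(101)]
left, right = interval(draws, 0.9)
print(left, right)
```

With the default size of 0.9, the bounds are the 5% and 95% quantiles of the simulated values.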

convergence

True if the algorithm has converged

property dogleg
drawsProcessingTime

Time needed to generate the draws.

property enlarging_factor
estimate(recycle=False, run_bootstrap=False, **kwargs)[source]

Estimate the parameters of the model(s).

Parameters:
  • recycle (bool) – if True, the results are read from the pickle file, if it exists. If False, the estimation is performed.

  • run_bootstrap (bool) – if True, bootstrapping is applied.

Returns:

object containing the estimation results.

Return type:

biogeme.bioResults

Example:

import biogeme.biogeme as bio

# Create an instance of biogeme
biogeme = bio.BIOGEME(database, logprob)

# Give a name to the model
biogeme.modelName = 'mymodel'

# Estimate the parameters
results = biogeme.estimate()
Raises:

BiogemeError – if no expression has been provided for the likelihood

Parameters:
  • recycle (bool)

  • run_bootstrap (bool)

Return type:

bioResults

estimate_catalog(selected_configurations=None, quick_estimate=False, recycle=False, run_bootstrap=False)[source]

Estimate all or selected versions of a model with Catalogs, corresponding to multiple specifications.

Parameters:
  • selected_configurations (set(biogeme.pareto.SetElement)) – set of configurations. If None, all configurations are considered.

  • quick_estimate (bool) – if True, the final statistics are not calculated.

  • recycle (bool) – if True, the results are read from the pickle file, if it exists. If False, the estimation is performed.

  • run_bootstrap (bool) – if True, bootstrapping is applied.

Returns:

object containing the estimation results associated with the name of each specification, as well as a description of each configuration

Return type:

dict(str: bioResults)

files_of_type(extension, all_files=False)[source]

Identify the list of files with a given extension in the local directory

Parameters:
  • extension (str) – extension of the requested files (without the dot): ‘pickle’, or ‘html’

  • all_files (bool) – if all_files is False, only files containing the name of the model are identified. If all_files is True, all files with the requested extension are identified.

Returns:

list of files with the requested extension.

Return type:

list(str)
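The filtering logic can be sketched with the standard library. The function below is a hypothetical stand-alone version: unlike the method, it takes the directory and the model name as explicit arguments, and it is not Biogeme's code.

```python
import tempfile
from pathlib import Path


def files_of_type(directory, model_name, extension, all_files=False):
    """List the files with the given extension (without the dot) in directory."""
    if all_files:
        pattern = f"*.{extension}"
    else:
        # Keep only the files whose name contains the name of the model.
        pattern = f"*{model_name}*.{extension}"
    return sorted(path.name for path in Path(directory).glob(pattern))


with tempfile.TemporaryDirectory() as folder:
    for name in ("mymodel.pickle", "mymodel~00.pickle", "other.pickle", "mymodel.html"):
        (Path(folder) / name).touch()
    model_only = files_of_type(folder, "mymodel", "pickle")
    everything = files_of_type(folder, "mymodel", "pickle", all_files=True)

print(model_only)
print(everything)
```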

formulas

Dictionary containing Biogeme formulas of type biogeme.expressions.Expression. The keys are the names of the formulas.

property freeBetaNames: list[str]
property free_beta_names: list[str]

Returns the names of the parameters that must be estimated

Returns:

list of names of the parameters

Return type:

list(str)

classmethod from_configuration(config_id, expression, database, user_notes=None, parameters=None, skip_audit=False)[source]

Obtain the Biogeme object corresponding to the configuration of a multiple expression

Parameters:
  • config_id (str) – identifier of the configuration

  • expression (biogeme.expressions.Expression) – multiple expression containing all the catalogs.

  • database (Database) – database to be passed to the Biogeme object

  • user_notes (str) – these notes will be included in the report file.

  • parameters (str | Parameters | None) – object with the parameters

  • skip_audit (bool) – if True, no auditing is performed.

Return type:

BIOGEME

property generatePickle: bool

Boolean variable, True if the PICKLE file with the results must be generated.

property generate_html
property generate_pickle
getBoundsOnBeta(beta_name)[source]

Warning

This function is deprecated. Use get_bounds_on_beta() instead.

Return type:

tuple[float, float]

Parameters:

beta_name (str)

get_beta_values()[source]
Returns a dict with the initial values of Beta. Typically useful for simulation.

Returns:

dict with the initial values of the Beta

Return type:

dict(str: float)

get_bounds_on_beta(beta_name)[source]

Returns the bounds on the parameter as defined by the user.

Parameters:

beta_name (string) – name of the parameter

Returns:

lower bound, upper bound

Return type:

tuple

Raises:

BiogemeError – if the name of the parameter is not found.

property identification_threshold
property infeasible_cg
initLogLike

Initial value of the log likelihood function

property initial_radius
classmethod initialize_properties(properties)[source]
is_model_complex()[source]

Check if the model is potentially complex to estimate

Return type:

bool

property large_data_set
property largest_neighborhood
lastSample

Keeps track of the sample of data used to calculate the stochastic gradient / hessian.

likelihoodFiniteDifferenceHessian(x)[source]

Warning

This function is deprecated. Use likelihood_finite_difference_hessian() instead.

Return type:

ndarray

Parameters:

x (ndarray)

likelihood_finite_difference_hessian(x)[source]

Calculate the hessian of the log likelihood function using finite differences.

May be useful when the analytical hessian has numerical issues.

Parameters:

x (list(float)) – vector of values for the parameters.

Returns:

finite differences approximation of the hessian.

Return type:

numpy.array

Raises:

ValueError – if the length of the list x is incorrect
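One common way to approximate a hessian by finite differences is to difference the gradient, one coordinate at a time. The sketch below does so on a toy function. The forward differences scheme, the step size and the function names are assumptions made for the illustration, and Biogeme may proceed differently.

```python
def gradient(x):
    """Analytical gradient of the toy function f(x) = x0^2 * x1 + x1^3."""
    return [2.0 * x[0] * x[1], x[0] ** 2 + 3.0 * x[1] ** 2]


def finite_difference_hessian(grad, x, step=1e-6):
    """Approximate the hessian by forward differences on the gradient."""
    n = len(x)
    base = grad(x)
    hessian = [[0.0] * n for _ in range(n)]
    for j in range(n):
        shifted = list(x)
        shifted[j] += step
        shifted_gradient = grad(shifted)
        for i in range(n):
            # Column j: variation of the gradient when x[j] is perturbed.
            hessian[i][j] = (shifted_gradient[i] - base[i]) / step
    return hessian


h = finite_difference_hessian(gradient, [1.0, 2.0])
print(h)
```

At the point (1, 2), the exact hessian of the toy function is [[4, 2], [2, 12]], which the approximation reproduces up to an error of the order of the step.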

log_like: Expression

Object of type biogeme.expressions.Expression calculating the formula for the loglikelihood

log_like_name: str

Keyword used for the name of the loglikelihood formula. Default: ‘log_like’

property loglike: Expression

For backward compatibility

loglikeSignatures: list[bytes]

Internal signature of the formula for the loglikelihood.

static make_getter(param_name)[source]
static make_setter(param_name)[source]
property max_iterations
property max_number_parameters_to_report
property maximum_attempts
property maximum_number_catalog_expressions
property maximum_number_parameters
property missing_data
modelName

Name of the model. Default: ‘biogemeModelDefaultName’

monte_carlo

monte_carlo is True if one of the expressions involves a Monte-Carlo integration.

nullLogLike

Log likelihood of the null model

property numberOfDraws: int

Number of draws for Monte-Carlo integration.

property numberOfThreads: int

Number of threads used for parallel computing. Default: the number of available CPUs. Maintained for backward compatibility.

property number_of_draws
property number_of_neighbors
property number_of_threads: int

Number of threads used for parallel computing. Default: the number of available CPUs.

number_unknown_parameters()[source]

Returns the number of parameters that must be estimated

Returns:

number of parameters

Return type:

int

property only_robust_stats
optimizationMessages

Information provided by the optimization algorithm after completion.

property optimization_algorithm
optimize(starting_values=None)[source]

Calls the optimization algorithm. The function self.algorithm is called.

Parameters:

starting_values (list(float)) – starting point for the algorithm

Returns:

x, messages

  • x is the solution generated by the algorithm,

  • messages is a dictionary reporting information about the run of the algorithm

Return type:

numpy.array, dict(str:object)

Raises:

BiogemeError – an error is raised if no algorithm is specified.

properties_initialized = True
quickEstimate(**kwargs)[source]

Warning

This function is deprecated. Use quick_estimate() instead.

Return type:

bioResults

quick_estimate(**kwargs)[source]
Estimate the parameters of the model. Same as estimate, where any extra calculation is skipped (init loglikelihood, t-statistics, etc.)
Returns:

object containing the estimation results.

Return type:

bioResults

Example:

import biogeme.biogeme as bio

# Create an instance of biogeme
biogeme = bio.BIOGEME(database, logprob)

# Give a name to the model
biogeme.modelName = 'mymodel'

# Estimate the parameters, skipping the extra calculations
results = biogeme.quick_estimate()
Raises:

BiogemeError – if no expression has been provided for the likelihood

recycled_estimation(run_bootstrap=False, **kwargs)[source]
Return type:

bioResults

Parameters:

run_bootstrap (bool)

report_array(array, with_names=True)[source]

Reports the entries of the array up to the maximum number

Parameters:
  • array (numpy.array) – array to report

  • with_names (bool) – if True, the names of the parameters are included

Returns:

string reporting the values

Return type:

str

reset_id_manager()[source]

Reset all the ids of the elementary expressions in the formulas

Return type:

None

property save_iterations
property second_derivatives
property seed
setRandomInitValues(default_bound=100.0)[source]

Warning

This function is deprecated. Use set_random_init_values() instead.

Return type:

None

Parameters:

default_bound (float)

set_random_init_values(default_bound=100.0)[source]

Modifies the initial values of the parameters in all formulas, using randomly generated values. The value is drawn from a uniform distribution on the interval defined by the bounds.

Parameters:

default_bound (float) – If the upper bound is missing, it is replaced by this value. If the lower bound is missing, it is replaced by the opposite of this value. Default: 100.

Return type:

None
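The rule for missing bounds can be sketched as follows. The function `random_init_value`, the parameter names and their bounds are hypothetical, with None standing for a missing bound.

```python
import random


def random_init_value(lower, upper, default_bound=100.0, rng=random):
    """Draw a value uniformly between the bounds, replacing missing bounds."""
    low = -default_bound if lower is None else lower
    high = default_bound if upper is None else upper
    return rng.uniform(low, high)


rng = random.Random(42)  # Fixed seed, for reproducibility of the sketch.
bounds = {"b_time": (None, 0.0), "b_cost": (-10.0, 10.0), "asc_car": (None, None)}
init_values = {
    name: random_init_value(lower, upper, rng=rng)
    for name, (lower, upper) in bounds.items()
}
print(init_values)
```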

simulate(the_beta_values)[source]

Applies the formulas to each row of the database.

Parameters:

the_beta_values (dict(str, float)) – values of the parameters to be used in the calculations. If None, the default values are used. Default: None.

Returns:

a pandas data frame with the simulated value. Each row corresponds to a row in the database, and each column to a formula.

Return type:

Pandas data frame

Example:

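import biogeme.results as res

# Read the estimation results from a file
results = res.bioResults(pickle_file='myModel.pickle')
# Estimated values (assuming bioResults exposes getBetaValues())
beta_values = results.getBetaValues()
# Simulate the formulas using the nominal values
simulatedValues = biogeme.simulate(beta_values)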
Parameters:

the_beta_values (dict[str, float] | None)

Return type:

DataFrame

property steptol
property tolerance
user_notes

User notes

validate(estimation_results, validation_data)[source]

Perform out-of-sample validation.

The function performs the following tasks:

  • each slice defines a validation set (the slice itself) and an estimation set (the rest of the data),

  • the model is re-estimated on the estimation set,

  • the estimated model is applied on the validation set,

  • the value of the log likelihood for each observation is reported.

Parameters:
  • estimation_results (bioResults) – results of the model estimation based on the full data.

  • validation_data (list(tuple(pandas.DataFrame, pandas.DataFrame))) – list of estimation and validation data sets

Returns:

a list containing as many items as slices. Each item is the result of the simulation on the validation set.

Return type:

list(pandas.DataFrame)

Raises:

BiogemeError – An error is raised if the database is structured as panel data.
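The structure expected for validation_data, a list of (estimation set, validation set) pairs, can be illustrated with plain Python lists instead of pandas data frames. The function `split_into_slices` below is a hypothetical sketch of such a split. It is not the routine that Biogeme uses to prepare the data.

```python
def split_into_slices(rows, slices):
    """Build one (estimation set, validation set) pair per slice."""
    folds = [rows[i::slices] for i in range(slices)]
    pairs = []
    for k, validation in enumerate(folds):
        # The estimation set gathers the rows of all the other slices.
        estimation = [row for j, fold in enumerate(folds) if j != k for row in fold]
        pairs.append((estimation, validation))
    return pairs


rows = list(range(10))  # Stand-in for the rows of a database.
pairs = split_into_slices(rows, slices=5)
for estimation, validation in pairs:
    print(len(estimation), validation)
```

Each row appears in exactly one validation set, so the reported log likelihood of every observation comes from a model that was not estimated on it.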

property version
weight

Object of type biogeme.expressions.Expression calculating the weight of each observation in the sample.

weightSignatures: list[bytes]

Internal signature of the formula for the weight.

weight_name

Keyword used for the name of the weight formula. Default: ‘weight’

class biogeme.biogeme.OldNewParamTuple(old, new, section)[source]

Bases: NamedTuple

Parameters:
  • old (str)

  • new (str)

  • section (str)

new: str

Alias for field number 1

old: str

Alias for field number 0

section: str

Alias for field number 2
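The aliases listed above follow from the way a NamedTuple works: each field can be accessed by name or by position. The sketch below re-declares a tuple with the same three fields for illustration. The values used ('numberOfDraws', 'number_of_draws', 'MonteCarlo') are hypothetical examples, not taken from Biogeme's configuration.

```python
from typing import NamedTuple


class OldNewParamTuple(NamedTuple):
    """Illustrative re-declaration with the three documented fields."""

    old: str
    new: str
    section: str


# Hypothetical values, chosen only to show the access by name and by index.
pair = OldNewParamTuple(old="numberOfDraws", new="number_of_draws", section="MonteCarlo")
print(pair.old, pair[0])
print(pair.new, pair[1])
print(pair.section, pair[2])
```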