Biogeme

The core routines of Biogeme.

biogeme.biogeme module

Implementation of the main Biogeme class

author:

Michel Bierlaire

date:

Tue Mar 26 16:45:15 2019

It combines the database and the model specification.

class biogeme.biogeme.BIOGEME(database, formulas, userNotes=None, parameter_file=None, skip_audit=False, **kwargs)[source]

Bases: object

Main class that combines the database and the model specification.

It works in two modes: estimation and simulation.

__init__(database, formulas, userNotes=None, parameter_file=None, skip_audit=False, **kwargs)[source]

Constructor

Parameters:
  • database (biogeme.database.Database) – choice data.

  • formulas (biogeme.expressions.Expression, or dict(biogeme.expressions.Expression)) – expression or dictionary of expressions that define the model specification. Each expression is applied to each entry of the database. The keys of the dictionary give a name to each formula. In the estimation mode, two formulas are needed, with the keys ‘loglike’ and ‘weight’. If only one formula is provided, it is associated with the label ‘loglike’. If no formula is labeled ‘weight’, the weight of each piece of data is assumed to be 1.0. In the simulation mode, the labels of the formulas are used as labels of the resulting database.

  • userNotes (str) – these notes will be included in the report file.

  • parameter_file (str) – name of the .toml file where the parameters are read

Raises:

BiogemeError – an audit of the formulas is performed. If a formula has issues, an error is detected and an exception is raised.
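The labeling rules for the formulas argument can be sketched in plain Python. This is only an illustration: the helper names normalize_formulas and weight_or_default are hypothetical and are not part of Biogeme, and strings stand in for biogeme.expressions.Expression objects.

```python
def normalize_formulas(formulas):
    """Return a dict of labeled formulas (illustrative stand-in only)."""
    # A single formula is associated with the label 'loglike'.
    if not isinstance(formulas, dict):
        return {'loglike': formulas}
    return dict(formulas)


def weight_or_default(formulas):
    """Return the formula labeled 'weight', or 1.0 if there is none."""
    return formulas.get('weight', 1.0)


single = normalize_formulas('logprob')
labeled = normalize_formulas({'loglike': 'logprob', 'weight': 'w'})
print(single, weight_or_default(single))
print(labeled, weight_or_default(labeled))
```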

property algorithm_name

Name of the optimization algorithm

static argument_warning(old_new_tuple)[source]

Displays a deprecation warning when parameters are provided as arguments.

bestIteration

Store the best iteration found so far.

beta_values_dict_to_list(beta_dict=None)[source]

Transforms a dict with the names of the betas associated with their values, into a list consistent with the numbering of the ids.

Parameters:

beta_dict (dict(str: float)) – dict with the values of the parameters
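The conversion can be pictured as follows. This is a minimal sketch, not Biogeme's implementation: the function dict_to_list and the list ordered_names (standing in for the internal numbering of the ids) are assumptions made for the illustration.

```python
def dict_to_list(beta_dict, ordered_names):
    """Order the values of beta_dict following ordered_names."""
    return [beta_dict[name] for name in ordered_names]


# Hypothetical order of the parameter ids.
ordered_names = ['asc_car', 'b_cost', 'b_time']
beta_dict = {'b_time': -1.2, 'asc_car': 0.5, 'b_cost': -0.3}

values = dict_to_list(beta_dict, ordered_names)
print(values)
```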

Raises:

bootstrap_results

Results of the bootstrap calculation.

property bootstrap_samples

Number of re-estimations performed for bootstrap sampling

bootstrap_time

Time needed to calculate the bootstrap standard errors

calculateInitLikelihood()[source]

Calculate the value of the log likelihood function

The default values of the parameters are used.

Returns:

value of the log likelihood.

Return type:

float.

calculateLikelihood(x, scaled, batch=None)[source]

Calculates the value of the log likelihood function

Parameters:
  • x (list(float)) – vector of values for the parameters.

  • scaled (bool) – if True, the value is divided by the number of observations used to calculate it. In this case, the values with different sample sizes are comparable. Default: True

  • batch (float) – if not None, calculates the likelihood on a random sample of the data. The value of the parameter must be strictly between 0 and 1, and represents the share of the data that will be used. Default: None

Returns:

the calculated value of the log likelihood

Return type:

float.

Raises:
  • ValueError – if the length of the list x is incorrect.

  • BiogemeError – if calculation with batch is requested
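The effect of the scaled argument can be illustrated without Biogeme. In this sketch, the function log_likelihood and the probability lists are hypothetical; it only shows why dividing by the number of observations makes values from different sample sizes comparable.

```python
import math


def log_likelihood(probabilities, scaled):
    """Sum of log probabilities, optionally divided by the sample size."""
    total = sum(math.log(p) for p in probabilities)
    return total / len(probabilities) if scaled else total


# The large sample repeats the small one twice.
small = [0.5, 0.25]
large = [0.5, 0.25, 0.5, 0.25]

print(log_likelihood(small, scaled=False), log_likelihood(large, scaled=False))
print(log_likelihood(small, scaled=True), log_likelihood(large, scaled=True))
```

The unscaled values differ by a factor of two, while the scaled values coincide.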

calculateLikelihoodAndDerivatives(x, scaled, hessian=False, bhhh=False, batch=None)[source]

Calculate the value of the log likelihood function and its derivatives.

Parameters:
  • x (list(float)) – vector of values for the parameters.

  • scaled (bool) – if True, the results are divided by the number of observations.

  • hessian (bool) – if True, the hessian is calculated. Default: False.

  • bhhh (bool) – if True, the BHHH matrix is calculated. Default: False.

  • batch (float) – if not None, calculates the likelihood on a random sample of the data. The value of the parameter must be strictly between 0 and 1, and represents the share of the data that will be used. Default: None

Returns:

f, g, h, bh where

  • f is the value of the function (float)

  • g is the gradient (numpy.array)

  • h is the hessian (numpy.array)

  • bh is the BHHH matrix (numpy.array)

Return type:

tuple(float, numpy.array, numpy.array, numpy.array)

Raises:
  • ValueError – if the length of the list x is incorrect

  • BiogemeError – if the norm of the gradient is not finite, an error is raised.

  • BiogemeError – if calculation with batch is requested
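For readers unfamiliar with the BHHH matrix, its standard definition is the sum, over the observations, of the outer product of the gradient of each observation's contribution to the log likelihood. The sketch below computes it in plain Python; the function bhhh_matrix and the gradient values are hypothetical and do not reflect how Biogeme computes it internally.

```python
def bhhh_matrix(gradients):
    """Sum of the outer products of per-observation gradients."""
    n = len(gradients[0])
    bh = [[0.0] * n for _ in range(n)]
    for g in gradients:
        for i in range(n):
            for j in range(n):
                bh[i][j] += g[i] * g[j]
    return bh


# Hypothetical gradients of two observations, two parameters each.
per_observation_gradients = [[1.0, 2.0], [0.5, -1.0]]
bh = bhhh_matrix(per_observation_gradients)
print(bh)
```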

calculateNullLoglikelihood(avail)[source]

Calculate the log likelihood of the null model that predicts equal probability for each alternative

Parameters:

avail (list of biogeme.expressions.Expression) – list of expressions to evaluate the availability conditions for each alternative. If None, all alternatives are always available.

Returns:

value of the log likelihood

Return type:

float
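Under the null model, each available alternative has probability 1/J, where J is the number of alternatives available to the observation, so each observation contributes -log(J). A minimal sketch, assuming availability is given as rows of 0/1 flags with at least one available alternative per row (the function name and data are hypothetical):

```python
import math


def null_log_likelihood(availability_rows):
    """Log likelihood of the equal-probability model."""
    total = 0.0
    for row in availability_rows:
        # Number of alternatives available for this observation.
        n_available = sum(1 for available in row if available)
        total -= math.log(n_available)
    return total


# Three observations, three alternatives; the second alternative
# is unavailable for the second observation.
rows = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
value = null_log_likelihood(rows)
print(value)
```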

change_init_values(betas)[source]

Modifies the initial values of the parameters in all formulas

Parameters:

betas (dict(string:float)) – dictionary where the keys are the names of the parameters, and the values are the new value for the parameters.

checkDerivatives(beta, verbose=False)[source]

Verifies the implementation of the derivatives.

It compares the analytical version with the finite differences approximation.

Parameters:
  • beta (list(float)) – vector of values for the parameters.

  • verbose (bool) – if True, the comparisons are reported. Default: False.

Return type:

tuple.

Returns:

f, g, h, gdiff, hdiff where

  • f is the value of the function,

  • g is the analytical gradient,

  • h is the analytical hessian,

  • gdiff is the difference between the analytical and the finite differences gradient,

  • hdiff is the difference between the analytical and the finite differences hessian,
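The idea behind the check can be sketched with a forward finite-difference approximation of the gradient. The test function, the step size and all names below are assumptions chosen for the illustration; they are not taken from Biogeme.

```python
def finite_difference_gradient(f, x, step=1e-6):
    """Forward finite-difference approximation of the gradient of f at x."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += step
        grad.append((f(shifted) - fx) / step)
    return grad


def f(x):
    """Hypothetical function standing in for the log likelihood."""
    return -(x[0] - 1.0) ** 2 - 2.0 * x[1] ** 2


def analytical_gradient(x):
    """Analytical gradient of f."""
    return [-2.0 * (x[0] - 1.0), -4.0 * x[1]]


x = [0.5, -1.0]
g = analytical_gradient(x)
g_fd = finite_difference_gradient(f, x)
# Small differences suggest that the analytical gradient is correct.
gdiff = [a - b for a, b in zip(g, g_fd)]
print(gdiff)
```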

confidenceIntervals(betaValues, interval_size=0.9)[source]

Calculate confidence intervals on the simulated quantities

Parameters:
  • betaValues (list(dict(str: float))) – array of parameters values to be used in the calculations. Typically, it is a sample drawn from a distribution.

  • interval_size (float) – size of the reported confidence interval, expressed as a fraction between 0 and 1. If it is denoted by s, the interval is calculated for the quantiles (1-s)/2 and (1+s)/2. The default (0.9) corresponds to the quantiles 0.05 and 0.95.

Returns:

two pandas data frames ‘left’ and ‘right’ with the same dimensions. Each row corresponds to a row in the database, and each column to a formula. ‘left’ contains the left value of the confidence interval, and ‘right’ the right value

Example:

# Read the estimation results from a file
import biogeme.results as res
results = res.bioResults(pickleFile='myModel.pickle')

# Retrieve the names of the beta parameters that have been estimated
# (biogeme is an existing instance of BIOGEME)
betas = biogeme.free_beta_names()

# Draw 100 realizations of the distribution of the estimators
b = results.getBetasForSensitivityAnalysis(betas, size=100)

# Simulate the formulas using the nominal (estimated) values
betaValues = results.getBetaValues()
simulatedValues = biogeme.simulate(betaValues)

# Calculate the confidence intervals for each formula
left, right = biogeme.confidenceIntervals(b, 0.9)

Return type:

tuple of two Pandas dataframes.
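The relation between interval_size and the two quantiles can be checked with a small stand-alone sketch. The quantile rule used here (linear interpolation between order statistics) and the function names are assumptions for the illustration; Biogeme may compute the quantiles differently.

```python
def quantile(sorted_values, q):
    """Quantile with linear interpolation between order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def confidence_interval(simulated_values, interval_size=0.9):
    """Return the (1-s)/2 and (1+s)/2 quantiles of the simulated values."""
    ordered = sorted(simulated_values)
    left = quantile(ordered, (1 - interval_size) / 2)
    right = quantile(ordered, (1 + interval_size) / 2)
    return left, right


# Hypothetical simulated quantity: the values 0, 1, ..., 100.
draws = [float(i) for i in range(101)]
left, right = confidence_interval(draws, 0.9)
print(left, right)  # approximately 5 and 95
```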

convergence

True if the algorithm has converged

property dogleg

getter for the parameter

drawsProcessingTime

Time needed to generate the draws.

property enlarging_factor

getter for the parameter

estimate(recycle=False, run_bootstrap=False, **kwargs)[source]

Estimate the parameters of the model(s).

Parameters:
  • recycle (bool) – if True, the results are read from the pickle file, if it exists. If False, the estimation is performed.

  • run_bootstrap (bool) – if True, bootstrapping is applied.

Returns:

object containing the estimation results.

Return type:

biogeme.bioResults

Example:

import biogeme.biogeme as bio

# Create an instance of biogeme
# (database and logprob are assumed to be defined beforehand)
biogeme = bio.BIOGEME(database, logprob)

# Give a name to the model
biogeme.modelName = 'mymodel'

# Estimate the parameters
results = biogeme.estimate()

Raises:

BiogemeError – if no expression has been provided for the likelihood

estimate_catalog(selected_configurations=None, quick_estimate=False, recycle=False, run_bootstrap=False)[source]

Estimate all or selected versions of a model involving Catalog objects, corresponding to multiple specifications.

Parameters:
  • selected_configurations (set(biogeme.pareto.SetElement)) – set of configurations. If None, all configurations are considered.

  • quick_estimate (bool) – if True, the final statistics are not calculated.

  • recycle (bool) – if True, the results are read from the pickle file, if it exists. If False, the estimation is performed.

  • run_bootstrap (bool) – if True, bootstrapping is applied.

Returns:

object containing the estimation results associated with the name of each specification, as well as a description of each configuration

Return type:

dict(str: bioResults)

files_of_type(extension, all_files=False)[source]

Identify the list of files with a given extension in the local directory

Parameters:
  • extension (str) – extension of the requested files (without the dot): ‘pickle’, or ‘html’

  • all_files (bool) – if all_files is False, only files containing the name of the model are identified. If all_files is True, all files with the requested extension are identified.

Returns:

list of files with the requested extension.

Return type:

list(str)

formulas

Dictionary containing Biogeme formulas of type biogeme.expressions.Expression. The keys are the names of the formulas.

freeBetaNames()[source]

Deprecated. Use free_beta_names() instead.

free_beta_names()[source]

Returns the names of the parameters that must be estimated

Returns:

list of names of the parameters

Return type:

list(str)

classmethod from_configuration(config_id, expression, database, user_notes=None, parameter_file=None, skip_audit=False)[source]

Obtain the Biogeme object corresponding to the configuration of a multiple expression

Parameters:
  • config_id (str) – identifier of the configuration

  • expression (biogeme.expressions.Expression) – multiple expression containing all the catalogs.

  • database (Database) – database to be passed to the Biogeme object

  • user_notes (str) – these notes will be included in the report file.

  • parameter_file (str) – name of the TOML file with the parameters

  • skip_audit (bool) – if True, no auditing is performed.

property generateHtml

Boolean variable, True if the HTML file with the results must be generated.

property generatePickle

Boolean variable, True if the PICKLE file with the results must be generated.

property generate_html

Boolean variable, True if the HTML file with the results must be generated.

property generate_pickle

Boolean variable, True if the PICKLE file with the results must be generated.

getBoundsOnBeta(betaName)[source]

Returns the bounds on the parameter as defined by the user.

Parameters:

betaName (string) – name of the parameter

Returns:

lower bound, upper bound

Return type:

tuple

Raises:

BiogemeError – if the name of the parameter is not found.

get_beta_values()[source]

Returns a dict with the initial values of beta. Typically useful for simulation.

Returns:

dict with the initial values of the beta

Return type:

dict(str: float)

property identification_threshold

Threshold for the eigenvalue to trigger an identification warning

property infeasible_cg

getter for the parameter

initLogLike

Initial value of the log likelihood function

property initial_radius

getter for the parameter

lastSample

Keeps track of the sample of data used to calculate the stochastic gradient / hessian.

likelihoodFiniteDifferenceHessian(x)[source]

Calculate the hessian of the log likelihood function using finite differences.

May be useful when the analytical hessian has numerical issues.

Parameters:

x (list(float)) – vector of values for the parameters.

Returns:

finite differences approximation of the hessian.

Return type:

numpy.array

Raises:

ValueError – if the length of the list x is incorrect
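One common way to obtain such an approximation is to apply finite differences to the gradient and symmetrize the result. The sketch below follows that approach on a hypothetical quadratic function; the scheme, the step size and the names are assumptions and may differ from what Biogeme does.

```python
def finite_difference_hessian(gradient, x, step=1e-6):
    """Approximate the hessian by forward differences of the gradient."""
    n = len(x)
    g0 = gradient(x)
    columns = []
    for j in range(n):
        shifted = list(x)
        shifted[j] += step
        gj = gradient(shifted)
        # columns[j][i] approximates the derivative of g_i w.r.t. x_j.
        columns.append([(gj[i] - g0[i]) / step for i in range(n)])
    # Average with the transpose to obtain a symmetric matrix.
    return [
        [0.5 * (columns[j][i] + columns[i][j]) for j in range(n)]
        for i in range(n)
    ]


def gradient(x):
    """Gradient of the hypothetical function -x0^2 - x0*x1 - 2*x1^2."""
    return [-2.0 * x[0] - x[1], -x[0] - 4.0 * x[1]]


h = finite_difference_hessian(gradient, [0.3, -0.7])
print(h)  # close to [[-2, -1], [-1, -4]]
```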

loglike

Object of type biogeme.expressions.Expression calculating the formula for the loglikelihood

loglikeName

Keyword used for the name of the loglikelihood formula. Default: ‘loglike’

loglikeSignatures

Internal signature of the formula for the loglikelihood.

property max_number_parameters_to_report

Maximum number of parameters to report.

property maximum_number_catalog_expressions

Maximum number of multiple expressions when Catalog objects are used.

property maxiter

getter for the parameter

property missingData

Code for missing data

property missing_data

Code for missing data

modelName

Name of the model. Default: ‘biogemeModelDefaultName’

monteCarlo

monteCarlo is True if one of the expressions involves a Monte-Carlo integration.

nullLogLike

Log likelihood of the null model

property numberOfDraws

Number of draws for Monte-Carlo integration.

property numberOfThreads

Number of threads used for parallel computing. Default: the number of available CPUs.

property number_of_draws

Number of draws for Monte-Carlo integration.

property number_of_threads

Number of threads used for parallel computing. Default: the number of available CPUs.

number_unknown_parameters()[source]

Returns the number of parameters that must be estimated

Returns:

number of parameters

Return type:

int

property only_robust_stats

True if only the robust statistics need to be reported. If False, the statistics from the Rao-Cramer bound are also reported.

optimizationMessages

Information provided by the optimization algorithm after completion.

optimize(starting_values=None)[source]

Calls the optimization algorithm. The function self.algorithm is called.

Parameters:

starting_values (list(float)) – starting point for the algorithm

Returns:

x, messages

  • x is the solution generated by the algorithm,

  • messages is a dictionary with information about the run of the algorithm

Return type:

numpy.array, dict(str:object)

Raises:

BiogemeError – an error is raised if no algorithm is specified.

quickEstimate(**kwargs)[source]

Estimate the parameters of the model. Same as estimate, except that any extra calculation is skipped (initial log likelihood, t-statistics, etc.)

Returns:

object containing the estimation results.

Return type:

bioResults

Example:

import biogeme.biogeme as bio

# Create an instance of biogeme
# (database and logprob are assumed to be defined beforehand)
biogeme = bio.BIOGEME(database, logprob)

# Give a name to the model
biogeme.modelName = 'mymodel'

# Estimate the parameters
results = biogeme.quickEstimate()

Raises:

BiogemeError – if no expression has been provided for the likelihood

report_array(array, with_names=True)[source]

Reports the entries of the array up to the maximum number

Parameters:
  • array (numpy.array) – array to report

  • with_names (bool) – if True, the names of the parameters are included

Returns:

string reporting the values

Return type:

str

reset_id_manager()[source]

Reset all the ids of the elementary expression in the formulas

property saveIterations

If True, the current iterate is saved after each iteration, in a file named __[modelName].iter, where [modelName] is the name given to the model. If such a file exists, the starting values for the estimation are replaced by the values saved in the file.

property save_iterations

Same as saveIterations, with another syntax

property second_derivatives

getter for the parameter

property seed_param

getter for the parameter

setRandomInitValues(defaultBound=100.0)[source]

Modifies the initial values of the parameters in all formulas, using randomly generated values. The value is drawn from a uniform distribution on the interval defined by the bounds.

Parameters:

defaultBound (float) – If the upper bound is missing, it is replaced by this value. If the lower bound is missing, it is replaced by the opposite of this value. Default: 100.
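The rule for missing bounds can be sketched as follows. The function random_init_value is hypothetical, and None is used here to represent a missing bound; the sketch only mirrors the description above.

```python
import random


def random_init_value(lower, upper, default_bound=100.0, rng=random):
    """Draw a value uniformly between the bounds.

    A missing (None) upper bound is replaced by default_bound, and a
    missing lower bound by -default_bound.
    """
    if lower is None:
        lower = -default_bound
    if upper is None:
        upper = default_bound
    return rng.uniform(lower, upper)


rng = random.Random(42)  # seeded generator, for reproducibility
unbounded = random_init_value(None, None, rng=rng)
nonnegative = random_init_value(0.0, None, rng=rng)
bounded = random_init_value(-1.0, 1.0, rng=rng)
print(unbounded, nonnegative, bounded)
```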

simulate(theBetaValues)[source]

Applies the formulas to each row of the database.

Parameters:

theBetaValues (dict(str: float)) – values of the parameters to be used in the calculations. If None, the default values are used. Default: None.

Returns:

a pandas data frame with the simulated value. Each row corresponds to a row in the database, and each column to a formula.

Return type:

Pandas data frame

Example:

# Read the estimation results from a file
import biogeme.results as res
results = res.bioResults(pickleFile='myModel.pickle')

# Simulate the formulas using the nominal (estimated) values
betaValues = results.getBetaValues()
simulatedValues = biogeme.simulate(betaValues)

Raises:

property steptol

getter for the parameter

property tolerance

getter for the parameter

userNotes

User notes

validate(estimationResults, validationData)[source]

Perform out-of-sample validation.

The function performs the following tasks:

  • each slice defines a validation set (the slice itself) and an estimation set (the rest of the data),

  • the model is re-estimated on the estimation set,

  • the estimated model is applied on the validation set,

  • the value of the log likelihood for each observation is reported.

Parameters:
  • estimationResults (bioResults) – results of the model estimation based on the full data.

  • validationData (list(tuple(pandas.DataFrame, pandas.DataFrame))) – list of estimation and validation data sets

Returns:

a list containing as many items as slices. Each item is the result of the simulation on the validation set.

Return type:

list(pandas.DataFrame)

Raises:

BiogemeError – An error is raised if the database is structured as panel data.
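The structure expected for validationData, a list of (estimation set, validation set) pairs, can be illustrated with plain lists. The interleaved way of building the slices and the function name are assumptions made for this sketch; in practice the pairs are pandas data frames.

```python
def estimation_validation_pairs(rows, slices):
    """Split rows into (estimation, validation) pairs, one per slice."""
    pairs = []
    for k in range(slices):
        # The slice itself is the validation set...
        validation = [r for i, r in enumerate(rows) if i % slices == k]
        # ...and the rest of the data is the estimation set.
        estimation = [r for i, r in enumerate(rows) if i % slices != k]
        pairs.append((estimation, validation))
    return pairs


rows = list(range(10))  # stand-in for the rows of the database
pairs = estimation_validation_pairs(rows, slices=5)
for estimation, validation in pairs:
    print(len(estimation), len(validation))
```

Each observation appears in exactly one validation set, so every row is validated once by a model that was not estimated on it.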

weight

Object of type biogeme.expressions.Expression calculating the weight of each observation in the sample.

weightName

Keyword used for the name of the weight formula. Default: ‘weight’

weightSignatures

Internal signature of the formula for the weight.

class biogeme.biogeme.OldNewParamTuple(old, new, section)[source]

Bases: NamedTuple

Parameters:
  • old (str) –

  • new (str) –

  • section (str) –

new: str

Alias for field number 1

old: str

Alias for field number 0

section: str

Alias for field number 2
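The field aliases listed above behave as in any typing.NamedTuple. The class below is a stand-in declared locally with the same three fields, not an import from Biogeme, and the field values (in particular the section name) are hypothetical.

```python
from typing import NamedTuple


class OldNewParamTuple(NamedTuple):
    """Local stand-in with the same fields as the documented class."""

    old: str
    new: str
    section: str


entry = OldNewParamTuple(
    old='numberOfThreads',
    new='number_of_threads',
    section='SomeSection',  # hypothetical section name
)

# Fields can be accessed by name or by position.
print(entry.old, entry[0])
print(entry.new, entry[1])
print(entry.section, entry[2])
```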