biogeme.biogeme module

Implementation of the main Biogeme class

author:: Michel Bierlaire
date:: Tue Mar 26 16:45:15 2019

It combines the database and the model specification.

class biogeme.biogeme.BIOGEME(database, formulas, user_notes=None, parameters=None, skip_audit=False, **kwargs)[source]

Bases: object

Main class that combines the database and the model specification.

It works in two modes: estimation and simulation.

Parameters:

database (db.Database)
formulas (Expression | dict[str, Expression])
user_notes (str | None)
parameters (str | Parameters | None)
skip_audit (bool)

static argument_warning(old_new_tuple)[source]

Displays a deprecation warning when parameters are provided as arguments.

Parameters:: old_new_tuple (OldNewParamTuple)

bestIteration: Store the best iteration found so far.

beta_values_dict_to_list(beta_dict=None)[source]

Transforms a dict with the names of the betas associated with their values into a list consistent with the numbering of the ids.

Parameters:

beta_dict (dict(str: float)) – dict with the values of the parameters

Raises:

BiogemeError – if the parameter is not a dict
BiogemeError – if a parameter is missing in the dict

Return type:

list[float]
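
The conversion described above can be illustrated with a minimal sketch. It is not Biogeme's implementation: the helper betas_dict_to_list, the parameter names, and the use of built-in exceptions in place of BiogemeError are all assumptions made for illustration.

```python
def betas_dict_to_list(beta_dict, ordered_names):
    """Return the values of beta_dict as a list ordered like ordered_names."""
    if not isinstance(beta_dict, dict):
        # Biogeme raises BiogemeError here; a built-in exception is used instead.
        raise TypeError("beta_dict must be a dict")
    missing = [name for name in ordered_names if name not in beta_dict]
    if missing:
        raise KeyError(f"missing parameters: {missing}")
    return [float(beta_dict[name]) for name in ordered_names]


# Hypothetical parameter names, listed in the order of their ids.
names = ["asc_car", "b_cost", "b_time"]
values = betas_dict_to_list(
    {"b_time": -1.2, "asc_car": 0.5, "b_cost": -0.8}, names
)
```

The order of the keys in the dict does not matter: the output follows the order of the ids.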

biogeme_parameters: Parameters

bootstrap_results: Results of the bootstrap calculation.

property bootstrap_samples

bootstrap_time: Time needed to calculate the bootstrap standard errors

calculateInitLikelihood()[source]

Warning

This function is deprecated. Use calculate_init_likelihood() instead.

Return type:: float

calculateLikelihood(x, scaled, batch=None)[source]

Warning

This function is deprecated. Use calculate_likelihood() instead.

Parameters:

x (ndarray)
scaled (bool)
batch (float | None)

calculateLikelihoodAndDerivatives(x, scaled, hessian=False, bhhh=False, batch=None)[source]

Warning

This function is deprecated. Use calculate_likelihood_and_derivatives() instead.

Return type:

FunctionOutput

Parameters:

x (ndarray)
scaled (bool)
hessian (bool)
bhhh (bool)
batch (float | None)

calculateNullLoglikelihood(avail)[source]

Warning

This function is deprecated. Use calculate_null_loglikelihood() instead.

Parameters:: avail (dict[int, Expression | float | int | bool])

calculate_init_likelihood()[source]

Calculate the value of the log likelihood function

The default values of the parameters are used.

Returns:: value of the log likelihood.
Return type:: float.

calculate_likelihood(x, scaled, batch=None)[source]

Calculates the value of the log likelihood function

Parameters:

x (list(float)) – vector of values for the parameters.
scaled (bool) – if True, the value is divided by the number of observations used to calculate it. In this case, the values with different sample sizes are comparable. Default: True
batch (float) – if not None, calculates the likelihood on a random sample of the data. The value of the parameter must be strictly between 0 and 1, and represents the share of the data that will be used. Default: None

Returns:

the calculated value of the log likelihood

Return type:

float.

Raises:

ValueError – if the length of the list x is incorrect.
BiogemeError – if calculation with batch is requested
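
The effect of the scaled argument can be sketched in plain Python, assuming that the log likelihood is the sum of the log probabilities of the observed choices. The function log_likelihood and the probabilities below are hypothetical and do not call Biogeme.

```python
import math


def log_likelihood(probabilities, scaled):
    """Sum the log of the probability of each observed choice."""
    total = sum(math.log(p) for p in probabilities)
    # When scaled is True, divide by the number of observations.
    return total / len(probabilities) if scaled else total


small_sample = [0.5, 0.25]
large_sample = small_sample * 10  # the same observations, repeated ten times

raw_small = log_likelihood(small_sample, scaled=False)
raw_large = log_likelihood(large_sample, scaled=False)
scaled_small = log_likelihood(small_sample, scaled=True)
scaled_large = log_likelihood(large_sample, scaled=True)
```

The raw values grow with the sample size, while the scaled values coincide, which is why scaled values are comparable across samples.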

calculate_likelihood_and_derivatives(x, scaled, hessian=False, bhhh=False, batch=None)[source]

Calculate the value of the log likelihood function and its derivatives.

Parameters:

x (list(float)) – vector of values for the parameters.
scaled (bool) – if True, the results are divided by the number of observations.
hessian (bool) – if True, the hessian is calculated. Default: False.
bhhh (bool) – if True, the BHHH matrix is calculated. Default: False.
batch (float) – if not None, calculates the likelihood on a random sample of the data. The value of the parameter must be strictly between 0 and 1, and represents the share of the data that will be used. Default: None

Returns:

f, g, h, bh where

f is the value of the function (float)
g is the gradient (numpy.array)
h is the hessian (numpy.array)
bh is the BHHH matrix (numpy.array)

Return type:

tuple(float, numpy.array, numpy.array, numpy.array)

Raises:

ValueError – if the length of the list x is incorrect
BiogemeError – if the norm of the gradient is not finite
BiogemeError – if calculation with batch is requested
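
For readers unfamiliar with the BHHH matrix, here is a minimal sketch assuming its usual definition as the sum, over observations, of the outer product of the gradient of each observation's log likelihood contribution. The helper bhhh_matrix and the gradients are hypothetical, and any scaling applied by Biogeme is not reproduced.

```python
def bhhh_matrix(observation_gradients):
    """Sum the outer products g_n g_n^T of the per-observation gradients."""
    size = len(observation_gradients[0])
    matrix = [[0.0] * size for _ in range(size)]
    for gradient in observation_gradients:
        for i in range(size):
            for j in range(size):
                matrix[i][j] += gradient[i] * gradient[j]
    return matrix


# Hypothetical gradients for two observations and two parameters.
gradients = [[1.0, 2.0], [0.5, -1.0]]
bh = bhhh_matrix(gradients)
```

The result is symmetric by construction and requires only first derivatives.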

calculate_null_loglikelihood(avail)[source]

Calculate the log likelihood of the null model that predicts equal probability for each alternative

Parameters:: avail (dict[int, Expression | float | int | bool]) – dictionary associating each alternative with the expression that evaluates its availability condition. If 1 is provided, the alternative is always available.
Returns:: value of the log likelihood
Return type:: float
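
The null model can be sketched as follows, assuming that each observation contributes minus the logarithm of its number of available alternatives. The function null_loglikelihood is hypothetical, and the availabilities are given as already evaluated 0/1 values rather than Biogeme expressions.

```python
import math


def null_loglikelihood(availability_rows):
    """Log likelihood when each available alternative is equally likely."""
    total = 0.0
    for row in availability_rows:
        n_available = sum(1 for available in row.values() if available)
        # Each available alternative has probability 1 / n_available.
        total += -math.log(n_available)
    return total


# Two observations, three alternatives; alternative 2 is unavailable once.
rows = [
    {1: 1, 2: 1, 3: 1},
    {1: 1, 2: 0, 3: 1},
]
value = null_loglikelihood(rows)
```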

change_init_values(betas)[source]

Modifies the initial values of the parameters in all formulas

Parameters:: betas (dict(string:float)) – dictionary where the keys are the names of the parameters, and the values are the new value for the parameters.
Return type:: None

checkDerivatives(beta, verbose=False)[source]

Warning

This function is deprecated. Use check_derivatives() instead.

Parameters:

beta (ndarray | list[float])
verbose (bool)

check_derivatives(beta, verbose=False)[source]

Verifies the implementation of the derivatives.

It compares the analytical version with the finite differences approximation.

Parameters:

beta (list(float)) – vector of values for the parameters.
verbose (bool) – if True, the comparisons are reported. Default: False.

Return type:

tuple.

Returns:

f, g, h, gdiff, hdiff where

f is the value of the function,
g is the analytical gradient,
h is the analytical hessian,
gdiff is the difference between the analytical and the finite differences gradient,
hdiff is the difference between the analytical and the finite differences hessian,
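
The comparison performed by this kind of check can be sketched on a toy function. This is a minimal illustration using forward differences; the step size and the functions f and analytical_gradient are assumptions, and Biogeme's own finite difference scheme may differ.

```python
def finite_difference_gradient(function, x, step=1e-6):
    """Approximate the gradient by forward finite differences."""
    base = function(x)
    gradient = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += step
        gradient.append((function(shifted) - base) / step)
    return gradient


def f(x):
    """Toy function standing in for the log likelihood."""
    return x[0] ** 2 + 3.0 * x[0] * x[1]


def analytical_gradient(x):
    """Hand-derived gradient of f."""
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]


point = [1.0, 2.0]
g = analytical_gradient(point)
# Difference between the analytical and the finite differences gradient.
gdiff = [a - b for a, b in zip(g, finite_difference_gradient(f, point))]
```

Entries of gdiff close to zero suggest that the analytical derivatives are implemented correctly.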

confidenceIntervals(beta_values, interval_size=0.9)[source]

Warning

This function is deprecated. Use confidence_intervals() instead.

Return type:

tuple[DataFrame, DataFrame]

Parameters:

beta_values (list[dict[str, float]])
interval_size (float)

confidence_intervals(beta_values, interval_size=0.9)[source]

Calculate confidence intervals on the simulated quantities

Parameters:

beta_values (list(dict(str: float))) – array of parameters values to be used in the calculations. Typically, it is a sample drawn from a distribution.
interval_size (float) – size of the reported confidence interval, expressed as a fraction between 0 and 1. If it is denoted by s, the interval is calculated for the quantiles (1-s)/2 and (1+s)/2. The default (0.9) corresponds to the quantiles 0.05 and 0.95.

Returns:

two pandas data frames ‘left’ and ‘right’ with the same dimensions. Each row corresponds to a row in the database, and each column to a formula. ‘left’ contains the left value of the confidence interval, and ‘right’ the right value

Example:

# Read the estimation results from a file
results = res.bioResults(pickle_file='myModel.pickle')
# Retrieve the names of the beta parameters that have been
# estimated
betas = biogeme.free_beta_names

# Draw 100 realizations of the distribution of the estimators
b = results.getBetasForSensitivityAnalysis(betas, size=100)

# Simulate the formulas using the nominal values
# (beta_values is a dict mapping each parameter name to its value)
simulatedValues = biogeme.simulate(beta_values)

# Calculate the confidence intervals for each formula
left, right = biogeme.confidence_intervals(b, 0.9)

Return type:

tuple of two Pandas dataframes.
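
The role of interval_size can be sketched on a single simulated quantity. The helper interval_bounds and its linear interpolation rule are assumptions made for illustration; the quantile method used by Biogeme is not specified in this section.

```python
def interval_bounds(samples, interval_size=0.9):
    """Return the empirical quantiles (1-s)/2 and (1+s)/2 of the samples."""
    ordered = sorted(samples)

    def quantile(q):
        # Linear interpolation between the two closest ranks.
        position = q * (len(ordered) - 1)
        lower = int(position)
        upper = min(lower + 1, len(ordered) - 1)
        weight = position - lower
        return ordered[lower] * (1 - weight) + ordered[upper] * weight

    return quantile((1 - interval_size) / 2), quantile((1 + interval_size) / 2)


# Hypothetical simulated values of one formula for one row of the database.
simulated = [float(i) for i in range(101)]
left, right = interval_bounds(simulated, 0.9)
```

With s = 0.9, the bounds are the 5% and 95% quantiles of the simulated values.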

convergence: True if the algorithm has converged

property dogleg

drawsProcessingTime: Time needed to generate the draws.

property enlarging_factor

estimate(recycle=False, run_bootstrap=False, **kwargs)[source]

Estimate the parameters of the model(s).

Parameters:

recycle (bool) – if True, the results are read from the pickle file, if it exists. If False, the estimation is performed.
run_bootstrap (bool) – if True, bootstrapping is applied.

Returns:

object containing the estimation results.

Return type:

biogeme.bioResults

Example:

# Create an instance of biogeme
biogeme = bio.BIOGEME(database, logprob)

# Give a name to the model
biogeme.modelName = 'mymodel'

# Estimate the parameters
results = biogeme.estimate()

Raises:

BiogemeError – if no expression has been provided for the likelihood

Parameters:

recycle (bool)
run_bootstrap (bool)

Return type:

bioResults

estimate_catalog(selected_configurations=None, quick_estimate=False, recycle=False, run_bootstrap=False)[source]

Estimate all or selected versions of a model with Catalogs, corresponding to multiple specifications.

Parameters:

selected_configurations (set(biogeme.pareto.SetElement)) – set of configurations. If None, all configurations are considered.
quick_estimate (bool) – if True, the final statistics are not calculated.
recycle (bool) – if True, the results are read from the pickle file, if it exists. If False, the estimation is performed.
run_bootstrap (bool) – if True, bootstrapping is applied.

Returns:

object containing the estimation results associated with the name of each specification, as well as a description of each configuration

Return type:

dict(str: bioResults)

files_of_type(extension, all_files=False)[source]

Identify the list of files with a given extension in the local directory

Parameters:

extension (str) – extension of the requested files (without the dot): ‘pickle’, or ‘html’
all_files (bool) – if all_files is False, only files containing the name of the model are identified. If all_files is True, all files with the requested extension are identified.

Returns:

list of files with the requested extension.

Return type:

list(str)

formulas: Dictionary containing Biogeme formulas of type biogeme.expressions.Expression. The keys are the names of the formulas.

property freeBetaNames: list[str]

property free_beta_names: list[str]

Returns the names of the parameters that must be estimated

Returns:: list of names of the parameters
Return type:: list(str)

classmethod from_configuration(config_id, expression, database, user_notes=None, parameters=None, skip_audit=False)[source]

Obtain the Biogeme object corresponding to the configuration of a multiple expression

Parameters:

config_id (str) – identifier of the configuration
expression (biogeme.expressions.Expression) – multiple expression containing all the catalogs.
database (Database) – database to be passed to the Biogeme object
user_notes (str) – these notes will be included in the report file.
parameters (str | Parameters | None) – object with the parameters
skip_audit (bool) – if True, no auditing is performed.

Return type:

BIOGEME

property generatePickle: bool: Boolean variable, True if the PICKLE file with the results must be generated.

property generate_html

property generate_pickle

getBoundsOnBeta(beta_name)[source]

Warning

This function is deprecated. Use get_bounds_on_beta() instead.

Return type:: tuple[float, float]
Parameters:: beta_name (str)

get_beta_values()[source]

Returns a dict with the initial values of the betas. Typically useful for simulation.

Returns:: dict with the initial values of the betas
Return type:: dict(str: float)

get_bounds_on_beta(beta_name)[source]

Returns the bounds on the parameter as defined by the user.

Parameters:: beta_name (string) – name of the parameter
Returns:: lower bound, upper bound
Return type:: tuple
Raises:: BiogemeError – if the name of the parameter is not found.

property identification_threshold

property infeasible_cg

initLogLike: Init value of the likelihood function

property initial_radius

classmethod initialize_properties(properties)[source]

is_model_complex()[source]

Check if the model is potentially complex to estimate

Return type:: bool

property large_data_set

property largest_neighborhood

lastSample: keeps track of the sample of data used to calculate the stochastic gradient / hessian

likelihoodFiniteDifferenceHessian(x)[source]

Warning

This function is deprecated. Use likelihood_finite_difference_hessian() instead.

Return type:: ndarray
Parameters:: x (ndarray)

likelihood_finite_difference_hessian(x)[source]

Calculate the hessian of the log likelihood function using finite differences.

May be useful when the analytical hessian has numerical issues.

Parameters:: x (list(float)) – vector of values for the parameters.
Returns:: finite differences approximation of the hessian.
Return type:: numpy.array
Raises:: ValueError – if the length of the list x is incorrect
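
A finite differences hessian can be sketched with central differences on a toy function. The helper finite_difference_hessian, the step size, and the function f are assumptions; the scheme actually used by Biogeme is not described here.

```python
def finite_difference_hessian(function, x, step=1e-4):
    """Approximate the hessian by central finite differences."""
    n = len(x)
    hessian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):

            def shifted(di, dj):
                # Evaluate the function after perturbing entries i and j.
                point = list(x)
                point[i] += di
                point[j] += dj
                return function(point)

            hessian[i][j] = (
                shifted(step, step)
                - shifted(step, -step)
                - shifted(-step, step)
                + shifted(-step, -step)
            ) / (4.0 * step * step)
    return hessian


def f(x):
    """Toy concave function standing in for the log likelihood."""
    return -(x[0] ** 2) - x[0] * x[1] - 2.0 * x[1] ** 2


h = finite_difference_hessian(f, [0.5, -0.3])
```

For this quadratic function the exact hessian is [[-2, -1], [-1, -4]], and the approximation matches it up to rounding errors.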

log_like: Expression: Object of type biogeme.expressions.Expression calculating the formula for the loglikelihood

log_like_name: str: Keywords used for the name of the loglikelihood formula. Default: ‘log_like’

log_like_valid_names: list[str]

property loglike: Expression: For backward compatibility

loglikeSignatures: list[bytes]: Internal signature of the formula for the loglikelihood.

static make_getter(param_name)[source]

static make_setter(param_name)[source]

property max_iterations

property max_number_parameters_to_report

property maximum_attempts

property maximum_number_catalog_expressions

property maximum_number_parameters

property missing_data

modelName: Name of the model. Default: ‘biogemeModelDefaultName’

monte_carlo: monte_carlo is True if one of the expressions involves a Monte-Carlo integration.

nullLogLike: Log likelihood of the null model

property numberOfDraws: int: Number of draws for Monte-Carlo integration.

property numberOfThreads: int: Number of threads used for parallel computing. Default: the number of available CPUs. Maintained for backward compatibility.

property number_of_draws

property number_of_neighbors

property number_of_threads: int: Number of threads used for parallel computing. Default: the number of available CPUs.

number_unknown_parameters()[source]

Returns the number of parameters that must be estimated

Returns:: number of parameters
Return type:: int

property only_robust_stats

optimizationMessages: Information provided by the optimization algorithm after completion.

property optimization_algorithm

optimize(starting_values=None)[source]

Calls the optimization algorithm. The function self.algorithm is called.

Parameters:

starting_values (list(float)) – starting point for the algorithm

Returns:

x, messages

x is the solution generated by the algorithm,
messages is a dictionary reporting information about the run of the algorithm

Return type:

numpy.array, dict(str:object)

Raises:

BiogemeError – if no algorithm is specified.

parameter_file: str

properties_initialized = True

quickEstimate(**kwargs)[source]

Warning

This function is deprecated. Use quick_estimate() instead.

Return type:: bioResults

quick_estimate(**kwargs)[source]

Estimate the parameters of the model. Same as estimate, except that any extra calculation (initial log likelihood, t-statistics, etc.) is skipped.

Returns:: object containing the estimation results.
Return type:: bioResults

Example:

# Create an instance of biogeme
biogeme = bio.BIOGEME(database, logprob)

# Give a name to the model
biogeme.modelName = 'mymodel'

# Estimate the parameters, skipping the extra calculations
results = biogeme.quick_estimate()

Raises:: BiogemeError – if no expression has been provided for the likelihood

recycled_estimation(run_bootstrap=False, **kwargs)[source]

Return type:: bioResults
Parameters:: run_bootstrap (bool)

report_array(array, with_names=True)[source]

Reports the entries of the array up to the maximum number

Parameters:

array (numpy.array) – array to report
with_names (bool) – if True, the names of the parameters are included

Returns:

string reporting the values

Return type:

str

reset_id_manager()[source]

Reset all the ids of the elementary expressions in the formulas

Return type:: None

property save_iterations

property second_derivatives

property seed

setRandomInitValues(default_bound=100.0)[source]

Warning

This function is deprecated. Use set_random_init_values() instead.

Return type:: None
Parameters:: default_bound (float)

set_random_init_values(default_bound=100.0)[source]

Modifies the initial values of the parameters in all formulas, using randomly generated values. The value is drawn from a uniform distribution on the interval defined by the bounds.

Parameters:: default_bound (float) – If the upper bound is missing, it is replaced by this value. If the lower bound is missing, it is replaced by the opposite of this value. Default: 100.
Return type:: None
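
The rule for missing bounds can be sketched as follows. The helper random_init_value and the parameter names are hypothetical; only the replacement of missing bounds by plus or minus default_bound and the uniform draw come from the description above.

```python
import random


def random_init_value(lower, upper, default_bound=100.0, rng=random):
    """Draw a value uniformly between the bounds, filling in missing ones."""
    if upper is None:
        upper = default_bound
    if lower is None:
        lower = -default_bound
    return rng.uniform(lower, upper)


rng = random.Random(42)  # fixed seed, for reproducibility of the sketch
# Hypothetical parameters with (lower bound, upper bound); None means missing.
bounds = {"b_time": (None, 0.0), "b_cost": (-10.0, None), "asc": (None, None)}
draws = {
    name: random_init_value(low, high, rng=rng)
    for name, (low, high) in bounds.items()
}
```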

short_names: ModelNames | None

simulate(the_beta_values)[source]

Applies the formulas to each row of the database.

Parameters:: the_beta_values (dict(str, float)) – values of the parameters to be used in the calculations. They must be provided explicitly: passing None raises a BiogemeError.
Returns:: a pandas data frame with the simulated value. Each row corresponds to a row in the database, and each column to a formula.
Return type:: Pandas data frame

Example:

# Read the estimation results from a file
results = res.bioResults(pickle_file='myModel.pickle')
# Simulate the formulas using the nominal values
# (beta_values is a dict mapping each parameter name to its value)
simulatedValues = biogeme.simulate(beta_values)

Raises:

BiogemeError – if the number of parameters is incorrect
BiogemeError – if the_beta_values is None.

Parameters:

the_beta_values (dict[str, float] | None)

Return type:

DataFrame

property steptol

property tolerance

user_notes: User notes

validate(estimation_results, validation_data)[source]

Perform out-of-sample validation.

The function performs the following tasks:

each slice defines a validation set (the slice itself) and an estimation set (the rest of the data),

the model is re-estimated on the estimation set,

the estimated model is applied on the validation set,

the value of the log likelihood for each observation is reported.

Parameters:

estimation_results (bioResults) – results of the model estimation based on the full data.
validation_data (list(tuple(pandas.DataFrame, pandas.DataFrame))) – list of estimation and validation data sets

Returns:

a list containing as many items as slices. Each item is the result of the simulation on the validation set.

Return type:

list(pandas.DataFrame)

Raises:

BiogemeError – if the database is structured as panel data.
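
The structure expected for validation_data, a list of (estimation, validation) pairs, can be sketched with plain lists. The helper split_into_slices and its interleaved assignment of rows to slices are assumptions; how the slices are actually produced is not covered in this section, and real data would be pandas data frames.

```python
def split_into_slices(rows, slices):
    """Build (estimation, validation) pairs; each slice validates once."""
    pairs = []
    for k in range(slices):
        # The slice itself is the validation set.
        validation = rows[k::slices]
        # The rest of the data is the estimation set.
        estimation = [
            row for index, row in enumerate(rows) if index % slices != k
        ]
        pairs.append((estimation, validation))
    return pairs


observations = list(range(10))  # stand-in for the rows of the database
validation_data = split_into_slices(observations, slices=5)
```

Each observation appears in exactly one validation set, so every row is used once for out-of-sample evaluation.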

property version

weight: Object of type biogeme.expressions.Expression calculating the weight of each observation in the sample.

weightSignatures: list[bytes]: Internal signature of the formula for the weight.

weight_name: Keyword used for the name of the weight formula. Default: ‘weight’

class biogeme.biogeme.OldNewParamTuple(old, new, section)[source]

Bases: NamedTuple

Parameters:

old (str)
new (str)
section (str)

new: str: Alias for field number 1

old: str: Alias for field number 0

section: str: Alias for field number 2
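
The field aliases listed above follow from the NamedTuple base class. A minimal sketch with a stand-in class declaring the same three fields (the values are hypothetical, for illustration only):

```python
from typing import NamedTuple


class OldNewParamTuple(NamedTuple):
    """Stand-in with the same three fields as the documented class."""

    old: str
    new: str
    section: str


# Hypothetical values: an old argument name, its new name, and a section.
renamed = OldNewParamTuple(
    old="numberOfThreads", new="number_of_threads", section="MultiThreading"
)
old_name, new_name, section_name = renamed  # unpacking follows field order
```

Fields can be read by name (renamed.old) or by position (renamed[0]).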