biogeme.biogeme module
Implementation of the main Biogeme class
- author:
Michel Bierlaire
- date:
Tue Mar 26 16:45:15 2019
It combines the database and the model specification.
- class biogeme.biogeme.BIOGEME(database, formulas, user_notes=None, parameters=None, skip_audit=False, **kwargs)[source]
Bases:
object
- Main class that combines the database and the model
specification.
It works in two modes: estimation and simulation.
- Parameters:
database (db.Database)
formulas (Expression | dict[str, Expression])
user_notes (str | None)
parameters (str | Parameters | None)
skip_audit (bool)
- static argument_warning(old_new_tuple)[source]
Displays a deprecation warning when parameters are provided as arguments.
- Parameters:
old_new_tuple (OldNewParamTuple)
- bestIteration
Store the best iteration found so far.
- beta_values_dict_to_list(beta_dict=None)[source]
- Transforms a dict with the names of the betas associated
with their values, into a list consistent with the numbering of the ids.
- Parameters:
beta_dict (dict(str: float)) – dict with the values of the parameters
- Raises:
BiogemeError – if the parameter is not a dict
BiogemeError – if a parameter is missing in the dict
- Return type:
list[float]
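The conversion described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name dict_to_ordered_list and the list ordered_names are hypothetical, and built-in exceptions stand in for BiogemeError.

```python
def dict_to_ordered_list(beta_dict, ordered_names):
    """Return the values of beta_dict in the order given by ordered_names.

    Hypothetical helper: ordered_names plays the role of the internal
    numbering of the parameter ids.
    """
    if not isinstance(beta_dict, dict):
        # The documented method raises BiogemeError here.
        raise TypeError("beta_dict must be a dict")
    missing = [name for name in ordered_names if name not in beta_dict]
    if missing:
        # The documented method raises BiogemeError here as well.
        raise KeyError(f"missing parameters: {missing}")
    return [beta_dict[name] for name in ordered_names]


free_names = ["asc_car", "b_cost", "b_time"]
values = dict_to_ordered_list(
    {"b_time": -1.2, "asc_car": 0.3, "b_cost": -0.8}, free_names
)
```

The order of the keys in the dict does not matter; only the order of the names does.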
- biogeme_parameters: Parameters
- bootstrap_results
Results of the bootstrap calculation.
- property bootstrap_samples
- bootstrap_time
Time needed to calculate the bootstrap standard errors
- calculateInitLikelihood()[source]
Warning
This function is deprecated. Use calculate_init_likelihood() instead.
- Return type:
float
- calculateLikelihood(x, scaled, batch=None)[source]
Warning
This function is deprecated. Use calculate_likelihood() instead.
- Parameters:
x (ndarray)
scaled (bool)
batch (float | None)
- calculateLikelihoodAndDerivatives(x, scaled, hessian=False, bhhh=False, batch=None)[source]
Warning
This function is deprecated. Use calculate_likelihood_and_derivatives() instead.
- Return type:
- Parameters:
x (ndarray)
scaled (bool)
hessian (bool)
bhhh (bool)
batch (float | None)
- calculateNullLoglikelihood(avail)[source]
Warning
This function is deprecated. Use calculate_null_loglikelihood() instead.
- Parameters:
avail (dict[int, Expression | float | int | bool])
- calculate_init_likelihood()[source]
Calculate the value of the log likelihood function.
The default values of the parameters are used.
- Returns:
value of the log likelihood.
- Return type:
float
- calculate_likelihood(x, scaled, batch=None)[source]
Calculates the value of the log likelihood function.
- Parameters:
x (list(float)) – vector of values for the parameters.
scaled (bool) – if True, the value is divided by the number of observations used to calculate it. In this case, the values with different sample sizes are comparable. Default: True
batch (float) – if not None, calculates the likelihood on a random sample of the data. The value of the parameter must be strictly between 0 and 1, and represents the share of the data that will be used. Default: None
- Returns:
the calculated value of the log likelihood
- Return type:
float
- Raises:
ValueError – if the length of the list x is incorrect.
BiogemeError – if calculation with batch is requested
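The effect of the scaled argument can be illustrated with a small stand-alone sketch. It assumes that the log likelihood is the sum of the logarithms of the probabilities of the observed choices; the function log_likelihood and its input are hypothetical and do not call Biogeme.

```python
import math


def log_likelihood(chosen_probabilities, scaled):
    """Sum of log probabilities, optionally divided by the sample size."""
    total = sum(math.log(p) for p in chosen_probabilities)
    if scaled:
        # Dividing by the number of observations makes values obtained
        # with different sample sizes comparable.
        return total / len(chosen_probabilities)
    return total


probabilities = [0.5, 0.25]
unscaled = log_likelihood(probabilities, scaled=False)
per_observation = log_likelihood(probabilities, scaled=True)
```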
- calculate_likelihood_and_derivatives(x, scaled, hessian=False, bhhh=False, batch=None)[source]
Calculate the value of the log likelihood function and its derivatives.
- Parameters:
x (list(float)) – vector of values for the parameters.
scaled (bool) – if True, the results are divided by the number of observations.
hessian (bool) – if True, the hessian is calculated. Default: False.
bhhh (bool) – if True, the BHHH matrix is calculated. Default: False.
batch (float) – if not None, calculates the likelihood on a random sample of the data. The value of the parameter must be strictly between 0 and 1, and represents the share of the data that will be used. Default: None
- Returns:
f, g, h, bh where
f is the value of the function (float)
g is the gradient (numpy.array)
h is the hessian (numpy.array)
bh is the BHHH matrix (numpy.array)
- Return type:
tuple(float, numpy.array, numpy.array, numpy.array)
- Raises:
ValueError – if the length of the list x is incorrect
BiogemeError – if the norm of the gradient is not finite
BiogemeError – if calculation with batch is requested
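For readers unfamiliar with the BHHH matrix, a minimal sketch follows. It assumes the usual definition, the sum over observations of the outer product of the observation-level gradient with itself, and does not reproduce any scaling that Biogeme may apply. The helper bhhh_matrix is hypothetical.

```python
def bhhh_matrix(observation_gradients):
    """Sum of the outer products of the per-observation gradients."""
    size = len(observation_gradients[0])
    bhhh = [[0.0] * size for _ in range(size)]
    for gradient in observation_gradients:
        for i in range(size):
            for j in range(size):
                bhhh[i][j] += gradient[i] * gradient[j]
    return bhhh


# Two observations, two parameters (made-up numbers).
gradients = [[1.0, 2.0], [0.5, -1.0]]
bh = bhhh_matrix(gradients)
```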
- calculate_null_loglikelihood(avail)[source]
Calculate the log likelihood of the null model, which predicts equal probability for each alternative.
- Parameters:
avail (list of biogeme.expressions.Expression) – list of expressions to evaluate the availability conditions for each alternative. If 1 is provided, the alternative is always available.
- Returns:
value of the log likelihood
- Return type:
float
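A minimal sketch of the null model, assuming unweighted observations: each observation contributes the logarithm of one over the number of available alternatives. The function null_loglikelihood and the availability rows are hypothetical and independent of Biogeme.

```python
import math


def null_loglikelihood(availability_rows):
    """Log likelihood of a model giving equal probability to each
    available alternative (weights are ignored in this sketch)."""
    total = 0.0
    for row in availability_rows:
        n_available = sum(1 for flag in row.values() if flag)
        total += -math.log(n_available)
    return total


# Keys are alternative ids, values are availability flags.
rows = [{1: 1, 2: 1, 3: 1}, {1: 1, 2: 0, 3: 1}]
null_ll = null_loglikelihood(rows)
```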
- change_init_values(betas)[source]
Modifies the initial values of the parameters in all formulas.
- Parameters:
betas (dict(string:float)) – dictionary where the keys are the names of the parameters, and the values are the new values for the parameters.
- Return type:
None
- checkDerivatives(beta, verbose=False)[source]
Warning
This function is deprecated. Use check_derivatives() instead.
- Parameters:
beta (ndarray | list[float])
verbose (bool)
- check_derivatives(beta, verbose=False)[source]
Verifies the implementation of the derivatives.
It compares the analytical version with the finite differences approximation.
- Parameters:
beta (list(float)) – vector of values for the parameters.
verbose (bool) – if True, the comparisons are reported. Default: False.
- Return type:
tuple
- Returns:
f, g, h, gdiff, hdiff where
f is the value of the function,
g is the analytical gradient,
h is the analytical hessian,
gdiff is the difference between the analytical and the finite differences gradient,
hdiff is the difference between the analytical and the finite differences hessian.
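The comparison between analytical and finite differences derivatives can be sketched on a toy function. This is an illustration only: it uses forward differences with an assumed step size, and the names f, analytical_gradient and finite_difference_gradient are hypothetical.

```python
def finite_difference_gradient(func, x, step=1e-6):
    """Forward finite differences approximation of the gradient."""
    base = func(x)
    gradient = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += step
        gradient.append((func(shifted) - base) / step)
    return gradient


def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]


def analytical_gradient(x):
    return [2 * x[0] + 3 * x[1], 3 * x[0]]


point = [1.0, 2.0]
g = analytical_gradient(point)
g_fd = finite_difference_gradient(f, point)
# Entries close to zero suggest that the analytical gradient is correct.
gdiff = [a - b for a, b in zip(g, g_fd)]
```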
- confidenceIntervals(beta_values, interval_size=0.9)[source]
Warning
This function is deprecated. Use confidence_intervals() instead.
- Return type:
tuple[DataFrame, DataFrame]
- Parameters:
beta_values (list[dict[str, float]])
interval_size (float)
- confidence_intervals(beta_values, interval_size=0.9)[source]
Calculate confidence intervals on the simulated quantities.
- Parameters:
beta_values (list(dict(str: float))) – list of parameter values to be used in the calculations. Typically, it is a sample drawn from a distribution.
interval_size (float) – size of the reported confidence interval, as a fraction between 0 and 1. If it is denoted by s, the interval is calculated for the quantiles (1-s)/2 and (1+s)/2. The default (0.9) corresponds to the quantiles 0.05 and 0.95.
- Returns:
two pandas data frames ‘left’ and ‘right’ with the same dimensions. Each row corresponds to a row in the database, and each column to a formula. ‘left’ contains the left value of the confidence interval, and ‘right’ the right value.
Example:
    import biogeme.results as res

    # Read the estimation results from a file
    results = res.bioResults(pickle_file='myModel.pickle')
    # Retrieve the names of the beta parameters that have been estimated
    betas = biogeme.free_beta_names
    # Draw 100 realizations of the distribution of the estimators
    b = results.getBetasForSensitivityAnalysis(betas, size=100)
    # Simulate the formulas using the nominal values
    # (beta_values is assumed to be a dict mapping each parameter name to its value)
    simulated_values = biogeme.simulate(beta_values)
    # Calculate the confidence intervals for each formula
    left, right = biogeme.confidence_intervals(b, 0.9)
- Return type:
tuple of two Pandas dataframes.
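The quantile rule (1-s)/2 and (1+s)/2 can be checked on a single simulated quantity. The sketch below is an assumption-laden illustration: it uses linear interpolation between order statistics, which may differ from the method used internally, and the helpers quantile and interval_bounds are hypothetical.

```python
def quantile(sorted_values, q):
    """Quantile with linear interpolation between order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def interval_bounds(simulated_values, interval_size=0.9):
    """Left and right bounds for the quantiles (1-s)/2 and (1+s)/2."""
    ordered = sorted(simulated_values)
    left = quantile(ordered, (1 - interval_size) / 2)
    right = quantile(ordered, (1 + interval_size) / 2)
    return left, right


# One simulated quantity evaluated for 101 draws of the parameters.
draws = [float(i) for i in range(101)]
left_bound, right_bound = interval_bounds(draws, 0.9)
```

In the documented method, the same operation is applied to every row of the database and every formula, which is why two data frames are returned.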
- convergence
True if the algorithm has converged
- property dogleg
- drawsProcessingTime
Time needed to generate the draws.
- property enlarging_factor
- estimate(recycle=False, run_bootstrap=False, **kwargs)[source]
Estimate the parameters of the model(s).
- Parameters:
recycle (bool) – if True, the results are read from the pickle file, if it exists. If False, the estimation is performed.
run_bootstrap (bool) – if True, bootstrapping is applied.
- Returns:
object containing the estimation results.
- Return type:
biogeme.bioResults
Example:
    import biogeme.biogeme as bio

    # Create an instance of biogeme
    biogeme = bio.BIOGEME(database, logprob)
    # Give a name to the model
    biogeme.modelName = 'mymodel'
    # Estimate the parameters
    results = biogeme.estimate()
- Raises:
BiogemeError – if no expression has been provided for the likelihood
- Parameters:
recycle (bool)
run_bootstrap (bool)
- Return type:
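The behavior of the recycle argument can be pictured with a small stand-alone sketch. It is not the Biogeme implementation: the function estimate_or_recycle, the callable fake_estimation and the file name are hypothetical, and the sketch always writes the pickle file after estimating.

```python
import pickle
import tempfile
from pathlib import Path


def estimate_or_recycle(pickle_path, run_estimation, recycle=False):
    """Read results from the pickle file if allowed and available,
    otherwise run the estimation and store its results."""
    path = Path(pickle_path)
    if recycle and path.exists():
        with path.open("rb") as handle:
            return pickle.load(handle)
    results = run_estimation()
    with path.open("wb") as handle:
        pickle.dump(results, handle)
    return results


calls = []


def fake_estimation():
    # Stand-in for a costly estimation; records each call.
    calls.append(1)
    return {"b_time": -1.2}


with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "mymodel.pickle"
    first = estimate_or_recycle(target, fake_estimation, recycle=True)
    second = estimate_or_recycle(target, fake_estimation, recycle=True)
    third = estimate_or_recycle(target, fake_estimation, recycle=False)
```

The second call reads the file written by the first one, so the estimation runs only twice in total.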
- estimate_catalog(selected_configurations=None, quick_estimate=False, recycle=False, run_bootstrap=False)[source]
Estimate all or selected versions of a model with Catalogs, corresponding to multiple specifications.
- Parameters:
selected_configurations (set(biogeme.pareto.SetElement)) – set of configurations. If None, all configurations are considered.
quick_estimate (bool) – if True, the final statistics are not calculated.
recycle (bool) – if True, the results are read from the pickle file, if it exists. If False, the estimation is performed.
run_bootstrap (bool) – if True, bootstrapping is applied.
- Returns:
object containing the estimation results associated with the name of each specification, as well as a description of each configuration
- Return type:
dict(str: bioResults)
- files_of_type(extension, all_files=False)[source]
Identify the list of files with a given extension in the local directory.
- Parameters:
extension (str) – extension of the requested files (without the dot): ‘pickle’, or ‘html’
all_files (bool) – if all_files is False, only files containing the name of the model are identified. If all_files is True, all files with the requested extension are identified.
- Returns:
list of files with the requested extension.
- Return type:
list(str)
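A possible way to picture the file selection is a glob over the directory. The sketch below is hypothetical: the helper list_files_of_type takes the directory and the model name as explicit arguments, whereas the documented method uses the local directory and the name stored in the object.

```python
import tempfile
from pathlib import Path


def list_files_of_type(directory, model_name, extension, all_files=False):
    """Names of the files with the given extension, optionally
    restricted to those containing the model name."""
    pattern = f"*.{extension}" if all_files else f"*{model_name}*.{extension}"
    return sorted(path.name for path in Path(directory).glob(pattern))


with tempfile.TemporaryDirectory() as folder:
    for name in ("mymodel.pickle", "mymodel~00.pickle", "other.pickle", "mymodel.html"):
        (Path(folder) / name).touch()
    model_only = list_files_of_type(folder, "mymodel", "pickle")
    everything = list_files_of_type(folder, "mymodel", "pickle", all_files=True)
```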
- formulas
Dictionary containing Biogeme formulas of type biogeme.expressions.Expression. The keys are the names of the formulas.
- property freeBetaNames: list[str]
- property free_beta_names: list[str]
Returns the names of the parameters that must be estimated
- Returns:
list of names of the parameters
- Return type:
list(str)
- classmethod from_configuration(config_id, expression, database, user_notes=None, parameters=None, skip_audit=False)[source]
Obtain the Biogeme object corresponding to the configuration of a multiple expression
- Parameters:
config_id (str) – identifier of the configuration
expression (biogeme.expressions.Expression) – multiple expression containing all the catalogs.
database (Database) – database to be passed to the Biogeme object
user_notes (str) – these notes will be included in the report file.
parameters (str | Parameters | None) – object with the parameters
skip_audit (bool) – if True, no auditing is performed.
- Return type:
- property generatePickle: bool
Boolean variable, True if the PICKLE file with the results must be generated.
- property generate_html
- property generate_pickle
- getBoundsOnBeta(beta_name)[source]
Warning
This function is deprecated. Use get_bounds_on_beta() instead.
- Return type:
tuple[float, float]
- Parameters:
beta_name (str)
- get_beta_values()[source]
- Returns a dict with the initial values of Beta. Typically
useful for simulation.
- Returns:
dict with the initial values of the Beta
- Return type:
dict(str: float)
- get_bounds_on_beta(beta_name)[source]
Returns the bounds on the parameter as defined by the user.
- Parameters:
beta_name (string) – name of the parameter
- Returns:
lower bound, upper bound
- Return type:
tuple
- Raises:
BiogemeError – if the name of the parameter is not found.
- property identification_threshold
- property infeasible_cg
- initLogLike
Initial value of the likelihood function
- property initial_radius
- property large_data_set
- property largest_neighborhood
- lastSample
keeps track of the sample of data used to calculate the stochastic gradient / hessian
- likelihoodFiniteDifferenceHessian(x)[source]
Warning
This function is deprecated. Use likelihood_finite_difference_hessian() instead.
- Return type:
ndarray
- Parameters:
x (ndarray)
- likelihood_finite_difference_hessian(x)[source]
Calculate the hessian of the log likelihood function using finite differences.
May be useful when the analytical hessian has numerical issues.
- Parameters:
x (list(float)) – vector of values for the parameters.
- Returns:
finite differences approximation of the hessian.
- Return type:
numpy.array
- Raises:
ValueError – if the length of the list x is incorrect
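A finite differences hessian can be sketched with central differences on a toy function. The step size and the helper name finite_difference_hessian are assumptions made for illustration; the scheme used by Biogeme may differ.

```python
def finite_difference_hessian(func, x, step=1e-4):
    """Central finite differences approximation of the hessian."""
    n = len(x)
    hessian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):

            def shifted(di, dj):
                # Evaluate func after moving coordinates i and j.
                point = list(x)
                point[i] += di
                point[j] += dj
                return func(point)

            hessian[i][j] = (
                shifted(step, step)
                - shifted(step, -step)
                - shifted(-step, step)
                + shifted(-step, -step)
            ) / (4 * step * step)
    return hessian


def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]


# The exact hessian of f is [[2, 3], [3, 0]].
h_fd = finite_difference_hessian(f, [1.0, 2.0])
```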
- log_like: Expression
Object of type biogeme.expressions.Expression calculating the formula for the loglikelihood
- log_like_name: str
Keyword used for the name of the loglikelihood formula. Default: ‘log_like’
- log_like_valid_names: list[str]
- property loglike: Expression
For backward compatibility
- loglikeSignatures: list[bytes]
Internal signature of the formula for the loglikelihood.
- property max_iterations
- property max_number_parameters_to_report
- property maximum_attempts
- property maximum_number_catalog_expressions
- property maximum_number_parameters
- property missing_data
- modelName
Name of the model. Default: ‘biogemeModelDefaultName’
- monte_carlo
True if one of the expressions involves a Monte-Carlo integration.
- nullLogLike
Log likelihood of the null model
- property numberOfDraws: int
Number of draws for Monte-Carlo integration.
- property numberOfThreads: int
Number of threads used for parallel computing. Default: the number of available CPUs. Maintained for backward compatibility.
- property number_of_draws
- property number_of_neighbors
- property number_of_threads: int
Number of threads used for parallel computing. Default: the number of available CPUs.
- number_unknown_parameters()[source]
Returns the number of parameters that must be estimated
- Returns:
number of parameters
- Return type:
int
- property only_robust_stats
- optimizationMessages
Information provided by the optimization algorithm after completion.
- property optimization_algorithm
- optimize(starting_values=None)[source]
Calls the optimization algorithm. The function self.algorithm is called.
- Parameters:
starting_values (list(float)) – starting point for the algorithm
- Returns:
x, messages
x is the solution generated by the algorithm,
messages is a dictionary reporting information about the run of the algorithm
- Return type:
numpy.array, dict(str:object)
- Raises:
BiogemeError – an error is raised if no algorithm is specified.
- parameter_file: str
- properties_initialized = True
- quickEstimate(**kwargs)[source]
Warning
This function is deprecated. Use quick_estimate() instead.
- Return type:
- quick_estimate(**kwargs)[source]
- Estimate the parameters of the model. Same as estimate, where any extra calculation is skipped (init loglikelihood, t-statistics, etc.)
- Returns:
object containing the estimation results.
- Return type:
Example:
    import biogeme.biogeme as bio

    # Create an instance of biogeme
    biogeme = bio.BIOGEME(database, logprob)
    # Give a name to the model
    biogeme.modelName = 'mymodel'
    # Estimate the parameters, skipping the extra calculations
    results = biogeme.quick_estimate()
- Raises:
BiogemeError – if no expression has been provided for the likelihood
- Return type:
- recycled_estimation(run_bootstrap=False, **kwargs)[source]
- Return type:
- Parameters:
run_bootstrap (bool)
- report_array(array, with_names=True)[source]
Reports the entries of the array up to the maximum number
- Parameters:
array (numpy.array) – array to report
with_names (bool) – if True, the names of the parameters are included
- Returns:
string reporting the values
- Return type:
str
- reset_id_manager()[source]
Reset all the ids of the elementary expressions in the formulas.
- Return type:
None
- property save_iterations
- property second_derivatives
- property seed
- setRandomInitValues(default_bound=100.0)[source]
Warning
This function is deprecated. Use set_random_init_values() instead.
- Return type:
None
- Parameters:
default_bound (float)
- set_random_init_values(default_bound=100.0)[source]
Modifies the initial values of the parameters in all formulas, using randomly generated values. The value is drawn from a uniform distribution on the interval defined by the bounds.
- Parameters:
default_bound (float) – If the upper bound is missing, it is replaced by this value. If the lower bound is missing, it is replaced by the opposite of this value. Default: 100.
- Return type:
None
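The rule for missing bounds can be sketched as follows. The function random_initial_value is hypothetical and handles a single parameter; a missing bound is represented by None, which is an assumption of this sketch.

```python
import random


def random_initial_value(lower, upper, default_bound=100.0, rng=random):
    """Draw a value uniformly between the bounds, replacing a missing
    upper bound by default_bound and a missing lower bound by its opposite."""
    low = -default_bound if lower is None else lower
    high = default_bound if upper is None else upper
    return rng.uniform(low, high)


rng = random.Random(42)  # seeded generator, for reproducibility
value_bounded = random_initial_value(0.0, 1.0, rng=rng)
value_unbounded = random_initial_value(None, None, rng=rng)
value_half = random_initial_value(None, 0.0, default_bound=10.0, rng=rng)
```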
- short_names: ModelNames | None
- simulate(the_beta_values)[source]
Applies the formulas to each row of the database.
- Parameters:
the_beta_values (dict(str, float)) – values of the parameters to be used in the calculations. If None, the default values are used. Default: None.
- Returns:
a pandas data frame with the simulated value. Each row corresponds to a row in the database, and each column to a formula.
- Return type:
Pandas data frame
Example:
    # Read the estimation results from a file
    results = res.bioResults(pickle_file='myModel.pickle')
    # Simulate the formulas using the nominal values
    # (beta_values is assumed to be a dict mapping each parameter name to its value)
    simulated_values = biogeme.simulate(beta_values)
- Raises:
BiogemeError – if the number of parameters is incorrect
BiogemeError – if the_beta_values is None.
- Parameters:
the_beta_values (dict[str, float] | None)
- Return type:
DataFrame
- property steptol
- property tolerance
- user_notes
User notes
- validate(estimation_results, validation_data)[source]
Perform out-of-sample validation.
The function performs the following tasks:
each slice defines a validation set (the slice itself) and an estimation set (the rest of the data),
the model is re-estimated on the estimation set,
the estimated model is applied on the validation set,
the value of the log likelihood for each observation is reported.
- Parameters:
estimation_results (bioResults) – results of the model estimation based on the full data.
validation_data (list(tuple(pandas.DataFrame, pandas.DataFrame))) – list of estimation and validation data sets
- Returns:
a list containing as many items as slices. Each item is the result of the simulation on the validation set.
- Return type:
list(pandas.DataFrame)
- Raises:
BiogemeError – An error is raised if the database is structured as panel data.
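The structure expected for validation_data, a list of (estimation, validation) pairs with one pair per slice, can be illustrated with plain lists standing in for data frames. The helper split_into_slices is hypothetical, and it assigns rows to slices deterministically, whereas a real split would typically be random.

```python
def split_into_slices(rows, slices):
    """Build one (estimation, validation) pair per slice."""
    pairs = []
    for k in range(slices):
        # The slice itself is the validation set.
        validation = rows[k::slices]
        # The rest of the data is the estimation set.
        estimation = [row for index, row in enumerate(rows) if index % slices != k]
        pairs.append((estimation, validation))
    return pairs


rows = list(range(10))
pairs = split_into_slices(rows, 5)
```

Each row appears in exactly one validation set, so every observation is predicted by a model that was estimated without it.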
- property version
- weight
Object of type biogeme.expressions.Expression calculating the weight of each observation in the sample.
- weightSignatures: list[bytes]
Internal signature of the formula for the weight.
- weight_name
Keyword used for the name of the weight formula. Default: ‘weight’