Configuring Biogeme with parameters
We illustrate how to obtain information about configuration parameters, and how to modify them.
Michel Bierlaire, EPFL Thu May 16 13:24:56 2024
import os
import pandas as pd
from IPython.core.display_functions import display
import biogeme.biogeme_logging as blog
from biogeme.biogeme import BIOGEME
from biogeme.database import Database
from biogeme.default_parameters import print_list_of_parameters
from biogeme.expressions import Beta
logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Illustration of the definition of parameters')
Illustration of the definition of parameters
Biogeme accepts several parameters that modify its functionalities. In this example, we illustrate how to obtain information about those parameters, and how to modify them.
We first create a dummy dataset that is needed to create the Biogeme object. Its content is irrelevant.
data = {
    'x1': pd.Series([0]),
}
pandas_dataframe = pd.DataFrame(data)
biogeme_database = Database('dummy', pandas_dataframe)
We also create a dummy model, irrelevant as well.
logprob = Beta('dummy_parameter', 0, None, None, 0)
When you create the Biogeme object, Biogeme tries to read the values of the parameters from the file biogeme.toml. If the file does not exist, default values are used.
biogeme_object = BIOGEME(database=biogeme_database, formulas=logprob)
Biogeme parameters read from biogeme.toml.
For instance, let’s check the value for the maximum number of iterations:
print(f'Max. number of iterations: {biogeme_object.max_iterations}')
Max. number of iterations: 1000
If it did not exist before, Biogeme has created a file called biogeme.toml.
default_toml_file_name = 'biogeme.toml'
with open(default_toml_file_name, 'r') as file:
    lines = file.readlines()
Here are the first lines of this file. As you see, it is structured into sections. Each section contains a list of parameters, their values, and a short description of each.
number_of_lines_to_display = 20
for i, line in enumerate(lines):
    print(line, end='')
    if i == number_of_lines_to_display:
        break
# Default parameter file for Biogeme 3.3.1
# Automatically created on September 02, 2025. 21:33:27
[TrustRegion]
dogleg = "True" # bool: choice of the method to solve the trust region subproblem.
# True: dogleg. False: truncated conjugate gradient.
[Estimation]
bootstrap_samples = 100 # int: number of re-estimations for bootstrap sampling.
calculating_second_derivatives = "analytical" # Defines how to calculate the second
# derivatives:
# analytical,finite_differences,never.
#
large_data_set = 100000 # If the number of observations is larger than this
# value, the data set is deemed large, and the default
# estimation algorithm will not use second derivatives.
max_number_parameters_to_report = 15 # int: maximum number of parameters to
# report during the estimation.
save_iterations = "True" # bool: If True, the current iterate is saved after each
# iteration, in a file named ``__[modelName].iter``,
# where ``[modelName]`` is the name given to the model.
Let’s now set the value of the max_iterations parameter in the file to 500.
for i, line in enumerate(lines):
    if 'max_iterations' in line:
        lines[i] = 'max_iterations = 500\n'
with open(default_toml_file_name, 'w') as file:
    file.writelines(lines)
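The substring test used above replaces any line that mentions max_iterations, including a comment, and it drops the description that follows the value. A slightly more careful variant is sketched here. The helper set_parameter is hypothetical and not part of Biogeme; it assumes that the value itself contains no '#' character.

```python
import re


def set_parameter(lines, name, value):
    """Return a copy of lines where the assignment of name is set to value.

    Hypothetical helper, for illustration only. Only lines that start with
    ``name =`` are modified, and the trailing comment, if any, is preserved.
    """
    pattern = re.compile(rf'^\s*{re.escape(name)}\s*=')
    updated = []
    for line in lines:
        if pattern.match(line):
            # Keep the text after the first '#' as the description.
            _, _, comment = line.partition('#')
            new_line = f'{name} = {value}'
            if comment.strip():
                new_line += f' # {comment.strip()}'
            updated.append(new_line + '\n')
        else:
            updated.append(line)
    return updated


example_lines = [
    '[SimpleBounds]\n',
    'max_iterations = 1000 # int: maximum number of iterations\n',
    '# max_iterations is documented above\n',
]
new_lines = set_parameter(example_lines, 'max_iterations', 500)
print(''.join(new_lines), end='')
```

Only the assignment line is rewritten; the comment line that merely mentions the parameter is left untouched.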
We create a new Biogeme object. The values of the parameters are read from the file biogeme.toml.
biogeme_object = BIOGEME(database=biogeme_database, formulas=logprob)
Biogeme parameters read from biogeme.toml.
We check that the value 500 that we have specified has indeed been considered.
print(f'Max. number of iterations: {biogeme_object.max_iterations}')
Max. number of iterations: 500
It is possible to have several toml files, with different configurations. For instance, let’s create another file with a different value for the max_iterations parameter: 650.
another_toml_file_name = 'customized.toml'
new_value = 650
for i, line in enumerate(lines):
    if 'max_iterations' in line:
        lines[i] = f'max_iterations = {new_value}\n'
with open(another_toml_file_name, 'w') as file:
    file.writelines(lines)
The name of the file must now be specified at the creation of the Biogeme object.
biogeme_object = BIOGEME(
    database=biogeme_database, formulas=logprob, parameters=another_toml_file_name
)
print(f'Max. number of iterations: {biogeme_object.max_iterations}')
Biogeme parameters read from customized.toml.
Max. number of iterations: 650
Note that if you specify the name of a file that does not exist, this file will be created, and the default values of the parameters are used.
yet_another_toml_file_name = 'xxx.toml'
biogeme_object = BIOGEME(
    database=biogeme_database, formulas=logprob, parameters=yet_another_toml_file_name
)
print(f'Max. number of iterations: {biogeme_object.max_iterations}')
Default values of the Biogeme parameters are used.
File xxx.toml has been created
Max. number of iterations: 1000
Another way to set the value of a parameter is to specify it explicitly at the creation of the Biogeme object. It supersedes the value in the .toml file.
yet_another_value = 234
biogeme_object = BIOGEME(
    database=biogeme_database, formulas=logprob, max_iterations=yet_another_value
)
print(f'Max. number of iterations: {biogeme_object.max_iterations}')
Biogeme parameters read from biogeme.toml.
Max. number of iterations: 234
Both can be combined. The explicit argument still supersedes the value in the specified file.
biogeme_object = BIOGEME(
    database=biogeme_database,
    formulas=logprob,
    max_iterations=234,
    parameters=another_toml_file_name,
)
print(f'Max. number of iterations: {biogeme_object.max_iterations}')
Biogeme parameters read from customized.toml.
Max. number of iterations: 234
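The outputs above suggest the following order of precedence: a value given explicitly at creation comes first, then the value in the .toml file, then the default value. This lookup order can be illustrated with a ChainMap from the standard library. It is a sketch of the precedence only, under the assumption that it matches the observed behavior, and not a description of how Biogeme implements it. The dictionaries and the function resolve are hypothetical.

```python
from collections import ChainMap

# Hypothetical sources, using the values from the examples above.
defaults = {'max_iterations': 1000}
file_values = {'max_iterations': 650}  # as in customized.toml
explicit_arguments = {'max_iterations': 234}


def resolve(explicit, from_file, default_values):
    """Look up each parameter in the first source that defines it."""
    return ChainMap(explicit, from_file, default_values)


# Explicit argument and file: the explicit argument wins.
both = resolve(explicit_arguments, file_values, defaults)['max_iterations']
# File only: the value from the file is used.
file_only = resolve({}, file_values, defaults)['max_iterations']
# Neither: the default value is used.
neither = resolve({}, {}, defaults)['max_iterations']
print(both, file_only, neither)
```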
We delete the toml files to clean the directory.
os.remove(default_toml_file_name)
os.remove(another_toml_file_name)
os.remove(yet_another_toml_file_name)
Finally, we display the list of all parameters.
display(print_list_of_parameters())
Parameter Default value Type Section Description
identification_threshold 1e-05 <class 'float'> Output float: if the smallest eigenvalue of the second derivative matrix is lesser or equal to this parameter, the model is considered not identified. The corresponding eigenvector is then reported to identify the parameters involved in the issue.
only_robust_stats True <class 'bool'> Output bool: "True" if only the robust statistics need to be reported. If "False", the statistics from the Rao-Cramer bound are also reported.
generate_html True <class 'bool'> Output bool: "True" if the HTML file with the results must be generated.
generate_yaml True <class 'bool'> Output bool: "True" if the yaml file with the results must be generated.
save_validation_results True <class 'bool'> Output bool: "True" if the validation results are saved in CSV files.
number_of_threads 0 <class 'int'> MultiThreading int: Number of threads/processors to be used. If the parameter is 0, the number of available threads is calculated using cpu_count().
number_of_draws 10000 <class 'int'> MonteCarlo int: Number of draws for Monte-Carlo integration.
missing_data 99999 <class 'int'> Specification number: If one variable has this value, it is assumed that a data is missing and an exception will be triggered.
numerically_safe False <class 'bool'> Specification If true, Biogeme is doing its best to deal with numerical issues, such as division by a number close to zero, at the possible expense of speed.
use_jit True <class 'bool'> Specification If True, the model is compiled using jit (just-in-time) to speed up the calculation. For complex models, compilation time may exceed the gain due to compilation, so that it is worth turning it off.
seed 0 <class 'int'> MonteCarlo int: Seed used for the pseudo-random number generation. It is useful only when each run should generate the exact same result. If 0, a new seed is used at each run.
bootstrap_samples 100 <class 'int'> Estimation int: number of re-estimations for bootstrap sampling.
calculating_second_derivatives analytical <class 'str'> Estimation Defines how to calculate the second derivatives: analytical,finite_differences,never.
large_data_set 100000 <class 'int'> Estimation If the number of observations is larger than this value, the data set is deemed large, and the default estimation algorithm will not use second derivatives.
max_number_parameters_to_report 15 <class 'int'> Estimation int: maximum number of parameters to report during the estimation.
save_iterations True <class 'bool'> Estimation bool: If True, the current iterate is saved after each iteration, in a file named ``__[modelName].iter``, where ``[modelName]`` is the name given to the model. If such a file exists, the starting values for the estimation are replaced by the values saved in the file.
maximum_number_catalog_expressions 100 <class 'int'> Estimation If the expression contains catalogs, the parameter sets an upper bound of the total number of possible combinations that can be estimated in the same loop.
optimization_algorithm automatic <class 'str'> Estimation str: optimization algorithm to be used for estimation. Valid values: ['automatic', 'scipy', 'LS-newton', 'TR-newton', 'LS-BFGS', 'TR-BFGS', 'simple_bounds', 'simple_bounds_newton', 'simple_bounds_BFGS']
second_derivatives 1.0 <class 'float'> SimpleBounds float: proportion (between 0 and 1) of iterations when the analytical Hessian is calculated
tolerance 6.055454452393343e-06 <class 'float'> SimpleBounds float: the algorithm stops when this precision is reached
max_iterations 1000 <class 'int'> SimpleBounds int: maximum number of iterations
infeasible_cg False <class 'bool'> SimpleBounds If True, the conjugate gradient algorithm may generate infeasible solutions until termination. The result will then be projected on the feasible domain. If False, the algorithm stops as soon as an infeasible iterate is generated
initial_radius 1 <class 'float'> SimpleBounds Initial radius of the trust region
steptol 3.666852862501036e-11 <class 'float'> SimpleBounds The algorithm stops when the relative change in x is below this threshold. Basically, if p significant digits of x are needed, steptol should be set to 1.0e-p.
enlarging_factor 10 <class 'float'> SimpleBounds If an iteration is very successful, the radius of the trust region is multiplied by this factor
dogleg True <class 'bool'> TrustRegion bool: choice of the method to solve the trust region subproblem. True: dogleg. False: truncated conjugate gradient.
maximum_number_parameters 50 <class 'int'> AssistedSpecification int: maximum number of parameters allowed in a model. Each specification with a higher number is deemed invalid and not estimated.
number_of_neighbors 20 <class 'int'> AssistedSpecification int: maximum number of neighbors that are visited by the VNS algorithm.
largest_neighborhood 20 <class 'int'> AssistedSpecification int: size of the largest neighborhood copnsidered by the Variable Neighborhood Search (VNS) algorithm.
maximum_attempts 100 <class 'int'> AssistedSpecification int: an attempts consists in selecting a solution in the Pareto set, and trying to improve it. The parameter imposes an upper bound on the total number of attempts, irrespectively if they are successful or not.
number_of_jobs 2 <class 'int'> Bootstrap int: The maximum number of concurrently running jobs. If -1 is given, joblib tries to use all CPUs.
version 3.3.1 <class 'str'> Biogeme Version of Biogeme that created the TOML file. Do not modify this value.