Configuring Biogeme with parameters

We illustrate how to obtain information about configuration parameters, and how to modify them.

Michel Bierlaire, EPFL Thu May 16 13:24:56 2024

import os

import pandas as pd
from IPython.core.display_functions import display

import biogeme.biogeme_logging as blog
from biogeme.biogeme import BIOGEME
from biogeme.database import Database
from biogeme.default_parameters import print_list_of_parameters
from biogeme.expressions import Beta

logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Illustration of the definition of parameters')
Illustration of the definition of parameters

Biogeme accepts several parameters that modify its behavior. In this example, we illustrate how to obtain information about those parameters, and how to modify them.

We first create a dummy dataset that is needed to create the Biogeme object. Its content is irrelevant.

data = {
    'x1': pd.Series([0]),
}
pandas_dataframe = pd.DataFrame(data)
biogeme_database = Database('dummy', pandas_dataframe)

We also create a dummy model, irrelevant as well.

logprob = Beta('dummy_parameter', 0, None, None, 0)

When you create the Biogeme object, Biogeme tries to read the values of the parameters from the file biogeme.toml. If the file does not exist, default values are used.

biogeme_object = BIOGEME(database=biogeme_database, formulas=logprob)
Biogeme parameters read from biogeme.toml.

For instance, let’s check the value for the maximum number of iterations:

print(f'Max. number of iterations: {biogeme_object.max_iterations}')
Max. number of iterations: 1000

If it did not exist before, Biogeme has created a file called biogeme.toml.

default_toml_file_name = 'biogeme.toml'
with open(default_toml_file_name, 'r') as file:
    lines = file.readlines()

Here are the first lines of this file. As you see, it is structured into sections. Each section contains a list of parameters, their value, and a short description.

number_of_lines_to_display = 20
for i, line in enumerate(lines):
    print(line, end='')
    if i == number_of_lines_to_display:
        break
# Default parameter file for Biogeme 3.3.1
# Automatically created on September 02, 2025. 21:33:27

[TrustRegion]
dogleg = "True" # bool: choice of the method to solve the trust region subproblem.
                # True: dogleg. False: truncated conjugate gradient.

[Estimation]
bootstrap_samples = 100 # int: number of re-estimations for bootstrap sampling.
calculating_second_derivatives = "analytical" # Defines how to calculate the second
                                              # derivatives:
                                              # analytical,finite_differences,never.
                                              #
large_data_set = 100000 # If the number of observations is larger than this
                        # value, the data set is deemed large, and the default
                        # estimation algorithm will not use second derivatives.
max_number_parameters_to_report = 15 # int: maximum number of parameters to
                                     # report during the estimation.
save_iterations = "True" # bool: If True, the current iterate is saved after each
                         # iteration, in a file named ``__[modelName].iter``,
                         # where ``[modelName]`` is the name given to the model.

Let’s now set the value of the max_iterations parameter in the file to 500.

for i, line in enumerate(lines):
    if 'max_iterations' in line:
        lines[i] = 'max_iterations = 500\n'

with open(default_toml_file_name, 'w') as file:
    file.writelines(lines)
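
The substring test used above is sufficient for this file, but it would also rewrite any comment line that happens to mention the parameter name. A slightly more defensive variant is sketched below. The helper `set_toml_value` and the sample lines are hypothetical and are not part of Biogeme; the helper only matches lines that start with an assignment to the key, and it keeps the trailing comment. It assumes that the value itself does not contain a `#` character.

```python
import re


def set_toml_value(lines, key, value):
    """Return a copy of ``lines`` where the assignment to ``key`` is replaced.

    Hypothetical helper: only lines of the form ``key = ...`` are modified,
    and the comment following the value, if any, is preserved.
    """
    assignment = re.compile(rf'^\s*{re.escape(key)}\s*=')
    updated = []
    for line in lines:
        if assignment.match(line):
            # Split off the trailing comment (assumes no '#' inside the value).
            _, hash_sign, comment = line.rstrip('\n').partition('#')
            suffix = f' #{comment}' if hash_sign else ''
            line = f'{key} = {value}{suffix}\n'
        updated.append(line)
    return updated


# Illustrative lines, not copied from an actual biogeme.toml file.
sample = [
    '[SimpleBounds]\n',
    'max_iterations = 1000 # int: maximum number of iterations\n',
    '# max_iterations is documented above\n',
]
result = set_toml_value(sample, 'max_iterations', 500)
print(''.join(result), end='')
```

Only the second line is modified; the comment line that mentions the parameter name is left untouched.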

We create a new Biogeme object. The values of the parameters are read from the file biogeme.toml.

biogeme_object = BIOGEME(database=biogeme_database, formulas=logprob)
Biogeme parameters read from biogeme.toml.

We check that the value 500 that we have specified has indeed been taken into account.

print(f'Max. number of iterations: {biogeme_object.max_iterations}')
Max. number of iterations: 500

It is possible to have several toml files, with different configurations. For instance, let’s create another file with a different value for the max_iterations parameter: 650.

another_toml_file_name = 'customized.toml'
new_value = 650
for i, line in enumerate(lines):
    if 'max_iterations' in line:
        lines[i] = f'max_iterations = {new_value}\n'
with open(another_toml_file_name, 'w') as file:
    file.writelines(lines)

The name of the file must now be specified at the creation of the Biogeme object.

biogeme_object = BIOGEME(
    database=biogeme_database, formulas=logprob, parameters=another_toml_file_name
)
print(f'Max. number of iterations: {biogeme_object.max_iterations}')
Biogeme parameters read from customized.toml.
Max. number of iterations: 650

Note that if you specify the name of a file that does not exist, this file is created, and the default values of the parameters are used.

yet_another_toml_file_name = 'xxx.toml'
biogeme_object = BIOGEME(
    database=biogeme_database, formulas=logprob, parameters=yet_another_toml_file_name
)
print(f'Max. number of iterations: {biogeme_object.max_iterations}')
Default values of the Biogeme parameters are used.
File xxx.toml has been created
Max. number of iterations: 1000

Another way to set the value of a parameter is to specify it explicitly at the creation of the Biogeme object. It supersedes the value in the .toml file.

yet_another_value = 234
biogeme_object = BIOGEME(
    database=biogeme_database, formulas=logprob, max_iterations=yet_another_value
)
print(f'Max. number of iterations: {biogeme_object.max_iterations}')
Biogeme parameters read from biogeme.toml.
Max. number of iterations: 234

Both can be combined. The value given explicitly (234) then supersedes the value stored in the specified file (650).

biogeme_object = BIOGEME(
    database=biogeme_database,
    formulas=logprob,
    max_iterations=234,
    parameters=another_toml_file_name,
)
print(f'Max. number of iterations: {biogeme_object.max_iterations}')
Biogeme parameters read from customized.toml.
Max. number of iterations: 234
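
The order of precedence observed in the outputs above (explicit argument, then .toml file, then default value) can be summarized with a small stand-alone sketch. It does not use Biogeme and does not claim to reproduce its internals: the function `resolve_parameters` and the dictionaries are hypothetical stand-ins for the three sources of values.

```python
from collections import ChainMap

# Default value reported in the example above.
DEFAULTS = {'max_iterations': 1000}


def resolve_parameters(file_values=None, **explicit):
    """Look up explicit arguments first, then file values, then defaults."""
    return ChainMap(explicit, file_values or {}, DEFAULTS)


# Stands in for the content of customized.toml.
customized = {'max_iterations': 650}

print(resolve_parameters()['max_iterations'])  # 1000
print(resolve_parameters(customized)['max_iterations'])  # 650
print(resolve_parameters(customized, max_iterations=234)['max_iterations'])  # 234
```

A `ChainMap` searches its mappings from left to right, which makes the priority order explicit in a single line.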

We delete the toml files to clean the directory.

os.remove(default_toml_file_name)
os.remove(another_toml_file_name)
os.remove(yet_another_toml_file_name)
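
Note that `os.remove` raises `FileNotFoundError` if a file is missing, for instance when the script is interrupted before all files have been created. A more tolerant clean-up, sketched here with `pathlib` (Python 3.8 or later) on files in a temporary directory, ignores missing files. The file names are only illustrative.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    # One file that exists, and one that was never created.
    existing = Path(folder) / 'biogeme.toml'
    existing.write_text('max_iterations = 500\n')
    never_created = Path(folder) / 'customized.toml'

    # missing_ok=True silently skips files that do not exist.
    for toml_file in (existing, never_created):
        toml_file.unlink(missing_ok=True)

    remaining = sorted(p.name for p in Path(folder).iterdir())

print(remaining)
```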

Finally, we display the list of all parameters.

display(print_list_of_parameters())
Parameter                           Default value          Type             Section                Description
identification_threshold            1e-05                  <class 'float'>  Output                 float: if the smallest eigenvalue of the second derivative matrix is lesser or equal to this parameter, the model is considered not identified. The corresponding eigenvector is then reported to identify the parameters involved in the issue.
only_robust_stats                   True                   <class 'bool'>   Output                 bool: "True" if only the robust statistics need to be reported. If "False", the statistics from the Rao-Cramer bound are also reported.
generate_html                       True                   <class 'bool'>   Output                 bool: "True" if the HTML file with the results must be generated.
generate_yaml                       True                   <class 'bool'>   Output                 bool: "True" if the yaml file with the results must be generated.
save_validation_results             True                   <class 'bool'>   Output                 bool: "True" if the validation results are saved in CSV files.
number_of_threads                   0                      <class 'int'>    MultiThreading         int: Number of threads/processors to be used. If the parameter is 0, the number of available threads is calculated using cpu_count().
number_of_draws                     10000                  <class 'int'>    MonteCarlo             int: Number of draws for Monte-Carlo integration.
missing_data                        99999                  <class 'int'>    Specification          number: If one variable has this value, it is assumed that a data is missing and an exception will be triggered.
numerically_safe                    False                  <class 'bool'>   Specification          If true, Biogeme is doing its best to deal with numerical issues, such as division by a number close to zero, at the possible expense of speed.
use_jit                             True                   <class 'bool'>   Specification          If True, the model is compiled using jit (just-in-time) to speed up the calculation. For complex models, compilation time may exceed the gain due to compilation, so that it is worth turning it off.
seed                                0                      <class 'int'>    MonteCarlo             int: Seed used for the pseudo-random number generation. It is useful only when each run should generate the exact same result. If 0, a new seed is used at each run.
bootstrap_samples                   100                    <class 'int'>    Estimation             int: number of re-estimations for bootstrap sampling.
calculating_second_derivatives      analytical             <class 'str'>    Estimation             Defines how to calculate the second derivatives: analytical,finite_differences,never.
large_data_set                      100000                 <class 'int'>    Estimation             If the number of observations is larger than this value, the data set is deemed large, and the default estimation algorithm will not use second derivatives.
max_number_parameters_to_report     15                     <class 'int'>    Estimation             int: maximum number of parameters to report during the estimation.
save_iterations                     True                   <class 'bool'>   Estimation             bool: If True, the current iterate is saved after each iteration, in a file named ``__[modelName].iter``, where ``[modelName]`` is the name given to the model. If such a file exists, the starting values for the estimation are replaced by the values saved in the file.
maximum_number_catalog_expressions  100                    <class 'int'>    Estimation             If the expression contains catalogs, the parameter sets an upper bound of the total number of possible combinations that can be estimated in the same loop.
optimization_algorithm              automatic              <class 'str'>    Estimation             str: optimization algorithm to be used for estimation. Valid values: ['automatic', 'scipy', 'LS-newton', 'TR-newton', 'LS-BFGS', 'TR-BFGS', 'simple_bounds', 'simple_bounds_newton', 'simple_bounds_BFGS']
second_derivatives                  1.0                    <class 'float'>  SimpleBounds           float: proportion (between 0 and 1) of iterations when the analytical Hessian is calculated
tolerance                           6.055454452393343e-06  <class 'float'>  SimpleBounds           float: the algorithm stops when this precision is reached
max_iterations                      1000                   <class 'int'>    SimpleBounds           int: maximum number of iterations
infeasible_cg                       False                  <class 'bool'>   SimpleBounds           If True, the conjugate gradient algorithm may generate infeasible solutions until termination.  The result will then be projected on the feasible domain.  If False, the algorithm stops as soon as an infeasible iterate is generated
initial_radius                      1                      <class 'float'>  SimpleBounds           Initial radius of the trust region
steptol                             3.666852862501036e-11  <class 'float'>  SimpleBounds           The algorithm stops when the relative change in x is below this threshold. Basically, if p significant digits of x are needed, steptol should be set to 1.0e-p.
enlarging_factor                    10                     <class 'float'>  SimpleBounds           If an iteration is very successful, the radius of the trust region is multiplied by this factor
dogleg                              True                   <class 'bool'>   TrustRegion            bool: choice of the method to solve the trust region subproblem. True: dogleg. False: truncated conjugate gradient.
maximum_number_parameters           50                     <class 'int'>    AssistedSpecification  int: maximum number of parameters allowed in a model. Each specification with a higher number is deemed invalid and not estimated.
number_of_neighbors                 20                     <class 'int'>    AssistedSpecification  int: maximum number of neighbors that are visited by the VNS algorithm.
largest_neighborhood                20                     <class 'int'>    AssistedSpecification  int: size of the largest neighborhood copnsidered by the Variable Neighborhood Search (VNS) algorithm.
maximum_attempts                    100                    <class 'int'>    AssistedSpecification  int: an attempts consists in selecting a solution in the Pareto set, and trying to improve it. The parameter imposes an upper bound on the total number of attempts, irrespectively if they are successful or not.
number_of_jobs                      2                      <class 'int'>    Bootstrap              int: The maximum number of concurrently running jobs. If -1 is given, joblib tries to use all CPUs.
version                             3.3.1                  <class 'str'>    Biogeme                Version of Biogeme that created the TOML file. Do not modify this value.

