Note

Go to the end to download the full example code.

biogeme.biogeme

Examples of use of several functions.

This is designed for programmers who need examples of use of the functions of the module. The examples are designed to illustrate the syntax. They do not correspond to any meaningful model.

author:: Michel Bierlaire
date:: Thu Nov 16 18:36:35 2023

import biogeme.version as ver
import biogeme.biogeme as bio
import biogeme.database as db
import pandas as pd
from biogeme.expressions import Beta, Variable, exp
import biogeme.biogeme_logging as blog
from biogeme.function_output import BiogemeFunctionOutput

Version of Biogeme.

print(ver.get_text())

biogeme 3.2.14 [2024-08-05]
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)

Logger.

logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Logger initialized')

Logger initialized

Definition of a database

df = pd.DataFrame(
    {
        'Person': [1, 1, 1, 2, 2],
        'Exclude': [0, 0, 1, 0, 1],
        'Variable1': [1, 2, 3, 4, 5],
        'Variable2': [10, 20, 30, 40, 50],
        'Choice': [1, 2, 3, 1, 2],
        'Av1': [0, 1, 1, 1, 1],
        'Av2': [1, 1, 1, 1, 1],
        'Av3': [0, 1, 1, 1, 1],
    }
)
my_data = db.Database('test', df)

Data

my_data.data

	Person	Exclude	Variable1	Variable2	Choice	Av1	Av2	Av3
0	1	0	1	10	1	0	1	0
1	1	0	2	20	2	1	1	1
2	1	1	3	30	3	1	1	1
3	2	0	4	40	1	1	1	1
4	2	1	5	50	2	1	1	1

Definition of various expressions.

Variable1 = Variable('Variable1')
Variable2 = Variable('Variable2')
beta1 = Beta('beta1', -1.0, -3, 3, 0)
beta2 = Beta('beta2', 2.0, -3, 10, 0)
likelihood = -(beta1**2) * Variable1 - exp(beta2 * beta1) * Variable2 - beta2**4
simul = beta1 / Variable1 + beta2 / Variable2
dict_of_expressions = {'log_like': likelihood, 'beta1': beta1, 'simul': simul}

Creation of the BIOGEME object.

my_biogeme = bio.BIOGEME(my_data, dict_of_expressions)
my_biogeme.modelName = 'simple_example'
print(my_biogeme)

Default values of the Biogeme parameters are used.
File biogeme.toml has been created
simple_example: database [test]{'log_like': ((((-Beta('beta1', -1.0, -3, 3, 0)**2.0) * Variable1) - (exp((Beta('beta2', 2.0, -3, 10, 0) * Beta('beta1', -1.0, -3, 3, 0))) * Variable2)) - Beta('beta2', 2.0, -3, 10, 0)**4.0), 'beta1': Beta('beta1', -1.0, -3, 3, 0), 'simul': ((Beta('beta1', -1.0, -3, 3, 0) / Variable1) + (Beta('beta2', 2.0, -3, 10, 0) / Variable2))}

The data is stored in the Biogeme object.

my_biogeme.database.data

	Person	Exclude	Variable1	Variable2	Choice	Av1	Av2	Av3
0	1	0	1	10	1	0	1	0
1	1	0	2	20	2	1	1	1
2	1	1	3	30	3	1	1	1
3	2	0	4	40	1	1	1	1
4	2	1	5	50	2	1	1	1

Log likelihood with the initial values of the parameters.

my_biogeme.calculate_init_likelihood()

-115.30029248549191

Calculate the log-likelihood with a different value of the parameters. We retieve the current value and add 1 to each of them.

x = my_biogeme.id_manager.free_betas_values
xplus = [v + 1 for v in x]
print(xplus)

[0.0, 3.0]

my_biogeme.calculate_likelihood(xplus, scaled=True)

-111.0

Calculate the log-likelihood function and its derivatives.

the_function_output: BiogemeFunctionOutput = (
    my_biogeme.calculate_likelihood_and_derivatives(
        xplus, scaled=True, hessian=True, bhhh=True
    )
)

print(f'f = {the_function_output.function}')

f = -111.0

print(f'g = {the_function_output.gradient}')

g = [ -90. -108.]

pd.DataFrame(the_function_output.hessian)

	0	1
0	-276.0	-30.0
1	-30.0	-108.0

pd.DataFrame(the_function_output.bhhh)

	0	1
0	9900.0	9720.0
1	9720.0	11664.0

Now the unscaled version.

the_output: BiogemeFunctionOutput = my_biogeme.calculate_likelihood_and_derivatives(
    xplus, scaled=False, hessian=True, bhhh=True
)

print(f'f = {the_output.function}')

f = -555.0

print(f'g = {the_output.gradient}')

g = [-450. -540.]

pd.DataFrame(the_output.hessian)

	0	1
0	-1380.0	-150.0
1	-150.0	-540.0

pd.DataFrame(the_output.bhhh)

	0	1
0	49500.0	48600.0
1	48600.0	58320.0

Calculate the hessian of the log likelihood function using finite difference.

fin_diff_hessian = my_biogeme.likelihood_finite_difference_hessian(xplus)
pd.DataFrame(fin_diff_hessian)

	0	1
0	-1380.000202	-150.000000
1	-150.000045	-540.000054

Check numerically the derivatives’ implementation. The analytical derivatives are compared to the numerical derivatives obtains by finite differences.

f, g, h, gdiff, hdiff = my_biogeme.check_derivatives(xplus, verbose=True)

x               Gradient        FinDiff         Difference
beta1           -4.500000E+02   -4.500001E+02   +6.934970E-05
beta2           -5.400000E+02   -5.400001E+02   +8.087011E-05
Row             Col             Hessian FinDiff         Difference
beta1           beta1           -1.380000E+03   -1.380000E+03   +2.022890E-04
beta1           beta2           -1.500000E+02   -1.500000E+02   +2.425509E-10
beta2           beta1           -1.500000E+02   -1.500000E+02   +4.509602E-05
beta2           beta2           -5.400000E+02   -5.400001E+02   +5.396423E-05

print(f'f = {f}')

f = -555.0

print(f'g = {g}')

g = [-450. -540.]

pd.DataFrame(h)

	0	1
0	-1380.0	-150.0
1	-150.0	-540.0

pd.DataFrame(gdiff)
# print(f'gdiff = {gdiff}')

	0
0	0.000069
1	0.000081

pd.DataFrame(hdiff)
# print(f'hdiff = {hdiff}')

	0	1
0	0.000202	2.425509e-10
1	0.000045	5.396423e-05

Estimation

Estimation of the parameters, with bootstrapping

my_biogeme.bootstrap_samples = 10
results = my_biogeme.estimate(run_bootstrap=True)

As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __simple_example.iter
Parameter values restored from __simple_example.iter
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0           -0.22             2.1      1.9e+02      0.019       10      1.2   ++
    1            -0.6             1.5           93      0.013    1e+02      1.2   ++
    2              -1             1.3           70       0.01    1e+03      1.2   ++
    3            -1.2             1.3           67     0.0039    1e+04      1.1   ++
    4            -1.2             1.3           67    7.2e-05    1e+04        1   ++
Re-estimate the model 10 times for bootstrapping

  0%|          | 0/10 [00:00<?, ?it/s]Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           74    0.00082       10     0.99   ++
    1            -1.3             1.3           74    7.1e-08       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           67    1.1e-07        1        1
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           74    0.00082       10     0.99   ++
    1            -1.3             1.3           74    7.1e-08       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           63     0.0003       10        1   ++
    1            -1.3             1.2           63      1e-08       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           56     0.0034       10        1   ++
    1            -1.3             1.2           56    1.5e-06       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           63     0.0003       10        1   ++
    1            -1.3             1.2           63      1e-08       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.1           48      0.012       10        1   ++
    1            -1.3             1.1           48    2.1e-05       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           63     0.0003       10        1   ++
    1            -1.3             1.2           63      1e-08       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           78     0.0016       10     0.99   ++
    1            -1.3             1.3           78    2.8e-07       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           71    0.00023       10        1   ++
    1            -1.3             1.3           71    5.8e-09       10        1   ++

100%|██████████| 10/10 [00:00<00:00, 422.80it/s]
Results saved in file simple_example.html
Results saved in file simple_example.pickle

results.get_estimated_parameters()

	Value	Rob. Std err	Rob. t-test	Rob. p-value
beta1	-1.272838	0.013630	-93.387994	0.0
beta2	1.248867	0.059077	21.139711	0.0

If the model has already been estimated, it is possible to recycle the estimation results. In that case, the other arguments are ignored, and the results are whatever is in the file.

recycled_results = my_biogeme.estimate(recycle=True, run_bootstrap=True)

As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Estimation results read from simple_example.pickle. There is no guarantee that they correspond to the specified model.

print(recycled_results.short_summary())

Results for model simple_example
Nbr of parameters:              2
Sample size:                    5
Excluded data:                  0
Final log likelihood:           -67.0655
Akaike Information Criterion:   138.131
Bayesian Information Criterion: 137.3499

recycled_results.get_estimated_parameters()

	Value	Rob. Std err	Rob. t-test	Rob. p-value
beta1	-1.272838	0.013630	-93.387994	0.0
beta2	1.248867	0.059077	21.139711	0.0

Simulation

Simulate with the initial values for the parameters.

simulation_with_default_betas = my_biogeme.simulate(
    my_biogeme.log_like.get_beta_values()
)
simulation_with_default_betas

	log_like	beta1	simul
0	-20.733455	-1.272838	-0.229590
1	-17.073277	-1.272838	-0.286988
2	-17.073277	-1.272838	-0.286988
3	-9.752922	-1.272838	-0.573976
4	-6.092744	-1.272838	-1.147951

Simulate with the estimated values for the parameters.

print(results.get_beta_values())

{'beta1': np.float64(-1.272838017207035), 'beta2': np.float64(1.248867015967332)}

simulation_with_estimated_betas = my_biogeme.simulate(results.get_beta_values())
simulation_with_estimated_betas

	log_like	beta1	simul
0	-20.733455	-1.272838	-0.229590
1	-17.073277	-1.272838	-0.286988
2	-17.073277	-1.272838	-0.286988
3	-9.752922	-1.272838	-0.573976
4	-6.092744	-1.272838	-1.147951

Confidence intervals. First, we extract the values of betas from the bootstrapping draws.

draws_from_betas = results.get_betas_for_sensitivity_analysis(
    my_biogeme.id_manager.free_betas.names
)
for draw in draws_from_betas:
    print(draw)

{'beta1': np.float64(-1.2649797746053077), 'beta2': np.float64(1.284263176431638)}
{'beta1': np.float64(-1.2732639061775262), 'beta2': np.float64(1.2487688275262538)}
{'beta1': np.float64(-1.2649797746053077), 'beta2': np.float64(1.284263176431638)}
{'beta1': np.float64(-1.2777126115403008), 'beta2': np.float64(1.2295568141100899)}
{'beta1': np.float64(-1.2873325302586314), 'beta2': np.float64(1.1875198336493942)}
{'beta1': np.float64(-1.2777126115403008), 'beta2': np.float64(1.2295568141100899)}
{'beta1': np.float64(-1.2981043644725554), 'beta2': np.float64(1.1393509911926116)}
{'beta1': np.float64(-1.2777126115403008), 'beta2': np.float64(1.22955681411009)}
{'beta1': np.float64(-1.2611085824685073), 'beta2': np.float64(1.300751700054335)}
{'beta1': np.float64(-1.2690260405818319), 'beta2': np.float64(1.2669668838241481)}

Then, we calculate the confidence intervals. The default interval size is 0.9. Here, we use a different one.

left, right = my_biogeme.confidence_intervals(draws_from_betas, interval_size=0.95)
left

	log_like	beta1	simul
0	-21.416393	-1.295681	-0.236132
1	-17.483798	-1.295681	-0.295165
2	-17.483798	-1.295681	-0.295165
3	-9.907866	-1.295681	-0.590331
4	-6.369267	-1.295681	-1.180662

right

	log_like	beta1	simul
0	-20.523662	-1.26198	-0.226455
1	-16.985063	-1.26198	-0.283069
2	-16.985063	-1.26198	-0.283069
3	-9.618608	-1.26198	-0.566138
4	-5.686012	-1.26198	-1.132275

Validation

The validation consists in organizing the data into several slices of about the same size, randomly defined. Each slide is considered as a validation dataset. The model is then re-estimated using all the data except the slice, and the estimated model is applied on the validation set (i.e. the slice). The value of the log likelihood for each observation in the validation set is reported in a dataframe. As this is done for each slice, the output is a list of dataframes, each corresponding to one of these exercises.

validationData = my_data.split(slices=5)
validation_results = my_biogeme.validate(results, validationData)

/Users/bierlair/venv312/lib/python3.12/site-packages/numpy/_core/fromnumeric.py:57: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
Biogeme parameters read from biogeme.toml.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __simple_example_val_est_1.iter
Cannot read file __simple_example_val_est_1.iter. Statement is ignored.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           61     0.0012       10     0.99   ++
    1            -1.3             1.3           61    1.5e-07       10        1   ++
Results saved in file simple_example_val_est_1.html
Results saved in file simple_example_val_est_1.pickle
Biogeme parameters read from biogeme.toml.
Biogeme parameters read from biogeme.toml.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __simple_example_val_est_2.iter
Cannot read file __simple_example_val_est_2.iter. Statement is ignored.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           54    1.1e-07        1        1
Results saved in file simple_example_val_est_2.html
Results saved in file simple_example_val_est_2.pickle
Biogeme parameters read from biogeme.toml.
Biogeme parameters read from biogeme.toml.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __simple_example_val_est_3.iter
Cannot read file __simple_example_val_est_3.iter. Statement is ignored.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           46     0.0022       10        1   ++
    1            -1.3             1.2           46      6e-07       10        1   ++
Results saved in file simple_example_val_est_3.html
Results saved in file simple_example_val_est_3.pickle
Biogeme parameters read from biogeme.toml.
Biogeme parameters read from biogeme.toml.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __simple_example_val_est_4.iter
Cannot read file __simple_example_val_est_4.iter. Statement is ignored.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           50    0.00048       10        1   ++
    1            -1.3             1.2           50    2.6e-08       10        1   ++
Results saved in file simple_example_val_est_4.html
Results saved in file simple_example_val_est_4.pickle
Biogeme parameters read from biogeme.toml.
Biogeme parameters read from biogeme.toml.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __simple_example_val_est_5.iter
Cannot read file __simple_example_val_est_5.iter. Statement is ignored.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           57    0.00035       10        1   ++
    1            -1.3             1.3           57    1.3e-08       10        1   ++
Results saved in file simple_example_val_est_5.html
Results saved in file simple_example_val_est_5.pickle
Biogeme parameters read from biogeme.toml.
Simulation results saved in file simple_example_validation.pickle

for slide in validation_results:
    print(
        f'Log likelihood for {slide.shape[0]} '
        f'validation data: {slide["Loglikelihood"].sum()}'
    )

Log likelihood for 1 validation data: -6.341108764653339
Log likelihood for 1 validation data: -13.41309809589276
Log likelihood for 1 validation data: -21.037421356430013
Log likelihood for 1 validation data: -17.145326445763835
Log likelihood for 1 validation data: -9.817719764565792

The following tools is used to find .py with the model name and a specific extension.

my_biogeme.files_of_type('pickle')

['simple_example.pickle']

Total running time of the script: (0 minutes 0.161 seconds)

Gallery generated by Sphinx-Gallery