biogeme.biogeme

Examples of use of several functions.

This is designed for programmers who need examples of use of the functions of the module. The examples are designed to illustrate the syntax. They do not correspond to any meaningful model.

author:

Michel Bierlaire

date:

Thu Nov 16 18:36:35 2023

import biogeme.version as ver
import biogeme.biogeme as bio
import biogeme.database as db
import pandas as pd
from biogeme.expressions import Beta, Variable, exp
import biogeme.biogeme_logging as blog
from biogeme.function_output import BiogemeFunctionOutput

Version of Biogeme.

print(ver.get_text())
biogeme 3.2.14 [2024-08-05]
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)

Logger.

logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Logger initialized')
Logger initialized

Definition of a database

df = pd.DataFrame(
    {
        'Person': [1, 1, 1, 2, 2],
        'Exclude': [0, 0, 1, 0, 1],
        'Variable1': [1, 2, 3, 4, 5],
        'Variable2': [10, 20, 30, 40, 50],
        'Choice': [1, 2, 3, 1, 2],
        'Av1': [0, 1, 1, 1, 1],
        'Av2': [1, 1, 1, 1, 1],
        'Av3': [0, 1, 1, 1, 1],
    }
)
my_data = db.Database('test', df)

Data

my_data.data
Person Exclude Variable1 Variable2 Choice Av1 Av2 Av3
0 1 0 1 10 1 0 1 0
1 1 0 2 20 2 1 1 1
2 1 1 3 30 3 1 1 1
3 2 0 4 40 1 1 1 1
4 2 1 5 50 2 1 1 1


Definition of various expressions.

Variable1 = Variable('Variable1')
Variable2 = Variable('Variable2')
beta1 = Beta('beta1', -1.0, -3, 3, 0)
beta2 = Beta('beta2', 2.0, -3, 10, 0)
likelihood = -(beta1**2) * Variable1 - exp(beta2 * beta1) * Variable2 - beta2**4
simul = beta1 / Variable1 + beta2 / Variable2
dict_of_expressions = {'log_like': likelihood, 'beta1': beta1, 'simul': simul}

Creation of the BIOGEME object.

my_biogeme = bio.BIOGEME(my_data, dict_of_expressions)
my_biogeme.modelName = 'simple_example'
print(my_biogeme)
Default values of the Biogeme parameters are used.
File biogeme.toml has been created
simple_example: database [test]{'log_like': ((((-Beta('beta1', -1.0, -3, 3, 0)**2.0) * Variable1) - (exp((Beta('beta2', 2.0, -3, 10, 0) * Beta('beta1', -1.0, -3, 3, 0))) * Variable2)) - Beta('beta2', 2.0, -3, 10, 0)**4.0), 'beta1': Beta('beta1', -1.0, -3, 3, 0), 'simul': ((Beta('beta1', -1.0, -3, 3, 0) / Variable1) + (Beta('beta2', 2.0, -3, 10, 0) / Variable2))}

The data is stored in the Biogeme object.

my_biogeme.database.data
Person Exclude Variable1 Variable2 Choice Av1 Av2 Av3
0 1 0 1 10 1 0 1 0
1 1 0 2 20 2 1 1 1
2 1 1 3 30 3 1 1 1
3 2 0 4 40 1 1 1 1
4 2 1 5 50 2 1 1 1


Log likelihood with the initial values of the parameters.

my_biogeme.calculate_init_likelihood()
-115.30029248549191

Calculate the log-likelihood with a different value of the parameters. We retieve the current value and add 1 to each of them.

x = my_biogeme.id_manager.free_betas_values
xplus = [v + 1 for v in x]
print(xplus)
[0.0, 3.0]
my_biogeme.calculate_likelihood(xplus, scaled=True)
-111.0

Calculate the log-likelihood function and its derivatives.

the_function_output: BiogemeFunctionOutput = (
    my_biogeme.calculate_likelihood_and_derivatives(
        xplus, scaled=True, hessian=True, bhhh=True
    )
)
print(f'f = {the_function_output.function}')
f = -111.0
print(f'g = {the_function_output.gradient}')
g = [ -90. -108.]
pd.DataFrame(the_function_output.hessian)
0 1
0 -276.0 -30.0
1 -30.0 -108.0


pd.DataFrame(the_function_output.bhhh)
0 1
0 9900.0 9720.0
1 9720.0 11664.0


Now the unscaled version.

the_output: BiogemeFunctionOutput = my_biogeme.calculate_likelihood_and_derivatives(
    xplus, scaled=False, hessian=True, bhhh=True
)
print(f'f = {the_output.function}')
f = -555.0
print(f'g = {the_output.gradient}')
g = [-450. -540.]
pd.DataFrame(the_output.hessian)
0 1
0 -1380.0 -150.0
1 -150.0 -540.0


pd.DataFrame(the_output.bhhh)
0 1
0 49500.0 48600.0
1 48600.0 58320.0


Calculate the hessian of the log likelihood function using finite difference.

fin_diff_hessian = my_biogeme.likelihood_finite_difference_hessian(xplus)
pd.DataFrame(fin_diff_hessian)
0 1
0 -1380.000202 -150.000000
1 -150.000045 -540.000054


Check numerically the derivatives’ implementation. The analytical derivatives are compared to the numerical derivatives obtains by finite differences.

f, g, h, gdiff, hdiff = my_biogeme.check_derivatives(xplus, verbose=True)
x               Gradient        FinDiff         Difference
beta1           -4.500000E+02   -4.500001E+02   +6.934970E-05
beta2           -5.400000E+02   -5.400001E+02   +8.087011E-05
Row             Col             Hessian FinDiff         Difference
beta1           beta1           -1.380000E+03   -1.380000E+03   +2.022890E-04
beta1           beta2           -1.500000E+02   -1.500000E+02   +2.425509E-10
beta2           beta1           -1.500000E+02   -1.500000E+02   +4.509602E-05
beta2           beta2           -5.400000E+02   -5.400001E+02   +5.396423E-05
print(f'f = {f}')
f = -555.0
print(f'g = {g}')
g = [-450. -540.]
pd.DataFrame(h)
0 1
0 -1380.0 -150.0
1 -150.0 -540.0


pd.DataFrame(gdiff)
# print(f'gdiff = {gdiff}')
0
0 0.000069
1 0.000081


pd.DataFrame(hdiff)
# print(f'hdiff = {hdiff}')
0 1
0 0.000202 2.425509e-10
1 0.000045 5.396423e-05


Estimation

Estimation of the parameters, with bootstrapping

my_biogeme.bootstrap_samples = 10
results = my_biogeme.estimate(run_bootstrap=True)
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __simple_example.iter
Parameter values restored from __simple_example.iter
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0           -0.22             2.1      1.9e+02      0.019       10      1.2   ++
    1            -0.6             1.5           93      0.013    1e+02      1.2   ++
    2              -1             1.3           70       0.01    1e+03      1.2   ++
    3            -1.2             1.3           67     0.0039    1e+04      1.1   ++
    4            -1.2             1.3           67    7.2e-05    1e+04        1   ++
Re-estimate the model 10 times for bootstrapping

  0%|          | 0/10 [00:00<?, ?it/s]Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           74    0.00082       10     0.99   ++
    1            -1.3             1.3           74    7.1e-08       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           67    1.1e-07        1        1
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           74    0.00082       10     0.99   ++
    1            -1.3             1.3           74    7.1e-08       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           63     0.0003       10        1   ++
    1            -1.3             1.2           63      1e-08       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           56     0.0034       10        1   ++
    1            -1.3             1.2           56    1.5e-06       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           63     0.0003       10        1   ++
    1            -1.3             1.2           63      1e-08       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.1           48      0.012       10        1   ++
    1            -1.3             1.1           48    2.1e-05       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           63     0.0003       10        1   ++
    1            -1.3             1.2           63      1e-08       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           78     0.0016       10     0.99   ++
    1            -1.3             1.3           78    2.8e-07       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           71    0.00023       10        1   ++
    1            -1.3             1.3           71    5.8e-09       10        1   ++

100%|██████████| 10/10 [00:00<00:00, 422.80it/s]
Results saved in file simple_example.html
Results saved in file simple_example.pickle
results.get_estimated_parameters()
Value Rob. Std err Rob. t-test Rob. p-value
beta1 -1.272838 0.013630 -93.387994 0.0
beta2 1.248867 0.059077 21.139711 0.0


If the model has already been estimated, it is possible to recycle the estimation results. In that case, the other arguments are ignored, and the results are whatever is in the file.

recycled_results = my_biogeme.estimate(recycle=True, run_bootstrap=True)
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Estimation results read from simple_example.pickle. There is no guarantee that they correspond to the specified model.
print(recycled_results.short_summary())
Results for model simple_example
Nbr of parameters:              2
Sample size:                    5
Excluded data:                  0
Final log likelihood:           -67.0655
Akaike Information Criterion:   138.131
Bayesian Information Criterion: 137.3499
recycled_results.get_estimated_parameters()
Value Rob. Std err Rob. t-test Rob. p-value
beta1 -1.272838 0.013630 -93.387994 0.0
beta2 1.248867 0.059077 21.139711 0.0


Simulation

Simulate with the initial values for the parameters.

simulation_with_default_betas = my_biogeme.simulate(
    my_biogeme.log_like.get_beta_values()
)
simulation_with_default_betas
log_like beta1 simul
0 -20.733455 -1.272838 -0.229590
1 -17.073277 -1.272838 -0.286988
2 -17.073277 -1.272838 -0.286988
3 -9.752922 -1.272838 -0.573976
4 -6.092744 -1.272838 -1.147951


Simulate with the estimated values for the parameters.

print(results.get_beta_values())
{'beta1': np.float64(-1.272838017207035), 'beta2': np.float64(1.248867015967332)}
simulation_with_estimated_betas = my_biogeme.simulate(results.get_beta_values())
simulation_with_estimated_betas
log_like beta1 simul
0 -20.733455 -1.272838 -0.229590
1 -17.073277 -1.272838 -0.286988
2 -17.073277 -1.272838 -0.286988
3 -9.752922 -1.272838 -0.573976
4 -6.092744 -1.272838 -1.147951


Confidence intervals. First, we extract the values of betas from the bootstrapping draws.

draws_from_betas = results.get_betas_for_sensitivity_analysis(
    my_biogeme.id_manager.free_betas.names
)
for draw in draws_from_betas:
    print(draw)
{'beta1': np.float64(-1.2649797746053077), 'beta2': np.float64(1.284263176431638)}
{'beta1': np.float64(-1.2732639061775262), 'beta2': np.float64(1.2487688275262538)}
{'beta1': np.float64(-1.2649797746053077), 'beta2': np.float64(1.284263176431638)}
{'beta1': np.float64(-1.2777126115403008), 'beta2': np.float64(1.2295568141100899)}
{'beta1': np.float64(-1.2873325302586314), 'beta2': np.float64(1.1875198336493942)}
{'beta1': np.float64(-1.2777126115403008), 'beta2': np.float64(1.2295568141100899)}
{'beta1': np.float64(-1.2981043644725554), 'beta2': np.float64(1.1393509911926116)}
{'beta1': np.float64(-1.2777126115403008), 'beta2': np.float64(1.22955681411009)}
{'beta1': np.float64(-1.2611085824685073), 'beta2': np.float64(1.300751700054335)}
{'beta1': np.float64(-1.2690260405818319), 'beta2': np.float64(1.2669668838241481)}

Then, we calculate the confidence intervals. The default interval size is 0.9. Here, we use a different one.

left, right = my_biogeme.confidence_intervals(draws_from_betas, interval_size=0.95)
left
log_like beta1 simul
0 -21.416393 -1.295681 -0.236132
1 -17.483798 -1.295681 -0.295165
2 -17.483798 -1.295681 -0.295165
3 -9.907866 -1.295681 -0.590331
4 -6.369267 -1.295681 -1.180662


right
log_like beta1 simul
0 -20.523662 -1.26198 -0.226455
1 -16.985063 -1.26198 -0.283069
2 -16.985063 -1.26198 -0.283069
3 -9.618608 -1.26198 -0.566138
4 -5.686012 -1.26198 -1.132275


Validation

The validation consists in organizing the data into several slices of about the same size, randomly defined. Each slide is considered as a validation dataset. The model is then re-estimated using all the data except the slice, and the estimated model is applied on the validation set (i.e. the slice). The value of the log likelihood for each observation in the validation set is reported in a dataframe. As this is done for each slice, the output is a list of dataframes, each corresponding to one of these exercises.

validationData = my_data.split(slices=5)
validation_results = my_biogeme.validate(results, validationData)
/Users/bierlair/venv312/lib/python3.12/site-packages/numpy/_core/fromnumeric.py:57: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
Biogeme parameters read from biogeme.toml.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __simple_example_val_est_1.iter
Cannot read file __simple_example_val_est_1.iter. Statement is ignored.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           61     0.0012       10     0.99   ++
    1            -1.3             1.3           61    1.5e-07       10        1   ++
Results saved in file simple_example_val_est_1.html
Results saved in file simple_example_val_est_1.pickle
Biogeme parameters read from biogeme.toml.
Biogeme parameters read from biogeme.toml.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __simple_example_val_est_2.iter
Cannot read file __simple_example_val_est_2.iter. Statement is ignored.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           54    1.1e-07        1        1
Results saved in file simple_example_val_est_2.html
Results saved in file simple_example_val_est_2.pickle
Biogeme parameters read from biogeme.toml.
Biogeme parameters read from biogeme.toml.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __simple_example_val_est_3.iter
Cannot read file __simple_example_val_est_3.iter. Statement is ignored.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           46     0.0022       10        1   ++
    1            -1.3             1.2           46      6e-07       10        1   ++
Results saved in file simple_example_val_est_3.html
Results saved in file simple_example_val_est_3.pickle
Biogeme parameters read from biogeme.toml.
Biogeme parameters read from biogeme.toml.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __simple_example_val_est_4.iter
Cannot read file __simple_example_val_est_4.iter. Statement is ignored.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           50    0.00048       10        1   ++
    1            -1.3             1.2           50    2.6e-08       10        1   ++
Results saved in file simple_example_val_est_4.html
Results saved in file simple_example_val_est_4.pickle
Biogeme parameters read from biogeme.toml.
Biogeme parameters read from biogeme.toml.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __simple_example_val_est_5.iter
Cannot read file __simple_example_val_est_5.iter. Statement is ignored.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           57    0.00035       10        1   ++
    1            -1.3             1.3           57    1.3e-08       10        1   ++
Results saved in file simple_example_val_est_5.html
Results saved in file simple_example_val_est_5.pickle
Biogeme parameters read from biogeme.toml.
Simulation results saved in file simple_example_validation.pickle
for slide in validation_results:
    print(
        f'Log likelihood for {slide.shape[0]} '
        f'validation data: {slide["Loglikelihood"].sum()}'
    )
Log likelihood for 1 validation data: -6.341108764653339
Log likelihood for 1 validation data: -13.41309809589276
Log likelihood for 1 validation data: -21.037421356430013
Log likelihood for 1 validation data: -17.145326445763835
Log likelihood for 1 validation data: -9.817719764565792

The following tools is used to find .py with the model name and a specific extension.

my_biogeme.files_of_type('pickle')
['simple_example.pickle']

Total running time of the script: (0 minutes 0.161 seconds)

Gallery generated by Sphinx-Gallery