biogeme.biogeme

Usage examples for several functions.

This page is intended for programmers who need usage examples for the functions of the module. The examples illustrate the syntax only; they do not correspond to any meaningful model.

author: Michel Bierlaire

date: Thu Nov 16 18:36:35 2023

import biogeme.version as ver
import biogeme.biogeme as bio
import biogeme.database as db
import pandas as pd
from biogeme.expressions import Beta, Variable, exp
import biogeme.biogeme_logging as blog

Version of Biogeme.

print(ver.getText())
biogeme 3.2.13 [2023-12-23]
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)

Logger.

logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Logger initialized')
Logger initialized

Definition of a database

df = pd.DataFrame(
    {
        'Person': [1, 1, 1, 2, 2],
        'Exclude': [0, 0, 1, 0, 1],
        'Variable1': [1, 2, 3, 4, 5],
        'Variable2': [10, 20, 30, 40, 50],
        'Choice': [1, 2, 3, 1, 2],
        'Av1': [0, 1, 1, 1, 1],
        'Av2': [1, 1, 1, 1, 1],
        'Av3': [0, 1, 1, 1, 1],
    }
)
myData = db.Database('test', df)

Data

myData.data
   Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3
0       1        0          1         10       1    0    1    0
1       1        0          2         20       2    1    1    1
2       1        1          3         30       3    1    1    1
3       2        0          4         40       1    1    1    1
4       2        1          5         50       2    1    1    1


Definition of various expressions.

Variable1 = Variable('Variable1')
Variable2 = Variable('Variable2')
beta1 = Beta('beta1', -1.0, -3, 3, 0)
beta2 = Beta('beta2', 2.0, -3, 10, 0)
likelihood = -(beta1**2) * Variable1 - exp(beta2 * beta1) * Variable2 - beta2**4
simul = beta1 / Variable1 + beta2 / Variable2
dictOfExpressions = {'loglike': likelihood, 'beta1': beta1, 'simul': simul}

Creation of the BIOGEME object.

myBiogeme = bio.BIOGEME(myData, dictOfExpressions)
myBiogeme.modelName = 'simple_example'
print(myBiogeme)
File biogeme.toml has been created
simple_example: database [test]{'loglike': ((((-(Beta('beta1', -1.0, -3, 3, 0) ** `2.0`)) * Variable1) - (exp((Beta('beta2', 2.0, -3, 10, 0) * Beta('beta1', -1.0, -3, 3, 0))) * Variable2)) - (Beta('beta2', 2.0, -3, 10, 0) ** `4.0`)), 'beta1': Beta('beta1', -1.0, -3, 3, 0), 'simul': ((Beta('beta1', -1.0, -3, 3, 0) / Variable1) + (Beta('beta2', 2.0, -3, 10, 0) / Variable2))}

The data is stored in the Biogeme object.

myBiogeme.database.data
   Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3
0       1        0          1         10       1    0    1    0
1       1        0          2         20       2    1    1    1
2       1        1          3         30       3    1    1    1
3       2        0          4         40       1    1    1    1
4       2        1          5         50       2    1    1    1


Log likelihood with the initial values of the parameters.

myBiogeme.calculateInitLikelihood()
-115.30029248549191
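
As a cross-check, the value above can be reproduced without Biogeme by evaluating the expression row by row and summing. The following is a minimal sketch in plain Python; the helper names `loglike_row` and `total_loglike` are hypothetical and exist only for this illustration.

```python
import math

# Columns of the toy database defined earlier on this page.
variable1 = [1, 2, 3, 4, 5]
variable2 = [10, 20, 30, 40, 50]


def loglike_row(beta1, beta2, v1, v2):
    """Value of the example 'loglike' expression for one row (hypothetical helper)."""
    return -(beta1**2) * v1 - math.exp(beta2 * beta1) * v2 - beta2**4


def total_loglike(beta1, beta2):
    """Sum of the row contributions over the whole sample (hypothetical helper)."""
    return sum(
        loglike_row(beta1, beta2, v1, v2) for v1, v2 in zip(variable1, variable2)
    )


# Initial values of beta1 and beta2, as given in the Beta definitions.
initial_value = total_loglike(-1.0, 2.0)
print(initial_value)
```

The sum equals -95 - 150 exp(-2), which agrees with the value reported above up to floating-point rounding.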

Calculate the log likelihood with different values of the parameters. We retrieve the current values and add 1 to each of them.

x = myBiogeme.id_manager.free_betas_values
xplus = [v + 1 for v in x]
print(xplus)
[0.0, 3.0]
myBiogeme.calculateLikelihood(xplus, scaled=True)
-111.0

Calculate the log-likelihood function and its derivatives.

f, g, h, bhhh = myBiogeme.calculateLikelihoodAndDerivatives(
    xplus, scaled=True, hessian=True, bhhh=True
)
print(f'f = {f}')
f = -111.0
print(f'g = {g}')
g = [ -90. -108.]
pd.DataFrame(h)
       0      1
0 -270.0  -30.0
1  -30.0 -108.0


pd.DataFrame(bhhh)
        0        1
0  9900.0   9720.0
1  9720.0  11664.0


Now the unscaled version.

f, g, h, bhhh = myBiogeme.calculateLikelihoodAndDerivatives(
    xplus, scaled=False, hessian=True, bhhh=True
)
print(f'f = {f}')
f = -555.0
print(f'g = {g}')
g = [-450. -540.]
pd.DataFrame(h)
        0      1
0 -1350.0 -150.0
1  -150.0 -540.0


pd.DataFrame(bhhh)
         0        1
0  49500.0  48600.0
1  48600.0  58320.0
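
The printed numbers are consistent with the scaled quantities being the unscaled ones divided by the sample size (5 rows). The sketch below recomputes the function value, the gradient and the BHHH matrix (the sum over rows of the outer product of the row gradient with itself) by hand, at beta1 = 0 and beta2 = 3. The derivative formulas are derived from the expression defined earlier; the interpretation of `scaled` is an inference from the output, not a statement about Biogeme's internals.

```python
import math

variable1 = [1, 2, 3, 4, 5]
variable2 = [10, 20, 30, 40, 50]
beta1, beta2 = 0.0, 3.0  # the values printed as xplus


def row_value(v1, v2):
    """Example 'loglike' expression for one row."""
    return -(beta1**2) * v1 - math.exp(beta2 * beta1) * v2 - beta2**4


def row_gradient(v1, v2):
    """Hand-derived partial derivatives with respect to (beta1, beta2)."""
    e = math.exp(beta2 * beta1)
    return (-2 * beta1 * v1 - beta2 * e * v2, -beta1 * e * v2 - 4 * beta2**3)


rows = list(zip(variable1, variable2))
n = len(rows)
gradients = [row_gradient(v1, v2) for v1, v2 in rows]

# Unscaled quantities: plain sums over the rows.
f_unscaled = sum(row_value(v1, v2) for v1, v2 in rows)
g_unscaled = [sum(g[k] for g in gradients) for k in range(2)]
bhhh_unscaled = [
    [sum(g[i] * g[j] for g in gradients) for j in range(2)] for i in range(2)
]

# Scaled quantities: assumed here to be the sums divided by the sample size.
f_scaled = f_unscaled / n
g_scaled = [value / n for value in g_unscaled]
bhhh_scaled = [[value / n for value in row] for row in bhhh_unscaled]

print(f_unscaled, g_unscaled, bhhh_unscaled)
print(f_scaled, g_scaled, bhhh_scaled)
```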


Calculate the Hessian of the log likelihood function using finite differences.

fin_diff_hessian = myBiogeme.likelihoodFiniteDifferenceHessian(xplus)
pd.DataFrame(fin_diff_hessian)
             0           1
0 -1380.000202 -150.000000
1  -150.000045 -540.000054
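
To illustrate the idea, here is a minimal sketch of a finite-difference Hessian built from a hand-derived gradient of the example expression. It uses central differences with a step of 1e-5; both the scheme and the step size are assumptions made for this illustration and are not necessarily what Biogeme uses internally. The function names are hypothetical.

```python
import math

variable1 = [1, 2, 3, 4, 5]
variable2 = [10, 20, 30, 40, 50]


def gradient(beta):
    """Hand-derived gradient of the summed example expression."""
    b1, b2 = beta
    g1 = sum(
        -2 * b1 * v1 - b2 * math.exp(b1 * b2) * v2
        for v1, v2 in zip(variable1, variable2)
    )
    g2 = sum(
        -b1 * math.exp(b1 * b2) * v2 - 4 * b2**3
        for v1, v2 in zip(variable1, variable2)
    )
    return [g1, g2]


def finite_difference_hessian(grad, x, step=1e-5):
    """Approximate the Hessian by central differences of the gradient."""
    size = len(x)
    hessian = [[0.0] * size for _ in range(size)]
    for j in range(size):
        forward = list(x)
        backward = list(x)
        forward[j] += step
        backward[j] -= step
        g_forward = grad(forward)
        g_backward = grad(backward)
        for i in range(size):
            hessian[i][j] = (g_forward[i] - g_backward[i]) / (2 * step)
    return hessian


approx_hessian = finite_difference_hessian(gradient, [0.0, 3.0])
for row in approx_hessian:
    print(row)
```

Differentiating the expression by hand at beta1 = 0, beta2 = 3 gives a second derivative with respect to beta1 of -2 * 15 - 9 * 150 = -1380, which is the value that the finite-difference approximation approaches.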


Numerically check the implementation of the derivatives. The analytical derivatives are compared to the numerical derivatives obtained by finite differences.

f, g, h, gdiff, hdiff = myBiogeme.checkDerivatives(xplus, verbose=True)
x               Gradient        FinDiff         Difference
beta1           -4.500000E+02   -4.500001E+02   +6.934970E-05
beta2           -5.400000E+02   -5.400001E+02   +8.087011E-05
Row             Col             Hessian         FinDiff         Difference
beta1           beta1           -1.350000E+03   -1.380000E+03   +3.000020E+01
beta1           beta2           -1.500000E+02   -1.500000E+02   +2.425509E-10
beta2           beta1           -1.500000E+02   -1.500000E+02   +4.509602E-05
beta2           beta2           -5.400000E+02   -5.400001E+02   +5.396423E-05
print(f'f = {f}')
f = -555.0
print(f'g = {g}')
g = [-450. -540.]
pd.DataFrame(h)
        0      1
0 -1350.0 -150.0
1  -150.0 -540.0


pd.DataFrame(gdiff)
# print(f'gdiff = {gdiff}')
          0
0  0.000069
1  0.000081


pd.DataFrame(hdiff)
# print(f'hdiff = {hdiff}')
           0             1
0  30.000202  2.425509e-10
1   0.000045  5.396423e-05


Estimation

Estimation of the parameters, with bootstrapping

myBiogeme.bootstrap_samples = 10
results = myBiogeme.estimate(run_bootstrap=True)
*** Initial values of the parameters are obtained from the file __simple_example.iter
Parameter values restored from __simple_example.iter
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0           -0.23             2.1      1.8e+02      0.019       10      1.2   ++
    1            -0.6             1.5           93      0.013    1e+02      1.2   ++
    2              -1             1.3           70       0.01    1e+03      1.2   ++
    3            -1.2             1.3           67     0.0039    1e+04      1.1   ++
    4            -1.3             1.2           67    6.8e-05    1e+05        1   ++
    5            -1.3             1.2           67    1.2e-08    1e+05        1   ++
Re-estimate the model 10 times for bootstrapping

Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.1           44      0.021       10        1   ++
    1            -1.3             1.1           44    5.9e-05    1e+02        1   ++
    2            -1.3             1.1           44    4.1e-08    1e+02        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           74    0.00082       10     0.99   ++
    1            -1.3             1.3           74    7.1e-08       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           52     0.0069       10        1   ++
    1            -1.3             1.2           52    6.2e-06    1e+02        1   ++
    2            -1.3             1.2           52    1.3e-07    1e+02        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           56     0.0034       10        1   ++
    1            -1.3             1.2           56    1.4e-06       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           71    0.00023       10        1   ++
    1            -1.3             1.3           71    5.8e-09       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           52     0.0069       10        1   ++
    1            -1.3             1.2           52    6.2e-06    1e+02        1   ++
    2            -1.3             1.2           52    1.3e-07    1e+02        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           71    0.00023       10        1   ++
    1            -1.3             1.3           71    5.8e-09       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           74    0.00082       10     0.99   ++
    1            -1.3             1.3           74    7.1e-08       10        1   ++
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           81     0.0026       10     0.99   ++
    1            -1.3             1.3           81    6.8e-07       10        1   ++

Results saved in file simple_example.html
Results saved in file simple_example.pickle
results.getEstimatedParameters()
          Value  Rob. Std err  Rob. t-test  Rob. p-value
beta1 -1.273264      0.013724   -92.776769           0.0
beta2  1.248769      0.059086    21.134795           0.0
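
The resampling idea behind bootstrapping can be sketched in a few lines of plain Python: draw rows with replacement, recompute a statistic on each resample, and use the spread of the results as a measure of uncertainty. To stay self-contained, the statistic below is the sample mean of the `Variable2` column rather than a maximum likelihood estimate, and the seed and function name are assumptions for this illustration only.

```python
import random
import statistics

observations = [10, 20, 30, 40, 50]  # the Variable2 column of the toy database
rng = random.Random(2023)  # fixed seed so that the sketch is reproducible


def bootstrap_statistic(data, statistic, n_draws, generator):
    """Recompute `statistic` on `n_draws` resamples drawn with replacement."""
    draws = []
    for _ in range(n_draws):
        resample = generator.choices(data, k=len(data))
        draws.append(statistic(resample))
    return draws


# Ten resamples, mirroring bootstrap_samples = 10 in the example above.
draws = bootstrap_statistic(observations, statistics.mean, 10, rng)
bootstrap_std_err = statistics.stdev(draws)
print(draws)
print(bootstrap_std_err)
```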


If the model has already been estimated, it is possible to recycle the estimation results. In that case, the other arguments are ignored, and the results are whatever is in the file.

recycled_results = myBiogeme.estimate(recycle=True, run_bootstrap=True)
Estimation results read from simple_example.pickle. There is no guarantee that they correspond to the specified model.
print(recycled_results.short_summary())
Results for model simple_example
Nbr of parameters:              2
Sample size:                    5
Excluded data:                  0
Final log likelihood:           -67.06549
Akaike Information Criterion:   138.131
Bayesian Information Criterion: 137.3499
recycled_results.getEstimatedParameters()
          Value  Rob. Std err  Rob. t-test  Rob. p-value
beta1 -1.273264      0.013724   -92.776769           0.0
beta2  1.248769      0.059086    21.134795           0.0


Simulation

Simulate with the initial values for the parameters.

simulation_with_default_betas = myBiogeme.simulate(myBiogeme.loglike.get_beta_values())
simulation_with_default_betas
   loglike  beta1  simul
0   -101.0    0.0   0.15
1   -131.0    0.0   0.06
2   -131.0    0.0   0.06
3   -131.0    0.0   0.06
4   -101.0    0.0   0.15


Simulate with the estimated values for the parameters.

print(results.getBetaValues())
{'beta1': -1.273263915009374, 'beta2': 1.248768825523196}
simulation_with_estimated_betas = myBiogeme.simulate(results.getBetaValues())
simulation_with_estimated_betas
     loglike     beta1     simul
0  -9.752666 -1.273264 -0.574194
1 -20.733962 -1.273264 -0.229677
2 -20.733962 -1.273264 -0.229677
3 -20.733962 -1.273264 -0.229677
4  -9.752666 -1.273264 -0.574194


Confidence intervals. First, we extract the values of betas from the bootstrapping draws.

draws_from_betas = results.getBetasForSensitivityAnalysis(
    myBiogeme.id_manager.free_betas.names
)
for draw in draws_from_betas:
    print(draw)
{'beta1': -1.304007541668053, 'beta2': 1.1122455742294828}
{'beta1': -1.264979774201378, 'beta2': 1.2842631765105155}
{'beta1': -1.292557821467689, 'beta2': 1.1643222175104226}
{'beta1': -1.2873325336195227, 'beta2': 1.1875198317201634}
{'beta1': -1.2690260405244749, 'beta2': 1.26696688383924}
{'beta1': -1.292557821467689, 'beta2': 1.1643222175104226}
{'beta1': -1.273263915009374, 'beta2': 1.248768825523196}
{'beta1': -1.2690260405244749, 'beta2': 1.26696688383924}
{'beta1': -1.264979774201378, 'beta2': 1.2842631765105155}
{'beta1': -1.2573978799517176, 'beta2': 1.3165120810083317}

Then, we calculate the confidence intervals. The default interval size is 0.9. Here, we use a different one.

left, right = myBiogeme.confidenceIntervals(draws_from_betas, interval_size=0.95)
left
     loglike     beta1     simul
0  -9.958158 -1.301431 -0.594518
1 -21.652272 -1.301431 -0.237807
2 -21.652272 -1.301431 -0.237807
3 -21.652272 -1.301431 -0.237807
4  -9.958158 -1.301431 -0.594518


right
     loglike     beta1     simul
0  -9.619739 -1.259104 -0.564089
1 -20.485149 -1.259104 -0.225636
2 -20.485149 -1.259104 -0.225636
3 -20.485149 -1.259104 -0.225636
4  -9.619739 -1.259104 -0.564089
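
The bounds in the `beta1` column can be reproduced from the ten draws printed earlier by taking the 2.5% and 97.5% quantiles with linear interpolation between order statistics. The sketch below does this in plain Python. That this is how the bounds are obtained is an inference from the printed numbers, and the `quantile` helper is hypothetical.

```python
import math

# Values of beta1 in the ten bootstrap draws printed above.
beta1_draws = [
    -1.304007541668053,
    -1.264979774201378,
    -1.292557821467689,
    -1.2873325336195227,
    -1.2690260405244749,
    -1.292557821467689,
    -1.273263915009374,
    -1.2690260405244749,
    -1.264979774201378,
    -1.2573978799517176,
]


def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


interval_size = 0.95
tail = (1.0 - interval_size) / 2.0  # probability mass left out on each side
left_beta1 = quantile(beta1_draws, tail)
right_beta1 = quantile(beta1_draws, 1.0 - tail)
print(f'{left_beta1:.6f} {right_beta1:.6f}')
```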


Validation

Validation consists of organizing the data into several randomly defined slices of about the same size. Each slice is used in turn as a validation dataset: the model is re-estimated using all the data except the slice, and the estimated model is then applied to the validation set (i.e. the slice). The value of the log likelihood for each observation in the validation set is reported in a dataframe. As this is done for each slice, the output is a list of dataframes, one per slice.

validationData = myData.split(slices=5)
validation_results = myBiogeme.validate(results, validationData)
/Users/bierlair/venv312/lib/python3.12/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
File biogeme.toml has been parsed.
*** Initial values of the parameters are obtained from the file __simple_example_val_est_1.iter
Cannot read file __simple_example_val_est_1.iter. Statement is ignored.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           50    0.00048       10        1   ++
    1            -1.3             1.2           50    2.6e-08       10        1   ++
Results saved in file simple_example_val_est_1.html
Results saved in file simple_example_val_est_1.pickle
File biogeme.toml has been parsed.
File biogeme.toml has been parsed.
*** Initial values of the parameters are obtained from the file __simple_example_val_est_2.iter
Cannot read file __simple_example_val_est_2.iter. Statement is ignored.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Results saved in file simple_example_val_est_2.html
Results saved in file simple_example_val_est_2.pickle
File biogeme.toml has been parsed.
File biogeme.toml has been parsed.
*** Initial values of the parameters are obtained from the file __simple_example_val_est_3.iter
Cannot read file __simple_example_val_est_3.iter. Statement is ignored.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           57    0.00035       10        1   ++
    1            -1.3             1.3           57    1.3e-08       10        1   ++
Results saved in file simple_example_val_est_3.html
Results saved in file simple_example_val_est_3.pickle
File biogeme.toml has been parsed.
File biogeme.toml has been parsed.
*** Initial values of the parameters are obtained from the file __simple_example_val_est_4.iter
Cannot read file __simple_example_val_est_4.iter. Statement is ignored.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.3           61     0.0012       10     0.99   ++
    1            -1.3             1.3           61    1.5e-07       10        1   ++
Results saved in file simple_example_val_est_4.html
Results saved in file simple_example_val_est_4.pickle
File biogeme.toml has been parsed.
File biogeme.toml has been parsed.
*** Initial values of the parameters are obtained from the file __simple_example_val_est_5.iter
Cannot read file __simple_example_val_est_5.iter. Statement is ignored.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.           beta1           beta2     Function    Relgrad   Radius      Rho
    0            -1.3             1.2           46     0.0022       10        1   ++
    1            -1.3             1.2           46      6e-07       10        1   ++
Results saved in file simple_example_val_est_5.html
Results saved in file simple_example_val_est_5.pickle
File biogeme.toml has been parsed.
Simulation results saved in file simple_example_validation.pickle
for slide in validation_results:
    print(
        f'Log likelihood for {slide.shape[0]} '
        f'validation data: {slide["Loglikelihood"].sum()}'
    )
Log likelihood for 1 validation data: -17.145326446024075
Log likelihood for 1 validation data: -13.413098095892746
Log likelihood for 1 validation data: -9.81771976465043
Log likelihood for 1 validation data: -6.341108765392212
Log likelihood for 1 validation data: -21.03742136293277
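
The slicing logic described in this section can be sketched independently of Biogeme. The code below shuffles the row indices, cuts them into slices, and pairs each held-out slice with the remaining rows used for estimation. The function names and the seed are hypothetical, and the way Biogeme actually draws its slices may differ.

```python
import random


def split_into_slices(n_rows, n_slices, generator):
    """Shuffle the row indices and deal them into n_slices groups."""
    indices = list(range(n_rows))
    generator.shuffle(indices)
    return [indices[k::n_slices] for k in range(n_slices)]


def estimation_and_validation_sets(n_rows, n_slices, generator):
    """Pair each held-out slice with the rows left for estimation."""
    pairs = []
    for held_out in split_into_slices(n_rows, n_slices, generator):
        held_out_set = set(held_out)
        estimation = [i for i in range(n_rows) if i not in held_out_set]
        pairs.append((estimation, sorted(held_out)))
    return pairs


# Five rows and five slices, as in the example above: one row per validation set.
pairs = estimation_and_validation_sets(5, 5, random.Random(0))
for estimation_rows, validation_rows in pairs:
    print(f'estimate on {estimation_rows}, validate on {validation_rows}')
```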

The following tool finds the files whose name combines the model name with a specific extension.

myBiogeme.files_of_type('pickle')
['simple_example.pickle']
