Simulation of a choice model

We use an estimated model to perform various simulations.

Author: Michel Bierlaire, EPFL
Date: Wed Apr 12 21:04:33 2023

import sys
import time
import pandas as pd
from biogeme import models
import biogeme.biogeme as bio
import biogeme.exceptions as excep
import biogeme.results as res
from biogeme.data.optima import read_data, normalized_weight
from scenarios import scenario

Obtain the specification for the default scenario. The scenarios are defined in "Specification of a nested logit model".

V, nests, _, _ = scenario()

V_PT = V[0]
V_CAR = V[1]
V_SM = V[2]

Obtain the expression for the choice probability of each alternative.

prob_PT = models.nested(V, None, nests, 0)
prob_CAR = models.nested(V, None, nests, 1)
prob_SM = models.nested(V, None, nests, 2)
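To make the quantity behind these expressions concrete, here is a minimal NumPy sketch of a nested logit choice probability, computed as the probability of the nest times the probability of the alternative within its nest. It is not Biogeme's implementation: the function `nested_logit_probs`, the utilities, the nest parameters and the nest structure are all hypothetical, and the top-level scale is assumed to be normalized to 1.

```python
import numpy as np


def nested_logit_probs(v, nests):
    """Nested logit probabilities for one observation (illustrative only).

    v: dict mapping alternative id -> utility.
    nests: list of (mu_m, [alternative ids]); each alternative is in one nest.
    The top-level scale parameter is assumed to be 1.
    """
    # Log-sum of each nest: (1 / mu_m) * log(sum_j exp(mu_m * V_j)).
    logsums = [
        np.log(sum(np.exp(mu_m * v[j]) for j in alts)) / mu_m
        for mu_m, alts in nests
    ]
    denominator = sum(np.exp(ls) for ls in logsums)
    probs = {}
    for (mu_m, alts), ls in zip(nests, logsums):
        p_nest = np.exp(ls) / denominator
        for i in alts:
            # Probability of alternative i conditional on its nest.
            p_within = np.exp(mu_m * v[i]) / np.exp(mu_m * ls)
            probs[i] = p_nest * p_within
    return probs


# Hypothetical utilities and nests: alternatives 0 and 2 share a nest.
v = {0: -0.2, 1: 0.1, 2: -1.5}
nests = [(1.0, [1]), (1.5, [0, 2])]
probs = nested_logit_probs(v, nests)
print(probs, sum(probs.values()))
```

When every nest parameter equals 1, the expression collapses to the multinomial logit, which gives a quick sanity check for a sketch like this one.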

# Read the estimation results from the file
try:
    results = res.bioResults(pickle_file='saved_results/b02estimation.pickle')
except excep.BiogemeError:
    sys.exit(
        'Run the script b02estimation.py first '
        'in order to generate the '
        'file b02estimation.pickle.'
    )

Read the database.

database = read_data()

We now simulate various expressions on the database and store the results in a Pandas dataframe.

start_time = time.time()
simulate_formulas = {
    'weight': normalized_weight.get_value_c(
        betas=results.get_beta_values(), database=database, prepare_ids=True
    ),
    'Utility PT': V_PT.get_value_c(
        betas=results.get_beta_values(), database=database, prepare_ids=True
    ),
    'Utility car': V_CAR.get_value_c(
        betas=results.get_beta_values(), database=database, prepare_ids=True
    ),
    'Utility SM': V_SM.get_value_c(
        betas=results.get_beta_values(), database=database, prepare_ids=True
    ),
    'Prob. PT': prob_PT.get_value_c(
        betas=results.get_beta_values(), database=database, prepare_ids=True
    ),
    'Prob. car': prob_CAR.get_value_c(
        betas=results.get_beta_values(), database=database, prepare_ids=True
    ),
    'Prob. SM': prob_SM.get_value_c(
        betas=results.get_beta_values(), database=database, prepare_ids=True
    ),
}

simulated_values = pd.DataFrame.from_dict(
    simulate_formulas,
)

print(
    f'--- Execution time with getValue_c: '
    f'{time.time() - start_time:.2f} seconds ---'
)

--- Execution time with getValue_c: 0.47 seconds ---
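The timing above relies on `time.time()`, which reads the wall clock. For measuring short intervals, the standard library's `time.perf_counter()` is generally a better fit because it is monotonic and has higher resolution. The sketch below wraps it in a small context manager; the helper `timed`, the `timings` dictionary and the toy workload are hypothetical names used only for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, store):
    """Record the elapsed time of the enclosed block in store[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[label] = time.perf_counter() - start


timings = {}
with timed('toy workload', timings):
    # Stand-in for the simulation being timed.
    total = sum(i * i for i in range(100_000))

for label, seconds in timings.items():
    print(f'--- Execution time for {label}: {seconds:.4f} seconds ---')
```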

We now perform the same simulation using Biogeme. The results are identical, but the execution is faster (0.36 seconds instead of 0.47 seconds in this run), because Biogeme reuses calculations performed for one expression when evaluating the others.

A dictionary containing the requested expressions must be provided to Biogeme.

simulate = {
    'weight': normalized_weight,
    'Utility PT': V_PT,
    'Utility car': V_CAR,
    'Utility SM': V_SM,
    'Prob. PT': prob_PT,
    'Prob. car': prob_CAR,
    'Prob. SM': prob_SM,
}

start_time = time.time()
the_biogeme = bio.BIOGEME(database, simulate)
biogeme_simulation = the_biogeme.simulate(results.get_beta_values())

print(
    f'--- Execution time with Biogeme:    '
    f'{time.time() - start_time:.2f} seconds ---'
)

--- Execution time with Biogeme:    0.36 seconds ---
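The benefit of reusing intermediate results can be illustrated with a toy example. The sketch below is not how Biogeme works internally; it only shows the general idea with hypothetical functions `utilities` and `logit_prob`: evaluating each probability separately recomputes the utilities every time, whereas a joint evaluation computes them once and shares them.

```python
import numpy as np

calls = {'utility': 0}


def utilities(x):
    """Toy utilities for three alternatives (hypothetical specification)."""
    calls['utility'] += 1
    return np.column_stack([-0.5 * x, 0.2 * x, -1.0 + 0.1 * x])


def logit_prob(v, alt):
    """Logit probability of alternative `alt` for each row of v."""
    e = np.exp(v - v.max(axis=1, keepdims=True))
    return e[:, alt] / e.sum(axis=1)


x = np.linspace(0.0, 2.0, 5)

# One expression at a time: the utilities are recomputed for each probability.
calls['utility'] = 0
separate = [logit_prob(utilities(x), alt) for alt in range(3)]
separate_calls = calls['utility']

# Joint evaluation: the utilities are computed once and shared.
calls['utility'] = 0
shared_v = utilities(x)
joint = [logit_prob(shared_v, alt) for alt in range(3)]
joint_calls = calls['utility']

print(f'utility evaluations: separate={separate_calls}, joint={joint_calls}')
```

Both approaches return the same numbers; only the amount of repeated work differs.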

Let’s print the two results to show that they are identical.

Without Biogeme

print(simulated_values)

  weight  Utility PT  Utility car  ...  Prob. PT  Prob. car  Prob. SM
0.886023   -0.180894    -0.147283  ...  0.490700   0.508061  0.001239
0.861136   -0.415315     0.221097  ...  0.228886   0.574899  0.196216
0.861136   -2.154157    -0.035018  ...  0.102446   0.888093  0.009460
0.957386   -2.181930     0.055095  ...  0.039047   0.790872  0.170080
0.861136   -1.017444     0.025147  ...  0.252618   0.733898  0.013483
     ...         ...          ...  ...       ...        ...       ...
2.036009   -1.149760    -0.250998  ...  0.289274   0.710676  0.000051
0.861136   -2.117915    -0.378399  ...  0.148251   0.849664  0.002085
0.861136   -0.973017     0.093820  ...  0.187894   0.688689  0.123416
0.957386   -1.166184     0.030280  ...  0.205622   0.747552  0.046825
0.957386   -1.309326    -0.031210  ...  0.198499   0.767021  0.034480

[1906 rows x 7 columns]

With Biogeme

print(biogeme_simulation)

  weight  Utility PT  Utility car  ...  Prob. PT  Prob. car  Prob. SM
0.886023   -0.180894    -0.147283  ...  0.490700   0.508061  0.001239
0.861136   -0.415315     0.221097  ...  0.228886   0.574899  0.196216
0.861136   -2.154157    -0.035018  ...  0.102446   0.888093  0.009460
0.957386   -2.181930     0.055095  ...  0.039047   0.790872  0.170080
0.861136   -1.017444     0.025147  ...  0.252618   0.733898  0.013483
     ...         ...          ...  ...       ...        ...       ...
2.036009   -1.149760    -0.250998  ...  0.289274   0.710676  0.000051
0.861136   -2.117915    -0.378399  ...  0.148251   0.849664  0.002085
0.861136   -0.973017     0.093820  ...  0.187894   0.688689  0.123416
0.957386   -1.166184     0.030280  ...  0.205622   0.747552  0.046825
0.957386   -1.309326    -0.031210  ...  0.198499   0.767021  0.034480

[1906 rows x 7 columns]
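Printing the two dataframes shows only the first and last rows, so a numerical comparison is a more reliable check that they are identical. The sketch below uses small hypothetical stand-ins (`direct` and `via_biogeme`, with made-up values) in place of `simulated_values` and `biogeme_simulation`. It compares the underlying values rather than the frames themselves, on the assumption that the rows are in the same order but that the two frames may carry different row indices.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two simulation results (toy values).
columns = {'Prob. PT': [0.5, 0.3], 'Prob. car': [0.4, 0.6], 'Prob. SM': [0.1, 0.1]}
direct = pd.DataFrame(columns)
via_biogeme = pd.DataFrame(columns, index=[10, 42])  # different row labels

# Same columns, in the same order.
same_columns = list(direct.columns) == list(via_biogeme.columns)

# Compare the values only, ignoring the row index.
difference = np.abs(direct.to_numpy() - via_biogeme.to_numpy())
same_values = np.allclose(direct.to_numpy(), via_biogeme.to_numpy())

print(f'same columns: {same_columns}')
print(f'same values:  {same_values} (max abs difference: {difference.max():.2e})')
```

With the real results, the same three lines of comparison could be applied to `simulated_values` and `biogeme_simulation` directly.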

Total running time of the script: (0 minutes 1.176 seconds)