Simulation of a choice model

We use an estimated model to perform various simulations.

author: Michel Bierlaire, EPFL

date: Wed Apr 12 21:04:33 2023

import sys
import time
import pandas as pd
from biogeme import models
import biogeme.biogeme as bio
import biogeme.exceptions as excep
import biogeme.results as res
from optima_data import database, normalized_weight
from scenarios import scenario

Obtain the specification for the default scenario. The scenarios are defined in "Specification of a nested logit model".

V, nests, _, _ = scenario()

V_PT = V[0]
V_CAR = V[1]
V_SM = V[2]

Obtain the expression for the choice probability of each alternative.

prob_PT = models.nested(V, None, nests, 0)
prob_CAR = models.nested(V, None, nests, 1)
prob_SM = models.nested(V, None, nests, 2)
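To make the kind of expression built by `models.nested` more concrete, here is a minimal sketch of the textbook nested logit probability in plain Python. It is not Biogeme's implementation. The function name `nested_logit_probabilities`, the utility values, and the nest structure (alternatives 0 and 2 sharing a nest with scale parameter 1.5) are hypothetical and chosen only for illustration; the actual nests come from `scenario()`.

```python
import math


def nested_logit_probabilities(utilities, nests):
    """Textbook nested logit probabilities (illustrative sketch).

    utilities: dict mapping alternative id -> utility value.
    nests: list of (mu, alternatives) pairs; each alternative belongs to
        exactly one nest, and the top-level scale is normalized to 1.
    """
    # Log-sum (expected maximum utility) of each nest, scaled by 1 / mu.
    logsums = []
    for mu, alternatives in nests:
        total = sum(math.exp(mu * utilities[j]) for j in alternatives)
        logsums.append(math.log(total) / mu)

    # Denominator of the nest-level logit.
    denominator = sum(math.exp(value) for value in logsums)

    probabilities = {}
    for (mu, alternatives), logsum in zip(nests, logsums):
        prob_nest = math.exp(logsum) / denominator
        within = sum(math.exp(mu * utilities[j]) for j in alternatives)
        for i in alternatives:
            # P(i) = P(i | nest) * P(nest)
            probabilities[i] = math.exp(mu * utilities[i]) / within * prob_nest
    return probabilities


# Hypothetical utilities for three alternatives (0, 1, 2).
example_utilities = {0: -0.25, 1: -0.15, 2: -1.0}
# Hypothetical nesting: alternatives 0 and 2 share a nest, 1 is alone.
example_nests = [(1.5, [0, 2]), (1.0, [1])]
example_probs = nested_logit_probabilities(example_utilities, example_nests)
print(example_probs)
```

The probabilities sum to one across the alternatives, and when every nest has a scale of 1 the formula collapses to the multinomial logit.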

# Read the estimation results from the file
try:
    results = res.bioResults(pickleFile='saved_results/b02estimation.pickle')
except excep.BiogemeError:
    sys.exit(
        'First run the script b02estimation.py '
        'in order to generate the '
        'file b02estimation.pickle.'
    )

We now evaluate each expression on the database with a separate call to getValue_c, and store the results in a pandas dataframe.

start_time = time.time()
simulate_formulas = {
    'weight': normalized_weight.getValue_c(
        betas=results.getBetaValues(), database=database, prepareIds=True
    ),
    'Utility PT': V_PT.getValue_c(
        betas=results.getBetaValues(), database=database, prepareIds=True
    ),
    'Utility car': V_CAR.getValue_c(
        betas=results.getBetaValues(), database=database, prepareIds=True
    ),
    'Utility SM': V_SM.getValue_c(
        betas=results.getBetaValues(), database=database, prepareIds=True
    ),
    'Prob. PT': prob_PT.getValue_c(
        betas=results.getBetaValues(), database=database, prepareIds=True
    ),
    'Prob. car': prob_CAR.getValue_c(
        betas=results.getBetaValues(), database=database, prepareIds=True
    ),
    'Prob. SM': prob_SM.getValue_c(
        betas=results.getBetaValues(), database=database, prepareIds=True
    ),
}
simulated_values = pd.DataFrame.from_dict(
    simulate_formulas,
)
print(
    f'--- Execution time with getValue_c: '
    f'{time.time() - start_time:.2f} seconds ---'
)
--- Execution time with getValue_c: 0.32 seconds ---

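We now perform the same simulation using Biogeme. The results are identical. Biogeme recycles calculations performed for one expression for the other expressions, which is intended to reduce the execution time. In the run recorded here, however, the Biogeme call took 0.42 seconds against 0.32 seconds for the separate getValue_c calls, so no gain is visible on this example.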
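The idea of recycling calculations across expressions can be illustrated with a toy example. The sketch below counts how many exponentials are evaluated when three logit probabilities are computed one by one, compared with computing them together from shared intermediate values. It is only an analogy: all names and values are hypothetical, and it does not describe how Biogeme evaluates expressions internally.

```python
import math

exp_calls = 0


def counted_exp(x):
    """math.exp wrapper that counts how often it is evaluated."""
    global exp_calls
    exp_calls += 1
    return math.exp(x)


def logit_prob_standalone(utilities, chosen):
    """Evaluate one probability, recomputing every exponential."""
    numerator = counted_exp(utilities[chosen])
    denominator = sum(counted_exp(v) for v in utilities)
    return numerator / denominator


def logit_probs_shared(utilities):
    """Evaluate all probabilities, computing each exponential once."""
    exps = [counted_exp(v) for v in utilities]
    denominator = sum(exps)
    return [e / denominator for e in exps]


toy_utilities = [-0.25, -0.15, -1.0]  # hypothetical values

exp_calls = 0
standalone = [logit_prob_standalone(toy_utilities, i) for i in range(3)]
standalone_calls = exp_calls  # 3 probabilities x 4 exponentials each

exp_calls = 0
shared = logit_probs_shared(toy_utilities)
shared_calls = exp_calls  # 3 exponentials in total

print(standalone_calls, shared_calls)
```

Both approaches return the same probabilities; only the amount of repeated work differs.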

A dictionary with the requested expressions must be provided to Biogeme.

simulate = {
    'weight': normalized_weight,
    'Utility PT': V_PT,
    'Utility car': V_CAR,
    'Utility SM': V_SM,
    'Prob. PT': prob_PT,
    'Prob. car': prob_CAR,
    'Prob. SM': prob_SM,
}
start_time = time.time()
the_biogeme = bio.BIOGEME(database, simulate)
biogeme_simulation = the_biogeme.simulate(results.getBetaValues())
print(
    f'--- Execution time with Biogeme:    '
    f'{time.time() - start_time:.2f} seconds ---'
)
--- Execution time with Biogeme:    0.42 seconds ---

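Let’s print the two results to show that the simulated values are identical. Only the row labels differ: the dataframe built with from_dict is numbered from 0 to 1905, whereas the one returned by Biogeme shows non-contiguous labels (up to 2264), presumably the index of the underlying database.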

Without Biogeme

print(simulated_values)
        weight  Utility PT  Utility car  ...  Prob. PT  Prob. car  Prob. SM
0     0.886023   -0.245379    -0.156238  ...  0.476807   0.521791  0.001402
1     0.861136   -0.451576     0.198134  ...  0.238681   0.563268  0.198051
2     0.861136   -2.027748    -0.047179  ...  0.119136   0.875824  0.005040
3     0.957386   -2.290720     0.030607  ...  0.051136   0.814110  0.134754
4     0.861136   -1.022414     0.009467  ...  0.257192   0.732059  0.010749
...        ...         ...          ...  ...       ...        ...       ...
1901  2.036009   -1.172087    -0.256681  ...  0.285872   0.714085  0.000043
1902  0.861136   -2.141291    -0.408666  ...  0.149688   0.849040  0.001272
1903  0.861136   -0.996681     0.068780  ...  0.205901   0.689083  0.105016
1904  0.957386   -1.157102     0.010095  ...  0.219850   0.744274  0.035876
1905  0.957386   -1.306555    -0.048721  ...  0.209149   0.765468  0.025383

[1906 rows x 7 columns]

With Biogeme

print(biogeme_simulation)
        weight  Utility PT  Utility car  ...  Prob. PT  Prob. car  Prob. SM
0     0.886023   -0.245379    -0.156238  ...  0.476807   0.521791  0.001402
2     0.861136   -0.451576     0.198134  ...  0.238681   0.563268  0.198051
3     0.861136   -2.027748    -0.047179  ...  0.119136   0.875824  0.005040
4     0.957386   -2.290720     0.030607  ...  0.051136   0.814110  0.134754
5     0.861136   -1.022414     0.009467  ...  0.257192   0.732059  0.010749
...        ...         ...          ...  ...       ...        ...       ...
2259  2.036009   -1.172087    -0.256681  ...  0.285872   0.714085  0.000043
2261  0.861136   -2.141291    -0.408666  ...  0.149688   0.849040  0.001272
2262  0.861136   -0.996681     0.068780  ...  0.205901   0.689083  0.105016
2263  0.957386   -1.157102     0.010095  ...  0.219850   0.744274  0.035876
2264  0.957386   -1.306555    -0.048721  ...  0.209149   0.765468  0.025383

[1906 rows x 7 columns]
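Comparing printed tables by eye only covers the rows that are displayed. A programmatic check is a possible complement; the sketch below uses two small stand-in dataframes (hypothetical names `frame_a` and `frame_b`, filled with a few of the values printed above) rather than the actual simulation results, and compares them while ignoring the differing row labels.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two simulation results: same values,
# but the second frame keeps a non-contiguous index.
frame_a = pd.DataFrame(
    {'Prob. PT': [0.476807, 0.238681], 'Prob. car': [0.521791, 0.563268]}
)
frame_b = pd.DataFrame(
    {'Prob. PT': [0.476807, 0.238681], 'Prob. car': [0.521791, 0.563268]},
    index=[0, 2],
)

# Compare the underlying arrays so that the differing index is ignored.
same_columns = list(frame_a.columns) == list(frame_b.columns)
same_values = np.allclose(frame_a.to_numpy(), frame_b.to_numpy())
print(same_columns and same_values)

# Alternative: align the indexes first, then use pandas' own comparison,
# which raises an AssertionError if the frames differ.
pd.testing.assert_frame_equal(frame_a, frame_b.reset_index(drop=True))
```

Applied to the tutorial's objects, the same pattern would compare `simulated_values` with `biogeme_simulation`, assuming both hold the same rows in the same order.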

Total running time of the script: (0 minutes 0.777 seconds)
