Simulation of a choice model
We use an estimated model to perform various simulations.
- author: Michel Bierlaire, EPFL
- date: Wed Apr 12 21:04:33 2023
import sys
import time
import pandas as pd
from biogeme import models
import biogeme.biogeme as bio
import biogeme.exceptions as excep
import biogeme.results as res
from biogeme.data.optima import read_data, normalized_weight
from scenarios import scenario
Obtain the specification for the default scenario. The scenarios are defined in "Specification of a nested logit model".
V, nests, _, _ = scenario()
V_PT = V[0]
V_CAR = V[1]
V_SM = V[2]
Obtain the expression for the choice probability of each alternative.
prob_PT = models.nested(V, None, nests, 0)
prob_CAR = models.nested(V, None, nests, 1)
prob_SM = models.nested(V, None, nests, 2)
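As a rough illustration of what a nested logit choice probability computes, here is a minimal pure-Python sketch. It is not Biogeme's implementation: the function `nested_logit_probabilities`, the utility values, and the nest structure below are all hypothetical, and the top-level scale is assumed to be normalised to 1.

```python
import math


def nested_logit_probabilities(utilities, nests):
    """Return {alternative: probability} for a two-level nested logit.

    utilities: {alternative id: systematic utility V_i}
    nests: list of (mu_m, [alternative ids]); each alternative belongs to
           exactly one nest (assumption: top-level scale equal to 1).
    """
    # Inclusive value of each nest: (1 / mu_m) * log(sum_j exp(mu_m * V_j))
    logsums = []
    for mu, members in nests:
        total = sum(math.exp(mu * utilities[j]) for j in members)
        logsums.append(math.log(total) / mu)
    denominator = sum(math.exp(s) for s in logsums)

    probabilities = {}
    for (mu, members), logsum in zip(nests, logsums):
        # Probability of choosing the nest, then the alternative within it
        prob_nest = math.exp(logsum) / denominator
        within = sum(math.exp(mu * utilities[j]) for j in members)
        for i in members:
            probabilities[i] = prob_nest * math.exp(mu * utilities[i]) / within
    return probabilities


# Hypothetical utilities and nests, chosen only for illustration
V_example = {0: -0.18, 1: -0.15, 2: -3.0}
nests_example = [(1.0, [1]), (1.5, [0, 2])]
probs = nested_logit_probabilities(V_example, nests_example)
print(probs)
```

The probabilities sum to one by construction, and setting every nest parameter to 1 reduces the model to a multinomial logit.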
# Read the estimation results from the file
try:
    results = res.bioResults(pickle_file='saved_results/b02estimation.pickle')
except excep.BiogemeError:
    sys.exit(
        'Run first the script b02estimation.py '
        'in order to generate the '
        'file b02estimation.pickle.'
    )
Read the database
database = read_data()
We now simulate various expressions on the database, and store the results in a Pandas dataframe.
start_time = time.time()
simulate_formulas = {
    'weight': normalized_weight.get_value_c(
        betas=results.get_beta_values(), database=database, prepare_ids=True
    ),
    'Utility PT': V_PT.get_value_c(
        betas=results.get_beta_values(), database=database, prepare_ids=True
    ),
    'Utility car': V_CAR.get_value_c(
        betas=results.get_beta_values(), database=database, prepare_ids=True
    ),
    'Utility SM': V_SM.get_value_c(
        betas=results.get_beta_values(), database=database, prepare_ids=True
    ),
    'Prob. PT': prob_PT.get_value_c(
        betas=results.get_beta_values(), database=database, prepare_ids=True
    ),
    'Prob. car': prob_CAR.get_value_c(
        betas=results.get_beta_values(), database=database, prepare_ids=True
    ),
    'Prob. SM': prob_SM.get_value_c(
        betas=results.get_beta_values(), database=database, prepare_ids=True
    ),
}
simulated_values = pd.DataFrame.from_dict(
    simulate_formulas,
)
print(
    f'--- Execution time with getValue_c: '
    f'{time.time() - start_time:.2f} seconds ---'
)
--- Execution time with getValue_c: 0.47 seconds ---
We now perform the same simulation using Biogeme. The results are identical, but the execution is faster, because Biogeme reuses the calculations performed for one expression when evaluating the others.
A dictionary with the requested expressions must be provided to Biogeme.
simulate = {
    'weight': normalized_weight,
    'Utility PT': V_PT,
    'Utility car': V_CAR,
    'Utility SM': V_SM,
    'Prob. PT': prob_PT,
    'Prob. car': prob_CAR,
    'Prob. SM': prob_SM,
}
start_time = time.time()
the_biogeme = bio.BIOGEME(database, simulate)
biogeme_simulation = the_biogeme.simulate(results.get_beta_values())
print(
    f'--- Execution time with Biogeme: '
    f'{time.time() - start_time:.2f} seconds ---'
)
--- Execution time with Biogeme: 0.36 seconds ---
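The benefit of reusing calculations across expressions can be illustrated with a toy logit model. The sketch below is only an analogy and says nothing about how Biogeme is implemented internally: the helper names and utility values are hypothetical. It counts how many exponentials are evaluated when each probability is computed independently, compared with computing the shared terms once.

```python
import math

# Counter for the number of exponentials evaluated (illustration only)
calls = {'exp': 0}


def counted_exp(x):
    calls['exp'] += 1
    return math.exp(x)


def logit_one_by_one(utilities):
    # Each probability rebuilds the full denominator from scratch
    return [
        counted_exp(v) / sum(counted_exp(w) for w in utilities)
        for v in utilities
    ]


def logit_shared(utilities):
    # exp(V_j) is evaluated once per alternative and then reused
    exps = [counted_exp(v) for v in utilities]
    denominator = sum(exps)
    return [e / denominator for e in exps]


utilities = [-0.18, -0.15, -3.0]  # hypothetical values

calls['exp'] = 0
separate = logit_one_by_one(utilities)
calls_separate = calls['exp']

calls['exp'] = 0
shared = logit_shared(utilities)
calls_shared = calls['exp']

print(f'Exponentials evaluated: {calls_separate} separately, {calls_shared} shared')
```

Both functions return the same probabilities; only the amount of repeated work differs.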
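Let’s print the two results to show that the simulated values are identical. Only the row labels differ: the first data frame is numbered 0 to 1905, whereas the Biogeme output carries non-consecutive labels up to 2264.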
Without Biogeme
print(simulated_values)
        weight  Utility PT  Utility car  ...  Prob. PT  Prob. car  Prob. SM
0     0.886023   -0.180894    -0.147283  ...  0.490700   0.508061  0.001239
1     0.861136   -0.415315     0.221097  ...  0.228886   0.574899  0.196216
2     0.861136   -2.154157    -0.035018  ...  0.102446   0.888093  0.009460
3     0.957386   -2.181930     0.055095  ...  0.039047   0.790872  0.170080
4     0.861136   -1.017444     0.025147  ...  0.252618   0.733898  0.013483
...        ...         ...          ...  ...       ...        ...       ...
1901  2.036009   -1.149760    -0.250998  ...  0.289274   0.710676  0.000051
1902  0.861136   -2.117915    -0.378399  ...  0.148251   0.849664  0.002085
1903  0.861136   -0.973017     0.093820  ...  0.187894   0.688689  0.123416
1904  0.957386   -1.166184     0.030280  ...  0.205622   0.747552  0.046825
1905  0.957386   -1.309326    -0.031210  ...  0.198499   0.767021  0.034480

[1906 rows x 7 columns]
With Biogeme
print(biogeme_simulation)
        weight  Utility PT  Utility car  ...  Prob. PT  Prob. car  Prob. SM
0     0.886023   -0.180894    -0.147283  ...  0.490700   0.508061  0.001239
2     0.861136   -0.415315     0.221097  ...  0.228886   0.574899  0.196216
3     0.861136   -2.154157    -0.035018  ...  0.102446   0.888093  0.009460
4     0.957386   -2.181930     0.055095  ...  0.039047   0.790872  0.170080
5     0.861136   -1.017444     0.025147  ...  0.252618   0.733898  0.013483
...        ...         ...          ...  ...       ...        ...       ...
2259  2.036009   -1.149760    -0.250998  ...  0.289274   0.710676  0.000051
2261  0.861136   -2.117915    -0.378399  ...  0.148251   0.849664  0.002085
2262  0.861136   -0.973017     0.093820  ...  0.187894   0.688689  0.123416
2263  0.957386   -1.166184     0.030280  ...  0.205622   0.747552  0.046825
2264  0.957386   -1.309326    -0.031210  ...  0.198499   0.767021  0.034480

[1906 rows x 7 columns]
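The two printouts show the same values under different row labels, so a visual check can be complemented by a programmatic one that ignores the index. The sketch below uses two small stand-in data frames (`first` and `second`, hypothetical names holding a few values copied from the output above) rather than the full simulation results.

```python
import numpy as np
import pandas as pd

# Stand-in frames: same values, different row labels
first = pd.DataFrame(
    {'Prob. PT': [0.490700, 0.228886], 'Prob. car': [0.508061, 0.574899]}
)
second = pd.DataFrame(
    {'Prob. PT': [0.490700, 0.228886], 'Prob. car': [0.508061, 0.574899]},
    index=[0, 2],
)

# Compare the underlying values position by position, ignoring the labels
same_values = np.allclose(first.to_numpy(), second.to_numpy())

# The row labels themselves are not the same
same_labels = first.index.equals(second.index)

# Alternative: drop the labels, then let pandas raise on any mismatch
pd.testing.assert_frame_equal(
    first.reset_index(drop=True), second.reset_index(drop=True)
)

print(f'Same values: {same_values}, same labels: {same_labels}')
```

Applied to the frames of this example, the same check would presumably be run on `simulated_values` and `biogeme_simulation`, provided that both hold their rows in the same order.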
Total running time of the script: (0 minutes 1.176 seconds)