Simulation of a choice model

We use an estimated model to perform various simulations.

Michel Bierlaire, EPFL Sat Jun 28 2025, 16:56:26

import sys
import time

import pandas as pd

from biogeme.biogeme import BIOGEME
from biogeme.calculator import get_value_c
from biogeme.data.optima import normalized_weight, read_data
from biogeme.models import nested
from biogeme.results_processing import EstimationResults
from scenarios import scenario

Obtain the specification for the default scenario. The definition of the scenarios is available in Specification of a nested logit model.

v, nests, _, _ = scenario()

v_pt = v[0]
v_car = v[1]
v_sm = v[2]

Obtain the expression for the choice probability of each alternative.

prob_pt = nested(v, None, nests, 0)
prob_car = nested(v, None, nests, 1)
prob_sm = nested(v, None, nests, 2)
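To make the quantity returned by these expressions concrete, here is a minimal, dependency-free sketch of the nested logit probability. It is not Biogeme's implementation. The function name `nested_logit_probabilities`, the toy utilities, and the nest structure (alternatives 0 and 2 sharing a nest with scale 1.5) are assumptions chosen for illustration; the actual nests are those returned by `scenario()`.

```python
import math


def nested_logit_probabilities(utilities, nests, mu=1.0):
    """Nested logit probabilities for one observation (illustrative sketch).

    utilities: dict mapping alternative id -> utility value.
    nests: list of (mu_m, [alternative ids]); each alternative is in one nest.
    mu: top-level scale parameter (hypothetical default, normalized to 1).
    """
    # Sum of exp(mu_m * V_j) within each nest.
    within_sums = [
        sum(math.exp(mu_m * utilities[j]) for j in alts) for mu_m, alts in nests
    ]
    # Log-sum (inclusive value) of each nest: (1 / mu_m) * log(within sum).
    logsums = [
        math.log(total) / mu_m for (mu_m, _), total in zip(nests, within_sums)
    ]
    denominator = sum(math.exp(mu * logsum) for logsum in logsums)

    probabilities = {}
    for (mu_m, alts), total, logsum in zip(nests, within_sums, logsums):
        prob_nest = math.exp(mu * logsum) / denominator  # P(nest)
        for i in alts:
            # P(i) = P(i | nest) * P(nest)
            probabilities[i] = prob_nest * math.exp(mu_m * utilities[i]) / total
    return probabilities


# Toy values, not taken from the estimated model.
utilities = {0: -0.5, 1: 0.2, 2: -1.0}
toy_nests = [(1.0, [1]), (1.5, [0, 2])]
probs = nested_logit_probabilities(utilities, toy_nests)
print(probs)
```

When every nest scale equals the top-level scale, the formula collapses to the multinomial logit, which is a convenient sanity check for a sketch like this one.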

# Read the estimation results from the file
try:
    results = EstimationResults.from_yaml_file(
        filename='saved_results/b02estimation.yaml'
    )
except FileNotFoundError:
    sys.exit(
        'First run the script b02estimation.py '
        'in order to generate the '
        'file b02estimation.yaml.'
    )

Read the database.

database = read_data()

We now simulate various expressions on the database, evaluating each one separately with get_value_c, and store the results in a Pandas dataframe.

start_time = time.time()
simulate_formulas = {
    'weight': get_value_c(
        expression=normalized_weight,
        betas=results.get_beta_values(),
        database=database,
        numerically_safe=False,
        use_jit=True,
    ),
    'Utility PT': get_value_c(
        expression=v_pt,
        betas=results.get_beta_values(),
        database=database,
        numerically_safe=False,
        use_jit=True,
    ),
    'Utility car': get_value_c(
        expression=v_car,
        betas=results.get_beta_values(),
        database=database,
        numerically_safe=False,
        use_jit=True,
    ),
    'Utility SM': get_value_c(
        expression=v_sm,
        betas=results.get_beta_values(),
        database=database,
        numerically_safe=False,
        use_jit=True,
    ),
    'Prob. PT': get_value_c(
        expression=prob_pt,
        betas=results.get_beta_values(),
        database=database,
        numerically_safe=False,
        use_jit=True,
    ),
    'Prob. car': get_value_c(
        expression=prob_car,
        betas=results.get_beta_values(),
        database=database,
        numerically_safe=False,
        use_jit=True,
    ),
    'Prob. SM': get_value_c(
        expression=prob_sm,
        betas=results.get_beta_values(),
        database=database,
        numerically_safe=False,
        use_jit=True,
    ),
}

simulated_values = pd.DataFrame.from_dict(simulate_formulas)
end_time = time.time()
print(
    f'--- Execution time without Biogeme:    '
    f'{end_time - start_time:.2f} seconds ---'
)
--- Execution time without Biogeme:    0.77 seconds ---

We now perform the same simulation using the BIOGEME object. The results are identical, but the syntax is simpler and the execution is slightly faster, because Biogeme reuses calculations performed for one expression when evaluating the others.

A dictionary with the requested expressions must be provided to Biogeme.

simulate = {
    'weight': normalized_weight,
    'Utility PT': v_pt,
    'Utility car': v_car,
    'Utility SM': v_sm,
    'Prob. PT': prob_pt,
    'Prob. car': prob_car,
    'Prob. SM': prob_sm,
}
start_time = time.time()
the_biogeme = BIOGEME(database, simulate)
the_betas = results.get_beta_values()
# Reuse the beta values retrieved above instead of fetching them a second time.
biogeme_simulation = the_biogeme.simulate(the_betas)
end_time = time.time()
print(
    f'--- Execution time with Biogeme:       '
    f'{end_time - start_time:.2f} seconds ---'
)
--- Execution time with Biogeme:       0.62 seconds ---
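Each of the two timings reported here comes from a single run, so the difference between them can be affected by noise such as one-off compilation or caching. As a hedged sketch, a repeated measurement with `time.perf_counter` gives a steadier comparison. The helper `best_time` and the `toy_workload` function are hypothetical stand-ins; in practice the callable would wrap one of the two simulation approaches.

```python
import time


def best_time(func, repeats=3):
    """Return the shortest wall-clock duration of func() over several runs."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return min(durations)


def toy_workload():
    """Placeholder computation standing in for a simulation call."""
    return sum(i * i for i in range(100_000))


print(f'--- Best of 3 runs: {best_time(toy_workload):.4f} seconds ---')
```

Taking the minimum rather than the mean is one common convention, since it is the run least disturbed by other activity on the machine.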

Let’s print the two results to show that they are identical.

Without Biogeme

print(simulated_values)
        weight  Utility PT  Utility car  ...  Prob. PT  Prob. car  Prob. SM
0     0.893779   -0.234985    -0.156370  ...  0.479431   0.519165  0.001404
1     0.868674   -0.442406     0.195938  ...  0.241046   0.560868  0.198086
2     0.868674   -2.021524    -0.048079  ...  0.119893   0.875040  0.005067
3     0.965766   -2.293563     0.027404  ...  0.051229   0.813507  0.135264
4     0.868674   -1.010973     0.008391  ...  0.259609   0.729616  0.010775
...        ...         ...          ...  ...       ...        ...       ...
1894  2.053830   -1.156962    -0.256143  ...  0.288859   0.711097  0.000044
1895  0.868674   -2.145229    -0.412661  ...  0.149688   0.849018  0.001294
1896  0.868674   -0.998305     0.065662  ...  0.205928   0.688197  0.105875
1897  0.965766   -1.145931     0.009557  ...  0.222025   0.742202  0.035772
1898  0.965766   -1.293037    -0.048959  ...  0.211584   0.763156  0.025259

[1899 rows x 7 columns]

With Biogeme

print(biogeme_simulation)
        weight  Utility PT  Utility car  ...  Prob. PT  Prob. car  Prob. SM
0     0.893779   -0.234985    -0.156370  ...  0.479431   0.519165  0.001404
1     0.868674   -0.442406     0.195938  ...  0.241046   0.560868  0.198086
2     0.868674   -2.021524    -0.048079  ...  0.119893   0.875040  0.005067
3     0.965766   -2.293563     0.027404  ...  0.051229   0.813507  0.135264
4     0.868674   -1.010973     0.008391  ...  0.259609   0.729616  0.010775
...        ...         ...          ...  ...       ...        ...       ...
1894  2.053830   -1.156962    -0.256143  ...  0.288859   0.711097  0.000044
1895  0.868674   -2.145229    -0.412661  ...  0.149688   0.849018  0.001294
1896  0.868674   -0.998305     0.065662  ...  0.205928   0.688197  0.105875
1897  0.965766   -1.145931     0.009557  ...  0.222025   0.742202  0.035772
1898  0.965766   -1.293037    -0.048959  ...  0.211584   0.763156  0.025259

[1899 rows x 7 columns]
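Comparing printed tables by eye covers only the displayed rows and columns. A programmatic check is more reliable. The following is a minimal sketch in plain Python, using the first two rows of the output above as sample data; the helper name `columns_match` and the tolerances are assumptions, not part of Biogeme.

```python
import math


def columns_match(left, right, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if two column-oriented tables hold the same values.

    Each table is a dict mapping a column name to a list of floats.
    """
    if left.keys() != right.keys():
        return False
    for name in left:
        a, b = left[name], right[name]
        if len(a) != len(b):
            return False
        if not all(
            math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
            for x, y in zip(a, b)
        ):
            return False
    return True


# First two rows of the two outputs printed above (subset of columns).
without_biogeme = {'weight': [0.893779, 0.868674], 'Prob. PT': [0.479431, 0.241046]}
with_biogeme = {'weight': [0.893779, 0.868674], 'Prob. PT': [0.479431, 0.241046]}
print(columns_match(without_biogeme, with_biogeme))
```

A Pandas dataframe can be converted to this column-oriented form with `to_dict(orient='list')`; alternatively, `pandas.testing.assert_frame_equal` performs a comparable check directly on two dataframes.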

Total running time of the script: (0 minutes 2.120 seconds)
