Estimation of several models
Example of the estimation of several specifications of the model.
Michel Bierlaire, EPFL Thu Jun 26 2025, 16:04:27
from biogeme.biogeme import BIOGEME
from biogeme.catalog import (
Catalog,
segmentation_catalogs,
)
from biogeme.expressions import Beta, log
from biogeme.models import loglogit
from biogeme.results_processing import (
compare_parameters,
compile_estimation_results,
pareto_optimal,
)
See the data processing script: Data preparation for Swissmetro.
from swissmetro_data import (
CAR_AV_SP,
CAR_CO_SCALED,
CAR_TT_SCALED,
CHOICE,
MALE,
SM_AV,
SM_COST_SCALED,
SM_TT_SCALED,
TRAIN_AV_SP,
TRAIN_COST_SCALED,
TRAIN_TT_SCALED,
database,
)
Parameters to be estimated
asc_car = Beta('asc_car', 0, None, None, 0)
asc_train = Beta('asc_train', 0, None, None, 0)
b_time = Beta('b_time', 0, None, None, 0)
b_cost = Beta('b_cost', 0, None, None, 0)
segmentation_gender = database.generate_segmentation(
variable=MALE, mapping={0: 'female', 1: 'male'}
)
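We define catalogs with two different specifications for the alternative specific constants of train and car: non-segmented, and segmented by gender. Both catalogs are created with the generic name 'asc', so a single controller named 'asc' switches them together.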
asc_train_catalog, asc_car_catalog = segmentation_catalogs(
generic_name='asc',
beta_parameters=[asc_train, asc_car],
potential_segmentations=(segmentation_gender,),
maximum_number=1,
)
We now define a catalog with two specifications of the travel time: linear and logarithmic.
First, for train.
train_tt_catalog = Catalog.from_dict(
catalog_name='train_tt_catalog',
dict_of_expressions={
'linear': TRAIN_TT_SCALED,
'log': log(TRAIN_TT_SCALED),
},
)
Then, for Swissmetro. We require the same specification as for train by assigning the Swissmetro catalog the same controller.
sm_tt_catalog = Catalog.from_dict(
catalog_name='sm_tt_catalog',
dict_of_expressions={
'linear': SM_TT_SCALED,
'log': log(SM_TT_SCALED),
},
controlled_by=train_tt_catalog.controlled_by,
)
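Each controller selects one option for all the catalogs it governs. The number of model specifications is therefore the product of the number of options of each distinct controller, not of each catalog. The sketch below enumerates the configurations with itertools.product. It assumes two controllers, asc and train_tt_catalog, with two options each; the option labels come from the configuration names printed after estimation. It only mimics the naming scheme, does not call Biogeme, and its enumeration order need not match Biogeme's.

```python
from itertools import product

# Each distinct controller and its options (labels as they appear in the
# configuration names reported after estimation).
controllers = {
    'asc': ['no_seg', 'MALE'],
    'train_tt_catalog': ['linear', 'log'],
}

# One configuration per combination of options across controllers.
configurations = [
    ';'.join(f'{name}:{option}' for name, option in zip(controllers, combination))
    for combination in product(*controllers.values())
]
for configuration in configurations:
    print(configuration)

# If the four catalogs (asc_train, asc_car, train_tt, sm_tt) each had their
# own two-option controller, there would be 2**4 combinations instead.
print(len(configurations), 'versus', 2 ** 4)
```

Sharing controllers keeps the search space at 4 specifications instead of 16. It also rules out inconsistent combinations, such as a linear train time paired with a logarithmic Swissmetro time.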
Definition of the utility functions with linear cost.
v_train = asc_train_catalog + b_time * train_tt_catalog + b_cost * TRAIN_COST_SCALED
v_swissmetro = b_time * sm_tt_catalog + b_cost * SM_COST_SCALED
v_car = asc_car_catalog + b_time * CAR_TT_SCALED + b_cost * CAR_CO_SCALED
Associate utility functions with the numbering of alternatives.
v = {1: v_train, 2: v_swissmetro, 3: v_car}
Associate the availability conditions with the alternatives.
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
Definition of the model. This is the contribution of each observation to the log likelihood function.
log_probability = loglogit(v, av, CHOICE)
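For reference, the logit contribution of an observation is the log of the probability of the chosen alternative i among the available ones:

ln P(i) = V_i - ln(sum over available j of exp(V_j))

The pure-Python sketch below evaluates this formula for one observation with made-up utility values. It illustrates the formula only and is not Biogeme's implementation of loglogit. The helper log_logit_probability is hypothetical.

```python
import math


def log_logit_probability(utilities: dict, availability: dict, choice: int) -> float:
    """Log of the logit probability of `choice` among available alternatives.

    Hypothetical helper mirroring the formula
    ln P(i) = V_i - ln(sum_{j available} exp(V_j)).
    """
    if not availability.get(choice):
        raise ValueError('The chosen alternative must be available')
    available = [utilities[j] for j in utilities if availability.get(j)]
    # Subtract the maximum utility before exponentiating for numerical stability.
    v_max = max(available)
    log_sum = v_max + math.log(sum(math.exp(u - v_max) for u in available))
    return utilities[choice] - log_sum


# Made-up utilities for one observation; car (3) is unavailable.
utilities = {1: -0.5, 2: -0.2, 3: 0.1}
availability = {1: 1, 2: 1, 3: 0}
print(log_logit_probability(utilities, availability, choice=2))
```

Unavailable alternatives are simply left out of the denominator. This is why the availability dictionary is passed alongside the utilities.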
the_biogeme: BIOGEME = BIOGEME(database=database, formulas=log_probability)
the_biogeme.model_name = 'b20multiple_models'
dict_of_results = the_biogeme.estimate_catalog()
print(f'A total of {len(dict_of_results)} models have been estimated:')
for config, res in dict_of_results.items():
    print(f'{config}: LL={res.final_log_likelihood:.2f} K={res.number_of_parameters}')
A total of 4 models have been estimated:
asc:no_seg;train_tt_catalog:linear: LL=-5331.25 K=4
asc:MALE;train_tt_catalog:log: LL=-7793.56 K=6
asc:no_seg;train_tt_catalog:log: LL=-6687.16 K=4
asc:MALE;train_tt_catalog:linear: LL=-5187.98 K=6
summary, description = compile_estimation_results(dict_of_results, use_short_names=True)
print(summary)
Model_000000 ... Model_000003
Number of estimated parameters 4 ... 6
Sample size 6768 ... 6768
Final log likelihood -5331.252 ... -5187.983
Akaike Information Criterion 10670.5 ... 10387.97
Bayesian Information Criterion 10697.78 ... 10428.89
asc_train (t-test) -0.701 (-8.49) ...
b_time (t-test) -1.28 (-12.3) ... -1.25 (-11.8)
b_cost (t-test) -1.08 (-15.9) ... -1.08 (-16.2)
asc_car (t-test) -0.155 (-2.66) ...
asc_train_ref (t-test) ... 0.0906 (0.992)
asc_train_diff_male (t-test) ... -1.23 (-15.5)
asc_car_ref (t-test) ... -0.461 (-4.74)
asc_car_diff_male (t-test) ... 0.309 (3.04)
[13 rows x 4 columns]
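The information criteria in the summary match the usual definitions:

AIC = 2K - 2LL
BIC = K ln(N) - 2LL

Here K is the number of estimated parameters, N the sample size, and LL the final log likelihood. The sketch below recomputes both criteria from the values reported for Model_000000 and Model_000003. Any discrepancy in the last digit comes from the rounding of the reported log likelihood.

```python
import math


def akaike(log_likelihood: float, n_parameters: int) -> float:
    """Akaike Information Criterion: 2K - 2LL."""
    return 2 * n_parameters - 2 * log_likelihood


def bayesian(log_likelihood: float, n_parameters: int, sample_size: int) -> float:
    """Bayesian Information Criterion: K ln(N) - 2LL."""
    return n_parameters * math.log(sample_size) - 2 * log_likelihood


sample_size = 6768
# (final log likelihood, number of parameters) as reported in the summary.
reported = {'Model_000000': (-5331.252, 4), 'Model_000003': (-5187.983, 6)}
for name, (ll, k) in reported.items():
    print(f'{name}: AIC={akaike(ll, k):.2f} BIC={bayesian(ll, k, sample_size):.2f}')
```

Both criteria favor Model_000003. The gain in log likelihood outweighs the penalty for its two additional parameters.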
Explanation of the names of the models.
for short_name, full_name in description.items():
    if short_name != full_name:
        print(f'{short_name}: {full_name}')
Model_000000: asc:no_seg;train_tt_catalog:linear
Model_000001: asc:MALE;train_tt_catalog:log
Model_000002: asc:no_seg;train_tt_catalog:log
Model_000003: asc:MALE;train_tt_catalog:linear
non_dominated_models = pareto_optimal(dict_of_results)
print(f'Out of them, {len(non_dominated_models)} are non dominated.')
for config, res in non_dominated_models.items():
    print(f'{config}')
Out of them, 2 are non dominated.
asc:no_seg;train_tt_catalog:linear
asc:MALE;train_tt_catalog:linear
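A model is dominated if another model is at least as good on every objective and strictly better on at least one. The sketch below reproduces the selection above under two assumed objectives: the final log likelihood, to be maximized, and the number of parameters, to be minimized. That assumption is consistent with the printed output, but it is not a description of how pareto_optimal is implemented. The helpers dominates and non_dominated are hypothetical.

```python
def dominates(a: tuple, b: tuple) -> bool:
    """True if model a dominates model b.

    Each model is (log_likelihood, n_parameters): a higher log likelihood
    and fewer parameters are better.
    """
    ll_a, k_a = a
    ll_b, k_b = b
    at_least_as_good = ll_a >= ll_b and k_a <= k_b
    strictly_better = ll_a > ll_b or k_a < k_b
    return at_least_as_good and strictly_better


def non_dominated(models: dict) -> list:
    """Names of the models that no other model dominates."""
    return [
        name
        for name, objectives in models.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in models.items()
            if other_name != name
        )
    ]


# Log likelihood and number of parameters printed after estimate_catalog.
models = {
    'asc:no_seg;train_tt_catalog:linear': (-5331.25, 4),
    'asc:MALE;train_tt_catalog:log': (-7793.56, 6),
    'asc:no_seg;train_tt_catalog:log': (-6687.16, 4),
    'asc:MALE;train_tt_catalog:linear': (-5187.98, 6),
}
for name in non_dominated(models):
    print(name)
```

Both logarithmic specifications are dominated by the non-segmented linear model, which has a higher log likelihood with no more parameters. The two linear models remain because each wins on one objective.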
summary, description = compile_estimation_results(
non_dominated_models, use_short_names=False
)
print(summary)
asc:no_seg;train_tt_catalog:linear asc:MALE;train_tt_catalog:linear
Number of estimated parameters 4 6
Sample size 6768 6768
Final log likelihood -5331.252 -5187.983
Akaike Information Criterion 10670.5 10387.97
Bayesian Information Criterion 10697.78 10428.89
asc_train (t-test) -0.701 (-8.49)
b_time (t-test) -1.28 (-12.3) -1.25 (-11.8)
b_cost (t-test) -1.08 (-15.9) -1.08 (-16.2)
asc_car (t-test) -0.155 (-2.66)
asc_train_ref (t-test) 0.0906 (0.992)
asc_train_diff_male (t-test) -1.23 (-15.5)
asc_car_ref (t-test) -0.461 (-4.74)
asc_car_diff_male (t-test) 0.309 (3.04)
It is also possible to generate a LaTeX table comparing the results of all the estimated models.
latex_code = compare_parameters(estimation_results=dict_of_results)
print(latex_code)
\usepackage{longtable}
\usepackage{siunitx}
\sisetup{
parse-numbers=false, % Prevents automatic parsing (needed for parentheses & superscripts)
detect-inline-weight=math,% Ensures proper formatting in tables
tight-spacing=true % Keeps spacing consistent
}
\begin{longtable}{rlSSSS}
& & \multicolumn{1}{c}{asc:no_seg;train_tt_catalog:linear} & \multicolumn{1}{c}{asc:MALE;train_tt_catalog:log} & \multicolumn{1}{c}{asc:no_seg;train_tt_catalog:log} & \multicolumn{1}{c}{asc:MALE;train_tt_catalog:linear} \\
& Parameter name & \multicolumn{1}{c}{Coef./(SE)} & \multicolumn{1}{c}{Coef./(SE)} & \multicolumn{1}{c}{Coef./(SE)} & \multicolumn{1}{c}{Coef./(SE)} \\
\hline
0 &asc\_car& -0.155\textsuperscript{***} & & 0.0 & \\
& &(0.0582) & &(0.229) & \\
1 &asc\_car\_diff\_male & & -0.173 & & 0.309\textsuperscript{***} \\
& & &(0.354) & &(0.102) \\
2 &asc\_car\_ref & & 0.375 & & -0.461\textsuperscript{***} \\
& & &(0.533) & &(0.0973) \\
3 &asc\_train& -0.701\textsuperscript{***} & & 0.0 & \\
& &(0.0826) & &(0.104) & \\
4 &asc\_train\_diff\_male & & -0.973\textsuperscript{***} & & -1.23\textsuperscript{***} \\
& & &(0.112) & &(0.0792) \\
5 &asc\_train\_ref & & -0.248 & & 0.0906 \\
& & &(0.24) & &(0.0913) \\
6 &b\_cost& -1.08\textsuperscript{***} & -0.463\textsuperscript{***} & -1.07\textsuperscript{***} & -1.08\textsuperscript{***} \\
& &(0.0682)&(0.15)&(0.108)&(0.067) \\
7 &b\_time& -1.28\textsuperscript{***} & -2.15\textsuperscript{***} & -1.38\textsuperscript{***} & -1.25\textsuperscript{***} \\
& &(0.104)&(0.37)&(0.152)&(0.105) \\
\hline
\multicolumn{2}{l}{Number of observations} &6768 & 6768 & 6768 & 6768 \\
\multicolumn{2}{l}{Number of parameters} &4 & 6 & 4 & 6 \\
\multicolumn{2}{l}{Akaike Information Criterion} &10670.5 & 15599.1 & 13382.3 & 10388.0 \\
\multicolumn{2}{l}{Bayesian Information Criterion} &10697.8 & 15640.0 & 13409.6 & 10428.9 \\
\hline
\multicolumn{4}{l}{\footnotesize Standard errors: \textsuperscript{***}: $p < 0.01$, \textsuperscript{**}: $p < 0.05$, \textsuperscript{*}: $p < 0.1$}
\end{longtable}
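The stars in the table encode the p-value of each coefficient, using the thresholds given in the footnote. The sketch below assumes a two-sided t-test based on the standard normal distribution. It computes the p-value from the coefficient and its standard error with math.erfc and maps it to the star notation. The helpers two_sided_p_value and stars are hypothetical; the inputs are values read from the table.

```python
import math


def two_sided_p_value(coefficient: float, standard_error: float) -> float:
    """Two-sided p-value of H0: coefficient = 0, using the normal approximation."""
    t_stat = coefficient / standard_error
    return math.erfc(abs(t_stat) / math.sqrt(2))


def stars(p_value: float) -> str:
    """Star notation matching the thresholds in the footnote of the table."""
    if p_value < 0.01:
        return '***'
    if p_value < 0.05:
        return '**'
    if p_value < 0.1:
        return '*'
    return ''


# (coefficient, standard error) pairs taken from the table above.
examples = {
    'asc_car (asc:no_seg;train_tt_catalog:linear)': (-0.155, 0.0582),
    'asc_train_ref (asc:MALE;train_tt_catalog:linear)': (0.0906, 0.0913),
    'asc_car_diff_male (asc:MALE;train_tt_catalog:linear)': (0.309, 0.102),
}
for name, (coef, se) in examples.items():
    p = two_sided_p_value(coef, se)
    print(f'{name}: p={p:.4f} {stars(p)}')
```

The star assignments produced this way agree with the table for these three coefficients.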
Total running time of the script: (0 minutes 4.528 seconds)