Estimation of several models

Example of the estimation of several specifications of the model.

Michel Bierlaire, EPFL Thu Jun 26 2025, 16:04:27

from biogeme.biogeme import BIOGEME
from biogeme.catalog import (
    Catalog,
    segmentation_catalogs,
)
from biogeme.expressions import Beta, log
from biogeme.models import loglogit
from biogeme.results_processing import (
    compare_parameters,
    compile_estimation_results,
    pareto_optimal,
)

See the data processing script: Data preparation for Swissmetro.

from swissmetro_data import (
    CAR_AV_SP,
    CAR_CO_SCALED,
    CAR_TT_SCALED,
    CHOICE,
    MALE,
    SM_AV,
    SM_COST_SCALED,
    SM_TT_SCALED,
    TRAIN_AV_SP,
    TRAIN_COST_SCALED,
    TRAIN_TT_SCALED,
    database,
)

Parameters to be estimated, and the segmentation by gender used for the constants.

asc_car = Beta('asc_car', 0, None, None, 0)
asc_train = Beta('asc_train', 0, None, None, 0)
b_time = Beta('b_time', 0, None, None, 0)
b_cost = Beta('b_cost', 0, None, None, 0)
segmentation_gender = database.generate_segmentation(
    variable=MALE, mapping={0: 'female', 1: 'male'}
)

We define catalogs with two different specifications for the alternative specific constants asc_train and asc_car: non-segmented, and segmented by gender.

asc_train_catalog, asc_car_catalog = segmentation_catalogs(
    generic_name='asc',
    beta_parameters=[asc_train, asc_car],
    potential_segmentations=(segmentation_gender,),
    maximum_number=1,
)
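
As an illustration of what a segmented constant means, here is a minimal sketch. It assumes, as the parameter names `asc_car_ref` and `asc_car_diff_male` in the results suggest, that the reference segment (female) gets the reference value and that the other segment gets the reference value plus a difference term. The helper `segmented_constant` is hypothetical and is not part of Biogeme.

```python
# Sketch only: hypothetical helper, not a Biogeme function.
def segmented_constant(
    reference_value: float, differences: dict[str, float], segment: str
) -> float:
    """Return the constant that applies to an individual in `segment`.

    The reference segment gets `reference_value`; any other segment gets
    the reference value plus its own difference term.
    """
    return reference_value + differences.get(segment, 0.0)


# Estimates of asc_car_ref and asc_car_diff_male, copied from the results
# reported further down this page (segmented model, linear travel time).
asc_car_ref = -0.461
asc_car_diff = {'male': 0.309}

asc_car_female = segmented_constant(asc_car_ref, asc_car_diff, 'female')
asc_car_male = segmented_constant(asc_car_ref, asc_car_diff, 'male')
print(f'female: {asc_car_female:.3f}, male: {asc_car_male:.3f}')
```

Under this assumption, the t-test on the difference term indicates whether the constant for males differs significantly from the constant for females.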

We now define a catalog offering two specifications of the travel time: linear and logarithmic.

First, for train.

train_tt_catalog = Catalog.from_dict(
    catalog_name='train_tt_catalog',
    dict_of_expressions={
        'linear': TRAIN_TT_SCALED,
        'log': log(TRAIN_TT_SCALED),
    },
)

Then for Swissmetro (SM). We require that its specification is always the same as the one selected for train, by assigning it the same controller.

sm_tt_catalog = Catalog.from_dict(
    catalog_name='sm_tt_catalog',
    dict_of_expressions={
        'linear': SM_TT_SCALED,
        'log': log(SM_TT_SCALED),
    },
    controlled_by=train_tt_catalog.controlled_by,
)
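
Because catalogs that share a controller always select the same specification, the number of models is the product of the number of choices per controller, not per catalog. The sketch below enumerates the combinations in plain Python. The controller names and choices are copied from the configuration strings printed later on this page; the function `enumerate_configurations` is hypothetical and only mimics the naming pattern, it is not how Biogeme builds the list internally.

```python
from itertools import product

# Each controller name maps to the specifications it can select.
# Both ASC catalogs share the controller 'asc'; both travel time
# catalogs share the controller 'train_tt_catalog'.
controllers = {
    'asc': ['no_seg', 'MALE'],
    'train_tt_catalog': ['linear', 'log'],
}


def enumerate_configurations(controllers: dict[str, list[str]]) -> list[str]:
    """List all combinations, formatted as 'controller:choice;controller:choice'."""
    names = list(controllers)
    return [
        ';'.join(f'{name}:{choice}' for name, choice in zip(names, choices))
        for choices in product(*(controllers[name] for name in names))
    ]


configurations = enumerate_configurations(controllers)
for configuration in configurations:
    print(configuration)
print(f'{len(configurations)} configurations')
```

With four independent catalogs of two choices each there would be 16 combinations; sharing controllers reduces this to 4.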

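Definition of the utility functions. Cost enters linearly. The travel time specification for train and Swissmetro is selected by the catalogs, while car travel time is always linear.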

v_train = asc_train_catalog + b_time * train_tt_catalog + b_cost * TRAIN_COST_SCALED
v_swissmetro = b_time * sm_tt_catalog + b_cost * SM_COST_SCALED
v_car = asc_car_catalog + b_time * CAR_TT_SCALED + b_cost * CAR_CO_SCALED

Associate utility functions with the numbering of alternatives.

v = {1: v_train, 2: v_swissmetro, 3: v_car}

Associate the availability conditions with the alternatives.

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

Definition of the model. This is the contribution of each observation to the log likelihood function.

log_probability = loglogit(v, av, CHOICE)
the_biogeme: BIOGEME = BIOGEME(database=database, formulas=log_probability)
the_biogeme.model_name = 'b20multiple_models'
dict_of_results = the_biogeme.estimate_catalog()
print(f'A total of {len(dict_of_results)} models have been estimated:')
for config, res in dict_of_results.items():
    print(f'{config}: LL={res.final_log_likelihood:.2f} K={res.number_of_parameters}')
A total of 4 models have been estimated:
asc:no_seg;train_tt_catalog:linear: LL=-5331.25 K=4
asc:MALE;train_tt_catalog:log: LL=-7793.56 K=6
asc:no_seg;train_tt_catalog:log: LL=-6687.16 K=4
asc:MALE;train_tt_catalog:linear: LL=-5187.98 K=6
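
To make the role of the utilities and availability conditions concrete, here is a minimal sketch of the log of a logit probability restricted to the available alternatives. It is written in plain Python and is not the implementation of `loglogit`. The utility values are hypothetical, and raising an error when the chosen alternative is unavailable is a choice made for this sketch only.

```python
import math


def log_logit(
    utilities: dict[int, float], availability: dict[int, int], chosen: int
) -> float:
    """Log probability of `chosen` among the available alternatives."""
    if not availability[chosen]:
        raise ValueError('The chosen alternative must be available.')
    available = [alt for alt, is_available in availability.items() if is_available]
    # Subtract the largest utility before exponentiating, for numerical stability.
    v_max = max(utilities[alt] for alt in available)
    log_denominator = v_max + math.log(
        sum(math.exp(utilities[alt] - v_max) for alt in available)
    )
    return utilities[chosen] - log_denominator


# Hypothetical utilities for one observation: 1 = train, 2 = Swissmetro, 3 = car.
utilities = {1: -1.2, 2: -0.4, 3: -0.9}
availability = {1: 1, 2: 1, 3: 0}  # car is not available to this individual

log_probabilities = {
    alt: log_logit(utilities, availability, alt) for alt in (1, 2)
}
print(log_probabilities)
```

Summing such contributions over all observations gives the log likelihood reported as `LL` in the output above.
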
summary, description = compile_estimation_results(dict_of_results, use_short_names=True)
print(summary)
                                   Model_000000  ...     Model_000003
Number of estimated parameters                4  ...                6
Sample size                                6768  ...             6768
Final log likelihood                  -5331.252  ...        -5187.983
Akaike Information Criterion            10670.5  ...         10387.97
Bayesian Information Criterion         10697.78  ...         10428.89
asc_train (t-test)              -0.701  (-8.49)  ...
b_time (t-test)                  -1.28  (-12.3)  ...   -1.25  (-11.8)
b_cost (t-test)                  -1.08  (-15.9)  ...   -1.08  (-16.2)
asc_car (t-test)                -0.155  (-2.66)  ...
asc_train_ref (t-test)                           ...  0.0906  (0.992)
asc_train_diff_male (t-test)                     ...   -1.23  (-15.5)
asc_car_ref (t-test)                             ...  -0.461  (-4.74)
asc_car_diff_male (t-test)                       ...    0.309  (3.04)

[13 rows x 4 columns]
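
The two information criteria in the table can be recomputed from the final log likelihood, the number of parameters and the sample size. The sketch below uses the standard definitions, AIC = 2K - 2LL and BIC = K ln(N) - 2LL; the function names are illustrative and not taken from Biogeme.

```python
import math


def akaike(log_likelihood: float, n_parameters: int) -> float:
    """Akaike Information Criterion: 2K - 2LL."""
    return 2 * n_parameters - 2 * log_likelihood


def bayesian(log_likelihood: float, n_parameters: int, sample_size: int) -> float:
    """Bayesian Information Criterion: K ln(N) - 2LL."""
    return n_parameters * math.log(sample_size) - 2 * log_likelihood


# Values copied from the summary table: (final log likelihood, number of parameters).
models = {
    'Model_000000': (-5331.252, 4),
    'Model_000003': (-5187.983, 6),
}
sample_size = 6768

for name, (log_likelihood, n_parameters) in models.items():
    aic = akaike(log_likelihood, n_parameters)
    bic = bayesian(log_likelihood, n_parameters, sample_size)
    print(f'{name}: AIC={aic:.2f} BIC={bic:.2f}')
```

Both criteria penalize the two extra parameters of the segmented model, yet they remain lower for `Model_000003`, in line with the table.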

Explanation of the names of the models.

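# Descriptive loop variables avoid overwriting the dict `v` of utility functions.
for short_name, full_name in description.items():
    if short_name != full_name:
        print(f'{short_name}: {full_name}')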
Model_000000: asc:no_seg;train_tt_catalog:linear
Model_000001: asc:MALE;train_tt_catalog:log
Model_000002: asc:no_seg;train_tt_catalog:log
Model_000003: asc:MALE;train_tt_catalog:linear
non_dominated_models = pareto_optimal(dict_of_results)
print(f'Out of them, {len(non_dominated_models)} are non dominated.')
for config, res in non_dominated_models.items():
    print(f'{config}')
Out of them, 2 are non dominated.
asc:no_seg;train_tt_catalog:linear
asc:MALE;train_tt_catalog:linear
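
The notion of a non-dominated model can be sketched in a few lines of plain Python. This page does not state which objectives `pareto_optimal` compares, so the sketch assumes two objectives to be minimized: the negative log likelihood and the number of parameters. A model is dominated if another model is at least as good on every objective and strictly better on at least one. The functions `dominates` and `non_dominated` are hypothetical helpers.

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(objectives: dict[str, tuple[float, ...]]) -> dict[str, tuple[float, ...]]:
    """Keep the entries that no other entry dominates."""
    return {
        name: values
        for name, values in objectives.items()
        if not any(
            dominates(other_values, values)
            for other_name, other_values in objectives.items()
            if other_name != name
        )
    }


# (negative log likelihood, number of parameters), copied from the output above.
objectives = {
    'asc:no_seg;train_tt_catalog:linear': (5331.25, 4),
    'asc:MALE;train_tt_catalog:log': (7793.56, 6),
    'asc:no_seg;train_tt_catalog:log': (6687.16, 4),
    'asc:MALE;train_tt_catalog:linear': (5187.98, 6),
}

for name in non_dominated(objectives):
    print(name)
```

Under these assumed objectives, each logarithmic specification is dominated by the linear specification with the same number of parameters, which matches the two models listed above.
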
summary, description = compile_estimation_results(
    non_dominated_models, use_short_names=False
)
print(summary)
                               asc:no_seg;train_tt_catalog:linear asc:MALE;train_tt_catalog:linear
Number of estimated parameters                                  4                                6
Sample size                                                  6768                             6768
Final log likelihood                                    -5331.252                        -5187.983
Akaike Information Criterion                              10670.5                         10387.97
Bayesian Information Criterion                           10697.78                         10428.89
asc_train (t-test)                                -0.701  (-8.49)
b_time (t-test)                                    -1.28  (-12.3)                   -1.25  (-11.8)
b_cost (t-test)                                    -1.08  (-15.9)                   -1.08  (-16.2)
asc_car (t-test)                                  -0.155  (-2.66)
asc_train_ref (t-test)                                                             0.0906  (0.992)
asc_train_diff_male (t-test)                                                        -1.23  (-15.5)
asc_car_ref (t-test)                                                               -0.461  (-4.74)
asc_car_diff_male (t-test)                                                           0.309  (3.04)

It is also possible to generate a LaTeX table comparing the results of all estimated models.

latex_code = compare_parameters(estimation_results=dict_of_results)
print(latex_code)
\usepackage{longtable}
\usepackage{siunitx}
\sisetup{
  parse-numbers=false,      % Prevents automatic parsing (needed for parentheses & superscripts)
  detect-inline-weight=math,% Ensures proper formatting in tables
  tight-spacing=true        % Keeps spacing consistent
}
\begin{longtable}{rlSSSS}
& & \multicolumn{1}{c}{asc:no_seg;train_tt_catalog:linear} & \multicolumn{1}{c}{asc:MALE;train_tt_catalog:log} & \multicolumn{1}{c}{asc:no_seg;train_tt_catalog:log} & \multicolumn{1}{c}{asc:MALE;train_tt_catalog:linear} \\
 & Parameter name &  \multicolumn{1}{c}{Coef./(SE)} &  \multicolumn{1}{c}{Coef./(SE)} &  \multicolumn{1}{c}{Coef./(SE)} &  \multicolumn{1}{c}{Coef./(SE)} \\
\hline
0 &asc\_car& -0.155\textsuperscript{***}  & & 0.0  &  \\
 & &(0.0582) & &(0.229) &  \\
1 &asc\_car\_diff\_male & & -0.173  & & 0.309\textsuperscript{***}  \\
 &  & &(0.354) & &(0.102) \\
2 &asc\_car\_ref & & 0.375  & & -0.461\textsuperscript{***}  \\
 &  & &(0.533) & &(0.0973) \\
3 &asc\_train& -0.701\textsuperscript{***}  & & 0.0  &  \\
 & &(0.0826) & &(0.104) &  \\
4 &asc\_train\_diff\_male & & -0.973\textsuperscript{***}  & & -1.23\textsuperscript{***}  \\
 &  & &(0.112) & &(0.0792) \\
5 &asc\_train\_ref & & -0.248  & & 0.0906  \\
 &  & &(0.24) & &(0.0913) \\
6 &b\_cost& -1.08\textsuperscript{***} & -0.463\textsuperscript{***} & -1.07\textsuperscript{***} & -1.08\textsuperscript{***}  \\
 & &(0.0682)&(0.15)&(0.108)&(0.067) \\
7 &b\_time& -1.28\textsuperscript{***} & -2.15\textsuperscript{***} & -1.38\textsuperscript{***} & -1.25\textsuperscript{***}  \\
 & &(0.104)&(0.37)&(0.152)&(0.105) \\
\hline
\multicolumn{2}{l}{Number of observations} &6768 & 6768 & 6768 & 6768 \\
\multicolumn{2}{l}{Number of parameters} &4 & 6 & 4 & 6 \\
\multicolumn{2}{l}{Akaike Information Criterion} &10670.5 & 15599.1 & 13382.3 & 10388.0 \\
\multicolumn{2}{l}{Bayesian Information Criterion} &10697.8 & 15640.0 & 13409.6 & 10428.9 \\
\hline
\multicolumn{4}{l}{\footnotesize Standard errors: \textsuperscript{***}: $p < 0.01$, \textsuperscript{**}: $p < 0.05$, \textsuperscript{*}: $p < 0.1$}
\end{longtable}
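
The significance stars in the table follow the thresholds given in its footnote. As a rough check, the sketch below recomputes them from a coefficient and its standard error, assuming a two-sided test based on the normal approximation of the t-statistic. Whether `compare_parameters` computes its p-values in exactly this way is an assumption; the helper names are illustrative.

```python
import math


def two_sided_p_value(coefficient: float, standard_error: float) -> float:
    """Two-sided p-value of coefficient / standard_error under a standard normal."""
    t_statistic = coefficient / standard_error
    return math.erfc(abs(t_statistic) / math.sqrt(2.0))


def significance_stars(p_value: float) -> str:
    """Map a p-value to the stars defined in the table footnote."""
    if p_value < 0.01:
        return '***'
    if p_value < 0.05:
        return '**'
    if p_value < 0.1:
        return '*'
    return ''


# (coefficient, standard error) pairs copied from the LaTeX table.
examples = {
    'asc_car (no_seg, linear)': (-0.155, 0.0582),
    'asc_train_ref (MALE, linear)': (0.0906, 0.0913),
    'asc_car_diff_male (MALE, linear)': (0.309, 0.102),
}

for name, (coefficient, standard_error) in examples.items():
    p_value = two_sided_p_value(coefficient, standard_error)
    print(f'{name}: p={p_value:.4f} {significance_stars(p_value)}')
```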

Total running time of the script: (0 minutes 4.528 seconds)
