20. Estimation of several models

Example of the estimation of several specifications of a model, where the alternative specifications are generated from catalogs.

Michel Bierlaire, EPFL Thu Jun 26 2025, 16:04:27

from biogeme.biogeme import BIOGEME
from biogeme.catalog import (
    Catalog,
    segmentation_catalogs,
)
from biogeme.expressions import Beta, log
from biogeme.models import loglogit
from biogeme.results_processing import (
    compare_parameters,
    compile_estimation_results,
    pareto_optimal,
)

See the data processing script: Data preparation for Swissmetro.

from swissmetro_data import (
    CAR_AV_SP,
    CAR_CO_SCALED,
    CAR_TT_SCALED,
    CHOICE,
    MALE,
    SM_AV,
    SM_COST_SCALED,
    SM_TT_SCALED,
    TRAIN_AV_SP,
    TRAIN_COST_SCALED,
    TRAIN_TT_SCALED,
    database,
)

Parameters to be estimated

asc_car = Beta('asc_car', 0, None, None, 0)
asc_train = Beta('asc_train', 0, None, None, 0)
b_time = Beta('b_time', 0, None, None, 0)
b_cost = Beta('b_cost', 0, None, None, 0)

segmentation_gender = database.generate_segmentation(
    variable=MALE, mapping={0: 'female', 1: 'male'}
)

We define catalogs with two different specifications for each alternative specific constant (asc_train and asc_car): not segmented, and segmented by gender.

asc_train_catalog, asc_car_catalog = segmentation_catalogs(
    generic_name='asc',
    beta_parameters=[asc_train, asc_car],
    potential_segmentations=(segmentation_gender,),
    maximum_number=1,
)
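The effect of segmenting a constant can be illustrated in plain Python. This is only a sketch, not Biogeme's implementation: the helper `segmented_asc` is hypothetical, and it assumes that 'female' (MALE equal to 0) is the reference segment, as the parameter names asc_car_ref and asc_car_diff_male in the results suggest.

```python
# Sketch of how a segmented constant applies to one individual.
# Assumption: 'female' (MALE == 0) is the reference segment, so the
# 'diff_male' term is added only when MALE == 1.

def segmented_asc(ref: float, diff_male: float, male: int) -> float:
    """Return the constant that applies to an individual with the given MALE value."""
    return ref + diff_male * (1 if male == 1 else 0)

# Estimates copied from the asc:MALE;train_tt_catalog:log results on this page.
asc_car_female = segmented_asc(ref=1.19, diff_male=0.261, male=0)
asc_car_male = segmented_asc(ref=1.19, diff_male=0.261, male=1)
print(f'female: {asc_car_female:.3f}, male: {asc_car_male:.3f}')
```

In the non segmented specification, a single constant applies to everyone, which is why those models have two parameters fewer.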

We now define catalogs offering two specifications of the travel time: linear and logarithmic.

First for train:

train_tt_catalog = Catalog.from_dict(
    catalog_name='train_tt_catalog',
    dict_of_expressions={
        'linear': TRAIN_TT_SCALED,
        'log': log(TRAIN_TT_SCALED),
    },
)

Then for Swissmetro (SM). We require its specification to be the same as the one used for train by assigning the same controller to both catalogs.

sm_tt_catalog = Catalog.from_dict(
    catalog_name='sm_tt_catalog',
    dict_of_expressions={
        'linear': SM_TT_SCALED,
        'log': log(SM_TT_SCALED),
    },
    controlled_by=train_tt_catalog.controlled_by,
)
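Because the two travel time catalogs share a controller, they do not multiply the number of models. The following plain Python sketch counts the combinations; it does not use Biogeme, and the identifier format simply imitates the configuration strings printed by the script (the controller names 'asc' and 'train_tt_catalog' are taken from that output).

```python
from itertools import product

# Each controller owns a list of configuration names. Catalogs that share
# a controller always switch together, so they add no extra combinations.
controllers = {
    'asc': ['no_seg', 'MALE'],              # drives both ASC catalogs
    'train_tt_catalog': ['linear', 'log'],  # drives the train and SM travel time catalogs
}

# One model per element of the Cartesian product of the controllers' choices.
configurations = [
    ';'.join(f'{name}:{choice}' for name, choice in zip(controllers, combo))
    for combo in product(*controllers.values())
]
for configuration in configurations:
    print(configuration)
```

With independent controllers for the train and SM catalogs, the product would contain eight configurations instead of four, including ones that mix linear and log travel time.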

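Definition of the utility functions with linear cost. Note that the car travel time enters linearly in every specification; only the train and Swissmetro travel times are governed by the catalogs.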

v_train = asc_train_catalog + b_time * train_tt_catalog + b_cost * TRAIN_COST_SCALED
v_swissmetro = b_time * sm_tt_catalog + b_cost * SM_COST_SCALED
v_car = asc_car_catalog + b_time * CAR_TT_SCALED + b_cost * CAR_CO_SCALED

Associate utility functions with the numbering of alternatives.

v = {1: v_train, 2: v_swissmetro, 3: v_car}

Associate the availability conditions with the alternatives.

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

Definition of the model. This is the contribution of each observation to the log likelihood function.

log_probability = loglogit(v, av, CHOICE)
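For a single observation, the quantity computed here is the log of the logit probability of the chosen alternative, where the denominator sums over available alternatives only. The sketch below reproduces that formula in plain Python; the function `log_logit` and the utility values are hypothetical and serve only as an illustration.

```python
import math

def log_logit(utilities: dict[int, float], availability: dict[int, int], chosen: int) -> float:
    """Log of the logit probability of `chosen`, summing only over available alternatives."""
    available = [alt for alt, is_available in availability.items() if is_available]
    if chosen not in available:
        raise ValueError('the chosen alternative must be available')
    # Subtract the largest utility before exponentiating, for numerical stability.
    v_max = max(utilities[alt] for alt in available)
    log_sum = v_max + math.log(sum(math.exp(utilities[alt] - v_max) for alt in available))
    return utilities[chosen] - log_sum

# Hypothetical utilities for one observation where car (3) is not available.
example_v = {1: -1.0, 2: 0.0, 3: 0.5}
example_av = {1: 1, 2: 1, 3: 0}
example_logprob = log_logit(example_v, example_av, chosen=2)
print(f'{example_logprob:.4f}')
```

Summing this quantity over all observations gives the log likelihood that is maximized during estimation.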

the_biogeme: BIOGEME = BIOGEME(database=database, formulas=log_probability)
the_biogeme.model_name = 'b20multiple_models'

dict_of_results = the_biogeme.estimate_catalog()

print(f'A total of {len(dict_of_results)} models have been estimated:')
for config, res in dict_of_results.items():
    print(f'{config}: LL={res.final_log_likelihood:.2f} K={res.number_of_parameters}')

A total of 4 models have been estimated:
asc:no_seg;train_tt_catalog:linear: LL=-5331.25 K=4
asc:MALE;train_tt_catalog:linear: LL=-5187.98 K=6
asc:no_seg;train_tt_catalog:log: LL=-5350.59 K=4
asc:MALE;train_tt_catalog:log: LL=-5184.07 K=6

summary, description = compile_estimation_results(dict_of_results, use_short_names=True)
print(summary)

                                   Model_000000  ...    Model_000003
Number of estimated parameters                4  ...               6
Sample size                                6768  ...            6768
Final log likelihood                  -5331.252  ...       -5184.073
Akaike Information Criterion            10670.5  ...        10380.15
Bayesian Information Criterion         10697.78  ...        10421.07
asc_train (t-test)              -0.701  (-8.49)  ...
b_time (t-test)                  -1.28  (-12.3)  ...  -1.38  (-14.1)
b_cost (t-test)                  -1.08  (-15.9)  ...  -1.07  (-16.1)
asc_car (t-test)                -0.155  (-2.66)  ...
asc_train_ref (t-test)                           ...   0.183  (2.05)
asc_train_diff_male (t-test)                     ...  -1.35  (-17.1)
asc_car_ref (t-test)                             ...     1.19  (7.3)
asc_car_diff_male (t-test)                       ...   0.261  (2.56)

[13 rows x 4 columns]
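The information criteria in this table can be checked by hand from the final log likelihood, the number of parameters and the sample size, using the standard definitions AIC = 2K - 2LL and BIC = K ln(N) - 2LL. The helper functions below are illustrative and not part of Biogeme.

```python
import math

def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike Information Criterion: 2K - 2LL."""
    return 2 * n_parameters - 2 * log_likelihood

def bic(log_likelihood: float, n_parameters: int, sample_size: int) -> float:
    """Bayesian Information Criterion: K ln(N) - 2LL."""
    return n_parameters * math.log(sample_size) - 2 * log_likelihood

# Values copied from the Model_000000 and Model_000003 columns of the table.
for name, ll, k in [('Model_000000', -5331.252, 4), ('Model_000003', -5184.073, 6)]:
    print(f'{name}: AIC={aic(ll, k):.2f} BIC={bic(ll, k, 6768):.2f}')
```

Both criteria penalize the two extra parameters of the segmented model, yet it still obtains the lower (better) values.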

Explanation of the names of the models.

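# Use explicit names so that the dict of utilities `v` is not overwritten.
for short_name, full_name in description.items():
    if short_name != full_name:
        print(f'{short_name}: {full_name}')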

Model_000000: asc:no_seg;train_tt_catalog:linear
Model_000001: asc:MALE;train_tt_catalog:linear
Model_000002: asc:no_seg;train_tt_catalog:log
Model_000003: asc:MALE;train_tt_catalog:log

non_dominated_models = pareto_optimal(dict_of_results)
print(f'Out of them, {len(non_dominated_models)} are non dominated.')
for config, res in non_dominated_models.items():
    print(f'{config}')

Out of them, 2 are non dominated.
asc:no_seg;train_tt_catalog:linear
asc:MALE;train_tt_catalog:log
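The selection can be reproduced with a small dominance check. This sketch assumes that the two criteria are the final log likelihood (higher is better) and the number of parameters (lower is better), which is consistent with the output above; it is not Biogeme's own code, and the function `dominates` is hypothetical.

```python
# (final log likelihood, number of parameters), copied from the estimation output.
reported = {
    'asc:no_seg;train_tt_catalog:linear': (-5331.25, 4),
    'asc:MALE;train_tt_catalog:linear': (-5187.98, 6),
    'asc:no_seg;train_tt_catalog:log': (-5350.59, 4),
    'asc:MALE;train_tt_catalog:log': (-5184.07, 6),
}

def dominates(a, b):
    """True if a is at least as good as b on both criteria and strictly better on one."""
    (ll_a, k_a), (ll_b, k_b) = a, b
    return ll_a >= ll_b and k_a <= k_b and (ll_a > ll_b or k_a < k_b)

# Keep the models that no other model dominates.
non_dominated = [
    name
    for name, scores in reported.items()
    if not any(dominates(other, scores) for other in reported.values())
]
print(non_dominated)
```

Each discarded model is beaten by a model with the same number of parameters and a higher log likelihood.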

summary, description = compile_estimation_results(
    non_dominated_models, use_short_names=False
)
print(summary)

                               asc:no_seg;train_tt_catalog:linear asc:MALE;train_tt_catalog:log
Number of estimated parameters                                  4                             6
Sample size                                                  6768                          6768
Final log likelihood                                    -5331.252                     -5184.073
Akaike Information Criterion                              10670.5                      10380.15
Bayesian Information Criterion                           10697.78                      10421.07
asc_train (t-test)                                -0.701  (-8.49)
b_time (t-test)                                    -1.28  (-12.3)                -1.38  (-14.1)
b_cost (t-test)                                    -1.08  (-15.9)                -1.07  (-16.1)
asc_car (t-test)                                  -0.155  (-2.66)
asc_train_ref (t-test)                                                            0.183  (2.05)
asc_train_diff_male (t-test)                                                     -1.35  (-17.1)
asc_car_ref (t-test)                                                                1.19  (7.3)
asc_car_diff_male (t-test)                                                        0.261  (2.56)

It is possible to generate a LaTeX table comparing the results.

latex_code = compare_parameters(estimation_results=dict_of_results)
print(latex_code)

\usepackage{longtable}
\usepackage{siunitx}
\sisetup{
  parse-numbers=false,      % Prevents automatic parsing (needed for parentheses & superscripts)
  detect-inline-weight=math,% Ensures proper formatting in tables
  tight-spacing=true        % Keeps spacing consistent
}
\begin{longtable}{rlSSSS}
& & \multicolumn{1}{c}{asc:no_seg;train_tt_catalog:linear} & \multicolumn{1}{c}{asc:MALE;train_tt_catalog:linear} & \multicolumn{1}{c}{asc:no_seg;train_tt_catalog:log} & \multicolumn{1}{c}{asc:MALE;train_tt_catalog:log} \\
 & Parameter name &  \multicolumn{1}{c}{Coef./(SE)} &  \multicolumn{1}{c}{Coef./(SE)} &  \multicolumn{1}{c}{Coef./(SE)} &  \multicolumn{1}{c}{Coef./(SE)} \\
\hline
0 &asc\_car& -0.155\textsuperscript{***}  & & 1.43\textsuperscript{***}  &  \\
 & &(0.0582) & &(0.153) &  \\
1 &asc\_car\_diff\_male & & 0.309\textsuperscript{***}  & & 0.261\textsuperscript{**}  \\
 &  & &(0.102) & &(0.102) \\
2 &asc\_car\_ref & & -0.461\textsuperscript{***}  & & 1.19\textsuperscript{***}  \\
 &  & &(0.0973) & &(0.163) \\
3 &asc\_train& -0.701\textsuperscript{***}  & & -0.727\textsuperscript{***}  &  \\
 & &(0.0826) & &(0.0741) &  \\
4 &asc\_train\_diff\_male & & -1.23\textsuperscript{***}  & & -1.35\textsuperscript{***}  \\
 &  & &(0.0792) & &(0.0789) \\
5 &asc\_train\_ref & & 0.0906  & & 0.183\textsuperscript{**}  \\
 &  & &(0.0913) & &(0.089) \\
6 &b\_cost& -1.08\textsuperscript{***} & -1.08\textsuperscript{***} & -1.06\textsuperscript{***} & -1.07\textsuperscript{***}  \\
 & &(0.0682)&(0.067)&(0.067)&(0.0668) \\
7 &b\_time& -1.28\textsuperscript{***} & -1.25\textsuperscript{***} & -1.37\textsuperscript{***} & -1.38\textsuperscript{***}  \\
 & &(0.104)&(0.105)&(0.097)&(0.0981) \\
\hline
\multicolumn{2}{l}{Number of observations} &6768 & 6768 & 6768 & 6768 \\
\multicolumn{2}{l}{Number of parameters} &4 & 6 & 4 & 6 \\
\multicolumn{2}{l}{Akaike Information Criterion} &10670.5 & 10388.0 & 10709.2 & 10380.1 \\
\multicolumn{2}{l}{Bayesian Information Criterion} &10697.8 & 10428.9 & 10736.5 & 10421.1 \\
\hline
\multicolumn{4}{l}{\footnotesize Standard errors: \textsuperscript{***}: $p < 0.01$, \textsuperscript{**}: $p < 0.05$, \textsuperscript{*}: $p < 0.1$}
\end{longtable}
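The stars in the table can be interpreted with a short calculation. The sketch below assumes that they come from a two-sided test of the ratio coefficient/standard error against a standard normal distribution, with the thresholds given in the table footnote; the function `stars` is hypothetical. Because the printed values are rounded, borderline cases could differ from what Biogeme reports.

```python
import math

def stars(coefficient: float, standard_error: float) -> str:
    """Significance stars from a two-sided p-value under a standard normal."""
    t_stat = coefficient / standard_error
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)), the two-sided p-value.
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    if p_value < 0.01:
        return '***'
    if p_value < 0.05:
        return '**'
    if p_value < 0.1:
        return '*'
    return ''

# Coefficients and standard errors copied from the table above.
for name, coef, se in [
    ('asc_car', -0.155, 0.0582),
    ('asc_car_diff_male', 0.261, 0.102),
    ('asc_train_ref', 0.0906, 0.0913),
]:
    print(f'{name}: {coef}{stars(coef, se)}')
```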

Total running time of the script: (0 minutes 3.516 seconds)
