Estimation of several models

Example of the estimation of several specifications of the model.

Author: Michel Bierlaire, EPFL
Date: Mon Apr 10 12:19:46 2023

import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import Beta, log
from biogeme.results import compile_estimation_results, pareto_optimal
from biogeme.catalog import Catalog, segmentation_catalogs

See the data processing script: Data preparation for Swissmetro.

from swissmetro_data import (
    database,
    CHOICE,
    SM_AV,
    CAR_AV_SP,
    TRAIN_AV_SP,
    TRAIN_TT_SCALED,
    TRAIN_COST_SCALED,
    SM_TT_SCALED,
    SM_COST_SCALED,
    CAR_TT_SCALED,
    CAR_CO_SCALED,
    MALE,
)

Parameters to be estimated

ASC_CAR = Beta('ASC_CAR', 0, None, None, 0)
ASC_TRAIN = Beta('ASC_TRAIN', 0, None, None, 0)
ASC_CAR_MALE = Beta('ASC_CAR_MALE', 0, None, None, 0)
ASC_CAR_FEMALE = Beta('ASC_CAR_FEMALE', 0, None, None, 0)
ASC_TRAIN_MALE = Beta('ASC_TRAIN_MALE', 0, None, None, 0)
ASC_TRAIN_FEMALE = Beta('ASC_TRAIN_FEMALE', 0, None, None, 0)
B_TIME = Beta('B_TIME', 0, None, None, 0)
B_COST = Beta('B_COST', 0, None, None, 0)

segmentation_gender = database.generate_segmentation(
    variable=MALE, mapping={0: 'female', 1: 'male'}
)

We define catalogs with two different specifications for each of the alternative-specific constants ASC_TRAIN and ASC_CAR: non-segmented, and segmented by gender.

ASC_TRAIN_catalog, ASC_CAR_catalog = segmentation_catalogs(
    generic_name='ASC',
    beta_parameters=[ASC_TRAIN, ASC_CAR],
    potential_segmentations=(segmentation_gender,),
    maximum_number=1,
)
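To make the segmented specification concrete: judging from the parameter names in the estimation output (ASC_CAR and ASC_CAR_male), a segmented constant appears to be a reference value plus a shift applied to the male segment. The sketch below is plain Python, not Biogeme code. The function name `segmented_constant` and the numeric values are hypothetical, and treating 'female' as the reference segment is an assumption.

```python
def segmented_constant(reference: float, male_shift: float, male: int) -> float:
    """Return the alternative-specific constant for one individual.

    Assumption: 'female' (male == 0) is the reference segment, and the
    'male' segment (male == 1) receives an additive shift.
    """
    return reference + male_shift * (1 if male == 1 else 0)


# Hypothetical values, for illustration only.
asc_female = segmented_constant(reference=-0.5, male_shift=0.3, male=0)
asc_male = segmented_constant(reference=-0.5, male_shift=0.3, male=1)
print(asc_female, asc_male)
```

In the non-segmented specification, the shift is simply absent, which is why that specification has fewer parameters.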

We now define a catalog with two specifications of the travel time: linear and logarithmic.

First, for the train alternative:

train_tt_catalog = Catalog.from_dict(
    catalog_name='train_tt_catalog',
    dict_of_expressions={
        'linear': TRAIN_TT_SCALED,
        'log': log(TRAIN_TT_SCALED),
    },
)

Then for Swissmetro (SM). We require that its specification be the same as the one selected for train, by assigning both catalogs the same controller.

sm_tt_catalog = Catalog.from_dict(
    catalog_name='sm_tt_catalog',
    dict_of_expressions={
        'linear': SM_TT_SCALED,
        'log': log(SM_TT_SCALED),
    },
    controlled_by=train_tt_catalog.controlled_by,
)
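The effect of sharing a controller on the number of specifications can be sketched with plain Python. This is an illustration only, not how Biogeme enumerates configurations internally. The dictionary `controllers` is a hypothetical stand-in, with names chosen to mirror the configuration labels that appear in the estimation output.

```python
from itertools import product

# One entry per independent choice. The train and SM travel-time catalogs
# share a controller, so together they contribute a single entry.
controllers = {
    'ASC': ['no_seg', 'MALE'],
    'train_tt_catalog': ['linear', 'log'],
}

# Build one label per combination of choices.
configurations = [
    ';'.join(f'{name}:{choice}' for name, choice in zip(controllers, combo))
    for combo in product(*controllers.values())
]
for configuration in configurations:
    print(configuration)
print(f'{len(configurations)} configurations')
```

With independent controllers for train and SM, there would be a third entry in the dictionary and 2 x 2 x 2 = 8 combinations, including ones where the two alternatives use different travel-time transformations.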

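Definition of the utility functions. Cost enters linearly in all alternatives. The car travel time is not controlled by a catalog and always enters linearly.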

V1 = ASC_TRAIN_catalog + B_TIME * train_tt_catalog + B_COST * TRAIN_COST_SCALED
V2 = B_TIME * sm_tt_catalog + B_COST * SM_COST_SCALED
V3 = ASC_CAR_catalog + B_TIME * CAR_TT_SCALED + B_COST * CAR_CO_SCALED

Associate utility functions with the numbering of alternatives.

V = {1: V1, 2: V2, 3: V3}

Associate the availability conditions with the alternatives.

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

Definition of the model. This is the contribution of each observation to the log likelihood function.

logprob = models.loglogit(V, av, CHOICE)
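For readers unfamiliar with the logit model, the quantity computed for each observation can be sketched as follows. This is a standalone illustration of the log of the logit choice probability with availability conditions, not Biogeme's implementation. The function name `log_logit_probability` and the utility values are hypothetical, and the chosen alternative is assumed to be available.

```python
import math


def log_logit_probability(utilities: dict, availability: dict, chosen: int) -> float:
    """Log of the logit probability of the chosen alternative.

    Assumption: the chosen alternative is available.
    """
    # Only available alternatives enter the denominator.
    available = [i for i in utilities if availability[i]]
    # Subtract the largest utility before exponentiating, for numerical stability.
    v_max = max(utilities[i] for i in available)
    log_denominator = v_max + math.log(
        sum(math.exp(utilities[i] - v_max) for i in available)
    )
    return utilities[chosen] - log_denominator


# Hypothetical utilities for train (1), Swissmetro (2) and car (3).
V_example = {1: -1.0, 2: -0.5, 3: -0.8}
av_example = {1: 1, 2: 1, 3: 0}  # car is not available
print(log_logit_probability(V_example, av_example, chosen=2))
```

Summing this quantity over all observations gives the log likelihood that is maximized during estimation.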

the_biogeme = bio.BIOGEME(database, logprob)
the_biogeme.modelName = 'b20multiple_models'

dict_of_results = the_biogeme.estimate_catalog()

print(f'A total of {len(dict_of_results)} models have been estimated:')
for config, res in dict_of_results.items():
    print(f'{config}: LL={res.data.logLike:.2f} K={res.data.nparam}')

A total of 4 models have been estimated:
ASC:MALE;train_tt_catalog:log: LL=-5184.07 K=6
ASC:no_seg;train_tt_catalog:linear: LL=-5331.25 K=4
ASC:no_seg;train_tt_catalog:log: LL=-5350.59 K=4
ASC:MALE;train_tt_catalog:linear: LL=-5187.98 K=6

summary, description = compile_estimation_results(dict_of_results, use_short_names=True)
print(summary)

                                  Model_000000  ...     Model_000003
Number of estimated parameters               6  ...                6
Sample size                               6768  ...             6768
Final log likelihood              -5184.072742  ...      -5187.98341
Akaike Information Criterion      10380.145485  ...      10387.96682
Bayesian Information Criterion     10421.06525  ...     10428.886585
ASC_CAR (t-test)                   1.19  (7.3)  ...  -0.461  (-4.74)
ASC_CAR_male (t-test)            0.261  (2.56)  ...    0.309  (3.04)
ASC_TRAIN (t-test)               0.183  (2.05)  ...  0.0906  (0.992)
ASC_TRAIN_male (t-test)         -1.35  (-17.1)  ...   -1.23  (-15.5)
B_COST (t-test)                 -1.07  (-16.1)  ...   -1.08  (-16.2)
B_TIME (t-test)                 -1.38  (-14.1)  ...   -1.25  (-11.8)

[11 rows x 4 columns]
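The information criteria in the table can be checked by hand from the final log likelihood, the number of estimated parameters and the sample size, using the standard definitions AIC = 2K - 2LL and BIC = K ln(N) - 2LL. The sketch below uses the values reported for Model_000000. The function names are hypothetical helpers, not part of Biogeme.

```python
import math


def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike Information Criterion: 2K - 2LL."""
    return 2 * n_parameters - 2 * log_likelihood


def bic(log_likelihood: float, n_parameters: int, sample_size: int) -> float:
    """Bayesian Information Criterion: K ln(N) - 2LL."""
    return n_parameters * math.log(sample_size) - 2 * log_likelihood


# Values taken from the column Model_000000 of the summary table.
aic_model_0 = aic(log_likelihood=-5184.072742, n_parameters=6)
bic_model_0 = bic(log_likelihood=-5184.072742, n_parameters=6, sample_size=6768)
print(f'AIC = {aic_model_0:.3f}, BIC = {bic_model_0:.3f}')
```

Both criteria penalize the number of parameters, so lower values indicate a better trade-off between fit and parsimony.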

The description maps each short model name to the full name of its configuration.

for k, v in description.items():
    if k != v:
        print(f'{k}: {v}')

Model_000000: ASC:MALE;train_tt_catalog:log
Model_000001: ASC:no_seg;train_tt_catalog:linear
Model_000002: ASC:no_seg;train_tt_catalog:log
Model_000003: ASC:MALE;train_tt_catalog:linear

non_dominated_models = pareto_optimal(dict_of_results)
print(f'Out of them, {len(non_dominated_models)} are non dominated.')
for config, res in non_dominated_models.items():
    print(f'{config}')

Out of them, 2 are non dominated.
ASC:no_seg;train_tt_catalog:linear
ASC:MALE;train_tt_catalog:log
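The notion of a non-dominated model can be illustrated with a small standalone sketch. It assumes that models are compared on two criteria: the final log likelihood (higher is better) and the number of parameters (lower is better). This choice of criteria is an assumption made for illustration and may differ from the criteria that `pareto_optimal` actually uses. The function `non_dominated` is hypothetical, and the values are the rounded ones printed earlier.

```python
def non_dominated(models: dict) -> dict:
    """Keep the models that no other model dominates.

    Each value is a tuple (log likelihood, number of parameters).
    """

    def dominates(a: tuple, b: tuple) -> bool:
        # a dominates b if it is at least as good on both criteria
        # and strictly better on at least one.
        ll_a, k_a = a
        ll_b, k_b = b
        return ll_a >= ll_b and k_a <= k_b and (ll_a > ll_b or k_a < k_b)

    return {
        name: criteria
        for name, criteria in models.items()
        if not any(dominates(other, criteria) for other in models.values())
    }


estimated = {
    'ASC:MALE;train_tt_catalog:log': (-5184.07, 6),
    'ASC:no_seg;train_tt_catalog:linear': (-5331.25, 4),
    'ASC:no_seg;train_tt_catalog:log': (-5350.59, 4),
    'ASC:MALE;train_tt_catalog:linear': (-5187.98, 6),
}
pareto_set = non_dominated(estimated)
for name in pareto_set:
    print(name)
```

Under these assumptions, the sketch retains the same two configurations as listed above: each of the other two models is matched on the number of parameters and beaten on log likelihood by a retained model.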

summary, description = compile_estimation_results(
    non_dominated_models, use_short_names=False
)
print(summary)

                               ASC:no_seg;train_tt_catalog:linear ASC:MALE;train_tt_catalog:log
Number of estimated parameters                                  4                             6
Sample size                                                  6768                          6768
Final log likelihood                                 -5331.252007                  -5184.072742
Akaike Information Criterion                         10670.504014                  10380.145485
Bayesian Information Criterion                       10697.783857                   10421.06525
ASC_CAR (t-test)                                  -0.155  (-2.66)                   1.19  (7.3)
ASC_TRAIN (t-test)                                -0.701  (-8.49)                 0.183  (2.05)
B_COST (t-test)                                    -1.08  (-15.9)                -1.07  (-16.1)
B_TIME (t-test)                                    -1.28  (-12.3)                -1.38  (-14.1)
ASC_CAR_male (t-test)                                                             0.261  (2.56)
ASC_TRAIN_male (t-test)                                                          -1.35  (-17.1)

