Estimation of several models
Example of the estimation of several specifications of the model.
- author: Michel Bierlaire, EPFL
- date: Mon Apr 10 12:19:46 2023
import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import Beta, log
from biogeme.results import compile_estimation_results, pareto_optimal
from biogeme.catalog import Catalog, segmentation_catalogs
See the data processing script: Data preparation for Swissmetro.
from swissmetro_data import (
    database,
    CHOICE,
    SM_AV,
    CAR_AV_SP,
    TRAIN_AV_SP,
    TRAIN_TT_SCALED,
    TRAIN_COST_SCALED,
    SM_TT_SCALED,
    SM_COST_SCALED,
    CAR_TT_SCALED,
    CAR_CO_SCALED,
    MALE,
)
Parameters to be estimated
ASC_CAR = Beta('ASC_CAR', 0, None, None, 0)
ASC_TRAIN = Beta('ASC_TRAIN', 0, None, None, 0)
ASC_CAR_MALE = Beta('ASC_CAR_MALE', 0, None, None, 0)
ASC_CAR_FEMALE = Beta('ASC_CAR_FEMALE', 0, None, None, 0)
ASC_TRAIN_MALE = Beta('ASC_TRAIN_MALE', 0, None, None, 0)
ASC_TRAIN_FEMALE = Beta('ASC_TRAIN_FEMALE', 0, None, None, 0)
B_TIME = Beta('B_TIME', 0, None, None, 0)
B_COST = Beta('B_COST', 0, None, None, 0)
segmentation_gender = database.generate_segmentation(
    variable=MALE, mapping={0: 'female', 1: 'male'}
)
We define catalogs with two specifications for each of the alternative specific constants ASC_TRAIN and ASC_CAR: non-segmented, and segmented by gender.
ASC_TRAIN_catalog, ASC_CAR_catalog = segmentation_catalogs(
    generic_name='ASC',
    beta_parameters=[ASC_TRAIN, ASC_CAR],
    potential_segmentations=(segmentation_gender,),
    maximum_number=1,
)
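To make the idea of a segmented constant concrete, here is a minimal sketch in plain Python that does not use Biogeme. It assumes a parameterization with a reference segment plus an additive offset for the other segment, which is what the parameter names in the estimation output (ASC_CAR and ASC_CAR_male) suggest. The function `segmented_asc` and the numerical values are hypothetical and serve only as an illustration.

```python
def segmented_asc(base, offsets, male):
    """Return the alternative specific constant for one individual.

    base: value of the constant for the reference segment (hypothetical)
    offsets: dict mapping a segment name to its additive offset (hypothetical)
    male: value of the MALE variable, 0 or 1
    """
    # Same mapping as the one passed to generate_segmentation in this example.
    mapping = {0: 'female', 1: 'male'}
    segment = mapping[male]
    # The reference segment has no entry in offsets, so its offset is zero.
    return base + offsets.get(segment, 0.0)


# Hypothetical values, not estimates from the model.
asc_car_female = segmented_asc(-0.5, {'male': 0.3}, male=0)
asc_car_male = segmented_asc(-0.5, {'male': 0.3}, male=1)
print(asc_car_female, asc_car_male)
```

In the non-segmented specification, the offset is simply absent and every individual shares the same constant.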
We now define a catalog with two specifications of the travel time: linear and logarithmic.
First for train.
train_tt_catalog = Catalog.from_dict(
    catalog_name='train_tt_catalog',
    dict_of_expressions={
        'linear': TRAIN_TT_SCALED,
        'log': log(TRAIN_TT_SCALED),
    },
)
Then for Swissmetro (SM). We require that its specification is always the same as the one selected for train, by assigning the same controller to both catalogs.
sm_tt_catalog = Catalog.from_dict(
    catalog_name='sm_tt_catalog',
    dict_of_expressions={
        'linear': SM_TT_SCALED,
        'log': log(SM_TT_SCALED),
    },
    controlled_by=train_tt_catalog.controlled_by,
)
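The effect of sharing a controller can be illustrated with a small sketch in plain Python, independent of Biogeme. The assumption is that the set of model configurations is the Cartesian product over controllers, not over catalogs, so that catalogs driven by the same controller always select the same option. The controller names and option names mirror the configuration strings printed by this example; the dictionaries and helper functions are hypothetical.

```python
from itertools import product

# One entry per controller, with the options it can select.
controllers = {
    'ASC': ['no_seg', 'MALE'],
    'train_tt_catalog': ['linear', 'log'],
}

# Illustrative mapping: which controller drives each catalog.
catalog_controller = {
    'ASC_TRAIN_catalog': 'ASC',
    'ASC_CAR_catalog': 'ASC',
    'train_tt_catalog': 'train_tt_catalog',
    'sm_tt_catalog': 'train_tt_catalog',
}


def enumerate_configurations(controllers):
    """Yield one dict {controller: option} per combination of options."""
    names = list(controllers)
    for choice in product(*(controllers[name] for name in names)):
        yield dict(zip(names, choice))


def selection_per_catalog(configuration):
    """Return the option selected in each catalog for a configuration."""
    return {
        catalog: configuration[controller]
        for catalog, controller in catalog_controller.items()
    }


configurations = list(enumerate_configurations(controllers))
labels = [
    ';'.join(f'{name}:{option}' for name, option in configuration.items())
    for configuration in configurations
]
for label in labels:
    print(label)
```

With two controllers of two options each, the sketch enumerates four configurations. Without the shared controller, the two travel time catalogs would vary independently and the count would double.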
Definition of the utility functions with linear cost.
V1 = ASC_TRAIN_catalog + B_TIME * train_tt_catalog + B_COST * TRAIN_COST_SCALED
V2 = B_TIME * sm_tt_catalog + B_COST * SM_COST_SCALED
V3 = ASC_CAR_catalog + B_TIME * CAR_TT_SCALED + B_COST * CAR_CO_SCALED
Associate utility functions with the numbering of alternatives.
V = {1: V1, 2: V2, 3: V3}
Associate the availability conditions with the alternatives.
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
Definition of the model. This is the contribution of each observation to the log likelihood function.
logprob = models.loglogit(V, av, CHOICE)
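As an illustration of what this quantity represents, the following sketch computes the log of a logit choice probability for a single observation in plain Python, restricting the denominator to the available alternatives. It is not Biogeme's implementation: the function `loglogit_sketch` and the numerical utilities are hypothetical.

```python
import math


def loglogit_sketch(utilities, availability, chosen):
    """Log of the logit probability of the chosen alternative."""
    if not availability[chosen]:
        raise ValueError('The chosen alternative is not available.')
    # Only available alternatives enter the denominator.
    available = [alt for alt in utilities if availability[alt]]
    # Shift by the largest utility to keep the exponentials well scaled.
    v_max = max(utilities[alt] for alt in available)
    log_denominator = v_max + math.log(
        sum(math.exp(utilities[alt] - v_max) for alt in available)
    )
    return utilities[chosen] - log_denominator


# Hypothetical utilities for one observation; alternative 3 is unavailable.
utilities = {1: -1.0, 2: -0.5, 3: -0.8}
availability = {1: 1, 2: 1, 3: 0}
log_probability = loglogit_sketch(utilities, availability, chosen=2)
print(f'{log_probability:.4f}')
```

Summing such contributions over all observations gives the log likelihood that is maximized during estimation.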
the_biogeme = bio.BIOGEME(database, logprob)
the_biogeme.modelName = 'b20multiple_models'
dict_of_results = the_biogeme.estimate_catalog()
print(f'A total of {len(dict_of_results)} models have been estimated:')
for config, res in dict_of_results.items():
    print(f'{config}: LL={res.data.logLike:.2f} K={res.data.nparam}')
A total of 4 models have been estimated:
ASC:MALE;train_tt_catalog:log: LL=-5184.07 K=6
ASC:no_seg;train_tt_catalog:linear: LL=-5331.25 K=4
ASC:no_seg;train_tt_catalog:log: LL=-5350.59 K=4
ASC:MALE;train_tt_catalog:linear: LL=-5187.98 K=6
summary, description = compile_estimation_results(dict_of_results, use_short_names=True)
print(summary)
                                 Model_000000  ...    Model_000003
Number of estimated parameters              6  ...               6
Sample size                              6768  ...            6768
Final log likelihood             -5184.072742  ...     -5187.98341
Akaike Information Criterion     10380.145485  ...     10387.96682
Bayesian Information Criterion    10421.06525  ...    10428.886585
ASC_CAR (t-test)                   1.19 (7.3)  ...  -0.461 (-4.74)
ASC_CAR_male (t-test)            0.261 (2.56)  ...    0.309 (3.04)
ASC_TRAIN (t-test)               0.183 (2.05)  ...  0.0906 (0.992)
ASC_TRAIN_male (t-test)         -1.35 (-17.1)  ...   -1.23 (-15.5)
B_COST (t-test)                 -1.07 (-16.1)  ...   -1.08 (-16.2)
B_TIME (t-test)                 -1.38 (-14.1)  ...   -1.25 (-11.8)

[11 rows x 4 columns]
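The two information criteria in the summary can be recovered from the final log likelihood, the number of estimated parameters and the sample size, using the standard definitions AIC = 2K - 2LL and BIC = K ln(N) - 2LL. The sketch below applies them to the values reported for Model_000000; the function names `aic` and `bic` are hypothetical helpers, not part of Biogeme.

```python
import math


def aic(loglike, n_params):
    """Akaike Information Criterion: 2K - 2LL."""
    return 2 * n_params - 2 * loglike


def bic(loglike, n_params, sample_size):
    """Bayesian Information Criterion: K ln(N) - 2LL."""
    return n_params * math.log(sample_size) - 2 * loglike


# Values reported for Model_000000 in the summary.
loglike, n_params, sample_size = -5184.072742, 6, 6768
print(f'AIC = {aic(loglike, n_params):.3f}')
print(f'BIC = {bic(loglike, n_params, sample_size):.3f}')
```

Both criteria penalize the number of parameters, and lower values are preferred. The BIC penalty grows with the sample size, so it favors smaller models more strongly than the AIC does.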
Explanation of the names of the models.
for k, v in description.items():
    if k != v:
        print(f'{k}: {v}')
Model_000000: ASC:MALE;train_tt_catalog:log
Model_000001: ASC:no_seg;train_tt_catalog:linear
Model_000002: ASC:no_seg;train_tt_catalog:log
Model_000003: ASC:MALE;train_tt_catalog:linear
non_dominated_models = pareto_optimal(dict_of_results)
print(f'Out of them, {len(non_dominated_models)} are non dominated.')
for config, res in non_dominated_models.items():
    print(f'{config}')
Out of them, 2 are non dominated.
ASC:no_seg;train_tt_catalog:linear
ASC:MALE;train_tt_catalog:log
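The notion of a non-dominated model can be sketched in plain Python. The sketch assumes two objectives, the final log likelihood (to be maximized) and the number of parameters (to be minimized); the criteria actually used by `pareto_optimal` may differ, although this assumption reproduces the selection reported above. The function `non_dominated` is a hypothetical helper, and the values are the ones printed earlier in this example.

```python
def dominates(a, b):
    """True if model a is at least as good as b on both objectives
    and strictly better on at least one.

    Each model is a tuple (log likelihood, number of parameters).
    """
    loglike_a, k_a = a
    loglike_b, k_b = b
    no_worse = loglike_a >= loglike_b and k_a <= k_b
    strictly_better = loglike_a > loglike_b or k_a < k_b
    return no_worse and strictly_better


def non_dominated(models):
    """Keep the models that are not dominated by any other model."""
    return {
        name: objectives
        for name, objectives in models.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in models.items()
            if other_name != name
        )
    }


# (log likelihood, number of parameters) as printed by this example.
models = {
    'ASC:MALE;train_tt_catalog:log': (-5184.07, 6),
    'ASC:no_seg;train_tt_catalog:linear': (-5331.25, 4),
    'ASC:no_seg;train_tt_catalog:log': (-5350.59, 4),
    'ASC:MALE;train_tt_catalog:linear': (-5187.98, 6),
}
pareto_set = non_dominated(models)
for name in pareto_set:
    print(name)
```

Among models with the same number of parameters, only the one with the highest log likelihood survives, which is why one model remains for K=4 and one for K=6.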
summary, description = compile_estimation_results(
non_dominated_models, use_short_names=False
)
print(summary)
                                ASC:no_seg;train_tt_catalog:linear  ASC:MALE;train_tt_catalog:log
Number of estimated parameters                                   4                              6
Sample size                                                   6768                           6768
Final log likelihood                                  -5331.252007                   -5184.072742
Akaike Information Criterion                          10670.504014                   10380.145485
Bayesian Information Criterion                        10697.783857                    10421.06525
ASC_CAR (t-test)                                    -0.155 (-2.66)                     1.19 (7.3)
ASC_TRAIN (t-test)                                  -0.701 (-8.49)                   0.183 (2.05)
B_COST (t-test)                                      -1.08 (-15.9)                  -1.07 (-16.1)
B_TIME (t-test)                                      -1.28 (-12.3)                  -1.38 (-14.1)
ASC_CAR_male (t-test)                                                                0.261 (2.56)
ASC_TRAIN_male (t-test)                                                             -1.35 (-17.1)
Total running time of the script: (0 minutes 0.721 seconds)