Catalog for segmented parameters

This example investigates the segmentation of parameters using catalogs.

We consider 4 specifications for the alternative specific constants:

  • Not segmented

  • Segmented by GA (yearly subscription to public transport)

  • Segmented by luggage

  • Segmented both by GA and luggage

We consider 3 specifications for the time coefficients:

  • Not segmented

  • Segmented by first class

  • Segmented by trip purpose

Combining them, we obtain a total of 4 × 3 = 12 specifications. See Bierlaire and Ortelli (2023).
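
The 12 specifications can be listed by pairing each specification of the constants with each specification of the time coefficient. The sketch below does this with the standard library only. The labels mimic those printed in the glossaries further down this page, and the names asc_specs and time_specs are illustrative, not part of Biogeme.

```python
from itertools import product

# Labels chosen to mirror the glossary output of this page (illustrative only).
asc_specs = ['no_seg', 'GA', 'LUGGAGE', 'GA-LUGGAGE']
time_specs = ['no_seg', 'FIRST', 'COMMUTERS']

# Every pairing of one ASC specification with one B_TIME specification.
all_specs = [
    f'ASC:{asc};B_TIME:{time}' for asc, time in product(asc_specs, time_specs)
]

print(len(all_specs))
for spec in all_specs:
    print(spec)
```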

author: Michel Bierlaire, EPFL

date: Thu Jul 13 16:18:10 2023

import numpy as np
import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import Beta
from biogeme.catalog import segmentation_catalogs
from biogeme.results import compile_estimation_results, pareto_optimal

See Data preparation for Swissmetro.

from swissmetro_data import (
    database,
    CHOICE,
    SM_AV,
    CAR_AV_SP,
    TRAIN_AV_SP,
    TRAIN_TT_SCALED,
    TRAIN_COST_SCALED,
    SM_TT_SCALED,
    SM_COST_SCALED,
    CAR_TT_SCALED,
    CAR_CO_SCALED,
)

Definition of the segmentations.

segmentation_ga = database.generate_segmentation(
    variable='GA', mapping={0: 'noGA', 1: 'GA'}
)

segmentation_luggage = database.generate_segmentation(
    variable='LUGGAGE', mapping={0: 'no_lugg', 1: 'one_lugg', 3: 'several_lugg'}
)

segmentation_first = database.generate_segmentation(
    variable='FIRST', mapping={0: '2nd_class', 1: '1st_class'}
)

We consider two trip purposes: ‘commuters’ and everything else. This requires defining a binary variable first.

database.data['COMMUTERS'] = np.where(database.data['PURPOSE'] == 1, 1, 0)

segmentation_purpose = database.generate_segmentation(
    variable='COMMUTERS', mapping={0: 'non_commuters', 1: 'commuters'}
)

Parameters to be estimated.

ASC_CAR = Beta('ASC_CAR', 0, None, None, 0)
ASC_TRAIN = Beta('ASC_TRAIN', 0, None, None, 0)
B_TIME = Beta('B_TIME', 0, None, None, 0)
B_COST = Beta('B_COST', 0, None, None, 0)

Catalogs for the alternative specific constants.

ASC_TRAIN_catalog, ASC_CAR_catalog = segmentation_catalogs(
    generic_name='ASC',
    beta_parameters=[ASC_TRAIN, ASC_CAR],
    potential_segmentations=(
        segmentation_ga,
        segmentation_luggage,
    ),
    maximum_number=2,
)
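
With two potential segmentations and maximum_number=2, a parameter can be left unsegmented, segmented by either variable alone, or segmented by both, which gives the 4 specifications listed at the top of this page. The sketch below enumerates those combinations by label. It only illustrates the counting and is not how segmentation_catalogs is implemented; the helper enumerate_segmentations is hypothetical.

```python
from itertools import combinations


def enumerate_segmentations(names, maximum_number):
    """List every choice of at most `maximum_number` segmentations.

    Hypothetical helper for illustration; it only manipulates labels.
    """
    configurations = []
    for size in range(maximum_number + 1):
        for subset in combinations(names, size):
            # The empty subset stands for the unsegmented parameter.
            configurations.append('-'.join(subset) if subset else 'no_seg')
    return configurations


print(enumerate_segmentations(['GA', 'LUGGAGE'], maximum_number=2))
# ['no_seg', 'GA', 'LUGGAGE', 'GA-LUGGAGE']
print(enumerate_segmentations(['FIRST', 'COMMUTERS'], maximum_number=1))
# ['no_seg', 'FIRST', 'COMMUTERS']
```

The second call corresponds to the time coefficient, where maximum_number=1 rules out combining first class and trip purpose.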

Catalog for the travel time coefficient. Note that the function returns a list of catalogs. Here, the list contains a single catalog, which is why there is a comma after “B_TIME_catalog”.

(B_TIME_catalog,) = segmentation_catalogs(
    generic_name='B_TIME',
    beta_parameters=[B_TIME],
    potential_segmentations=(
        segmentation_first,
        segmentation_purpose,
    ),
    maximum_number=1,
)
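
The trailing comma turns the left-hand side into a one-element tuple, so Python unpacks the returned list instead of binding the whole list to the name. A small sketch with a stand-in function; make_catalogs is hypothetical and only returns strings.

```python
def make_catalogs(names):
    """Stand-in that returns a list with one item per name."""
    return [f'{name}_catalog' for name in names]


# Without the comma, the variable is bound to the whole list.
as_list = make_catalogs(['B_TIME'])

# With the comma, the single element is unpacked from the list.
(single,) = make_catalogs(['B_TIME'])

print(as_list)  # ['B_TIME_catalog']
print(single)   # B_TIME_catalog
```

The unpacking form also fails loudly with a ValueError if the list does not contain exactly one element.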

Definition of the utility functions.

V1 = ASC_TRAIN_catalog + B_TIME_catalog * TRAIN_TT_SCALED + B_COST * TRAIN_COST_SCALED
V2 = B_TIME_catalog * SM_TT_SCALED + B_COST * SM_COST_SCALED
V3 = ASC_CAR_catalog + B_TIME_catalog * CAR_TT_SCALED + B_COST * CAR_CO_SCALED
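
When a catalog selects a segmented version of a parameter, the estimation results on this page report a base parameter plus one additional parameter per non-reference segment (for example ASC_TRAIN and ASC_TRAIN_GA, with no separate parameter for noGA). The sketch below assumes that the value entering the utility is the base value plus the offset of the individual's segment. This additive reading is an assumption inferred from the parameter names, and segmented_value is a hypothetical helper. The numbers are the rounded estimates of the specification ASC:GA;B_TIME:no_seg reported below.

```python
def segmented_value(base, offsets, segment):
    """Return the coefficient applying to `segment`.

    Assumption: the reference segment uses `base` alone, and any other
    segment adds its own offset to `base`.
    """
    return base + offsets.get(segment, 0.0)


# Rounded estimates for ASC:GA;B_TIME:no_seg, taken from the results table.
asc_train = -1.28
asc_train_offsets = {'GA': 1.97}  # 'noGA' is assumed to be the reference segment

for segment in ('noGA', 'GA'):
    print(segment, round(segmented_value(asc_train, asc_train_offsets, segment), 2))
```

Under this assumption, the train constant of GA holders is about 0.69, against -1.28 for travellers without a GA.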

Associate utility functions with the numbering of alternatives.

V = {1: V1, 2: V2, 3: V3}

Associate the availability conditions with the alternatives.

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

Definition of the model. This is the contribution of each observation to the log likelihood function.

logprob = models.loglogit(V, av, CHOICE)
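
For one observation, the logit log probability of the chosen alternative is its utility minus the logarithm of the sum of exponentiated utilities over the available alternatives. The sketch below evaluates that formula for a single observation with numeric utilities. It is a standard-library illustration of the formula, not the Biogeme implementation; loglogit_one_observation is a hypothetical helper and the utilities are made up.

```python
import math


def loglogit_one_observation(utilities, availability, choice):
    """Log probability of `choice` under a logit model.

    `utilities` and `availability` are dicts keyed by alternative number;
    an alternative enters the choice set only if its availability is nonzero.
    """
    available = [alt for alt, is_av in availability.items() if is_av]
    if choice not in available:
        raise ValueError('The chosen alternative must be available.')
    # Shift by the largest utility for numerical stability.
    v_max = max(utilities[alt] for alt in available)
    log_denominator = v_max + math.log(
        sum(math.exp(utilities[alt] - v_max) for alt in available)
    )
    return utilities[choice] - log_denominator


# Made-up utilities for train (1), Swissmetro (2) and car (3).
V_example = {1: -1.0, 2: 0.0, 3: -0.5}
print(loglogit_one_observation(V_example, {1: 1, 2: 1, 3: 1}, choice=2))
print(loglogit_one_observation(V_example, {1: 1, 2: 1, 3: 0}, choice=2))
```

Removing the car from the choice set in the second call raises the log probability of Swissmetro, since the denominator then sums over fewer alternatives.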

Create the Biogeme object.

the_biogeme = bio.BIOGEME(database, logprob)
the_biogeme.modelName = 'b04segmentation'
the_biogeme.generate_html = False
the_biogeme.generate_pickle = False

Estimate the parameters of each specification in the catalog.

dict_of_results = the_biogeme.estimate_catalog()

Number of estimated models.

print(f'A total of {len(dict_of_results)} models have been estimated')
A total of 12 models have been estimated

All estimation results.

compiled_results, specs = compile_estimation_results(
    dict_of_results, use_short_names=True
)
compiled_results
                                 Model_000000        Model_000001        Model_000002        Model_000003        Model_000004        Model_000005        Model_000006        Model_000007        Model_000008        Model_000009        Model_000010        Model_000011
Number of estimated parameters   7                   7                   5                   10                  4                   5                   6                   11                  9                   11                  9                   8
Sample size                      6768                6768                6768                6768                6768                6768                6768                6768                6768                6768                6768                6768
Final log likelihood             -4976.118642        -5048.818199        -5331.250708        -5022.276564        -5331.252007        -5234.708233        -5050.677696        -5020.027091        -5160.079285        -4952.546476        -5240.921463        -5241.011928
Akaike Information Criterion     9966.237283         10111.636399        10672.501415        10064.553128        10670.504014        10479.416466        10113.355392        10062.054183        10338.158569        9927.092951         10499.842927        10498.023855
Bayesian Information Criterion   10013.97701         10159.376125        10706.60122         10132.752737        10697.783857        10513.51627         10154.275157        10137.073753        10399.538218        10002.112521        10561.222575        10552.583543
ASC_CAR (t-test)                 -0.281 (-4.53)      -0.246 (-3.77)      -0.155 (-2.53)      -0.293 (-3.93)      -0.155 (-2.66)      -0.187 (-3.23)      -0.249 (-3.97)      -0.29 (-3.77)       -0.24 (-3.36)       -0.298 (-4.12)      -0.237 (-3.13)      -0.238 (-3.26)
ASC_CAR_GA (t-test)              -0.231 (-1.19)      -0.298 (-1.55)                          -0.291 (-1.49)                                              -0.301 (-1.56)      -0.287 (-1.48)                          -0.206 (-1.05)
ASC_TRAIN (t-test)               -1.37 (-14.7)       -1.28 (-13)         -0.701 (-7.69)      -1.75 (-15.1)       -0.701 (-8.49)      -0.814 (-9.45)      -1.28 (-14)         -1.74 (-14.6)       -1.58 (-13.8)       -1.79 (-15.4)       -1.54 (-12.8)       -1.54 (-13.5)
ASC_TRAIN_GA (t-test)            1.91 (21.5)         1.99 (22.6)                             1.78 (19.4)                                                 1.97 (22.3)         1.8 (19.6)                              1.75 (19.1)
B_COST (t-test)                  -1.26 (-15.3)       -1.1 (-14.9)        -1.08 (-16)         -1.1 (-14.8)        -1.08 (-15.9)       -1.23 (-16.6)       -1.1 (-14.8)        -1.1 (-14.8)        -1.22 (-16.3)       -1.25 (-15.3)       -1.09 (-15.8)       -1.09 (-15.7)
B_TIME (t-test)                  -0.621 (-4.46)      -1.16 (-13.6)       -1.28 (-15.1)       -1.17 (-11.2)       -1.28 (-12.3)       -0.647 (-4.69)      -1.18 (-11.3)       -1.14 (-13.5)       -0.656 (-4.64)      -0.622 (-4.42)      -1.24 (-14.6)       -1.24 (-11.9)
B_TIME_1st_class (t-test)        -0.914 (-8.6)                                                                                       -1.02 (-9.87)                                               -0.943 (-8.88)      -0.891 (-8.26)
B_TIME_commuters (t-test)                            -0.183 (-0.799)     -0.00469 (-0.0222)                                                                                  -0.202 (-0.874)                                             -0.0396 (-0.184)
ASC_CAR_one_lugg (t-test)                                                                    0.0744 (1.13)                                                                   0.0749 (1.14)       0.0616 (0.923)      0.0324 (0.486)      0.104 (1.57)        0.103 (1.56)
ASC_CAR_several_lugg (t-test)                                                                -0.252 (-1.06)                                                                  -0.261 (-1.1)       -0.432 (-1.83)      -0.437 (-1.82)      -0.252 (-1.07)      -0.25 (-1.06)
ASC_TRAIN_one_lugg (t-test)                                                                  0.712 (7.23)                                                                    0.717 (7.3)         1.05 (11.1)         0.635 (6.4)         1.15 (12.3)         1.15 (12.3)
ASC_TRAIN_several_lugg (t-test)                                                              0.593 (2.67)                                                                    0.584 (2.65)        0.799 (3.74)        0.431 (2)           0.976 (4.43)        0.978 (4.47)


Glossary.

for short_name, spec in specs.items():
    print(f'{short_name}\t{spec}')
Model_000000    ASC:GA;B_TIME:FIRST
Model_000001    ASC:GA;B_TIME:COMMUTERS
Model_000002    ASC:no_seg;B_TIME:COMMUTERS
Model_000003    ASC:GA-LUGGAGE;B_TIME:no_seg
Model_000004    ASC:no_seg;B_TIME:no_seg
Model_000005    ASC:no_seg;B_TIME:FIRST
Model_000006    ASC:GA;B_TIME:no_seg
Model_000007    ASC:GA-LUGGAGE;B_TIME:COMMUTERS
Model_000008    ASC:LUGGAGE;B_TIME:FIRST
Model_000009    ASC:GA-LUGGAGE;B_TIME:FIRST
Model_000010    ASC:LUGGAGE;B_TIME:COMMUTERS
Model_000011    ASC:LUGGAGE;B_TIME:no_seg
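
The AIC and BIC rows of the table can be recomputed from the number of estimated parameters K, the sample size N and the final log likelihood LL, using the usual definitions AIC = 2K - 2LL and BIC = K ln(N) - 2LL. The sketch below checks this for the unsegmented model (ASC:no_seg;B_TIME:no_seg), with the values copied from the table. The function names are illustrative.

```python
import math


def aic(n_parameters, log_likelihood):
    """Akaike Information Criterion: 2K - 2LL."""
    return 2 * n_parameters - 2 * log_likelihood


def bic(n_parameters, sample_size, log_likelihood):
    """Bayesian Information Criterion: K ln(N) - 2LL."""
    return n_parameters * math.log(sample_size) - 2 * log_likelihood


# Values reported for ASC:no_seg;B_TIME:no_seg (Model_000004).
K, N, LL = 4, 6768, -5331.252007

print(round(aic(K, LL), 3))     # 10670.504
print(round(bic(K, N, LL), 3))  # 10697.784
```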

Estimation results of the Pareto optimal models.

pareto_results = pareto_optimal(dict_of_results)
compiled_pareto_results, pareto_specs = compile_estimation_results(
    pareto_results, use_short_names=True
)
compiled_pareto_results
                                 Model_000000        Model_000001        Model_000002        Model_000003        Model_000004
Number of estimated parameters   11                  4                   5                   6                   7
Sample size                      6768                6768                6768                6768                6768
Final log likelihood             -4952.546476        -5331.252007        -5234.708233        -5050.677696        -4976.118642
Akaike Information Criterion     9927.092951         10670.504014        10479.416466        10113.355392        9966.237283
Bayesian Information Criterion   10002.112521        10697.783857        10513.51627         10154.275157        10013.97701
ASC_CAR (t-test)                 -0.298 (-4.12)      -0.155 (-2.66)      -0.187 (-3.23)      -0.249 (-3.97)      -0.281 (-4.53)
ASC_CAR_GA (t-test)              -0.206 (-1.05)                                              -0.301 (-1.56)      -0.231 (-1.19)
ASC_CAR_one_lugg (t-test)        0.0324 (0.486)
ASC_CAR_several_lugg (t-test)    -0.437 (-1.82)
ASC_TRAIN (t-test)               -1.79 (-15.4)       -0.701 (-8.49)      -0.814 (-9.45)      -1.28 (-14)         -1.37 (-14.7)
ASC_TRAIN_GA (t-test)            1.75 (19.1)                                                 1.97 (22.3)         1.91 (21.5)
ASC_TRAIN_one_lugg (t-test)      0.635 (6.4)
ASC_TRAIN_several_lugg (t-test)  0.431 (2)
B_COST (t-test)                  -1.25 (-15.3)       -1.08 (-15.9)       -1.23 (-16.6)       -1.1 (-14.8)        -1.26 (-15.3)
B_TIME (t-test)                  -0.622 (-4.42)      -1.28 (-12.3)       -0.647 (-4.69)      -1.18 (-11.3)       -0.621 (-4.46)
B_TIME_1st_class (t-test)        -0.891 (-8.26)                          -1.02 (-9.87)                           -0.914 (-8.6)


Glossary.

for short_name, spec in pareto_specs.items():
    print(f'{short_name}\t{spec}')
Model_000000    ASC:GA-LUGGAGE;B_TIME:FIRST
Model_000001    ASC:no_seg;B_TIME:no_seg
Model_000002    ASC:no_seg;B_TIME:FIRST
Model_000003    ASC:GA;B_TIME:no_seg
Model_000004    ASC:GA;B_TIME:FIRST
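
A specification is Pareto optimal when no other specification is at least as good on every criterion and strictly better on at least one. The sketch below applies that definition to the 12 models using two criteria, the number of parameters (to be minimized) and the final log likelihood (to be maximized), with values copied from the table of all estimation results. This choice of criteria is an assumption made for illustration; the criteria used internally by pareto_optimal may differ. With these two criteria, the filter retains the same five specifications as the glossary above.

```python
# (number of parameters, final log likelihood), copied from the results table.
models = {
    'ASC:GA;B_TIME:FIRST': (7, -4976.118642),
    'ASC:GA;B_TIME:COMMUTERS': (7, -5048.818199),
    'ASC:no_seg;B_TIME:COMMUTERS': (5, -5331.250708),
    'ASC:GA-LUGGAGE;B_TIME:no_seg': (10, -5022.276564),
    'ASC:no_seg;B_TIME:no_seg': (4, -5331.252007),
    'ASC:no_seg;B_TIME:FIRST': (5, -5234.708233),
    'ASC:GA;B_TIME:no_seg': (6, -5050.677696),
    'ASC:GA-LUGGAGE;B_TIME:COMMUTERS': (11, -5020.027091),
    'ASC:LUGGAGE;B_TIME:FIRST': (9, -5160.079285),
    'ASC:GA-LUGGAGE;B_TIME:FIRST': (11, -4952.546476),
    'ASC:LUGGAGE;B_TIME:COMMUTERS': (9, -5240.921463),
    'ASC:LUGGAGE;B_TIME:no_seg': (8, -5241.011928),
}


def dominates(a, b):
    """True if `a` is no worse than `b` on both criteria and better on one."""
    (k_a, ll_a), (k_b, ll_b) = a, b
    no_worse = k_a <= k_b and ll_a >= ll_b
    strictly_better = k_a < k_b or ll_a > ll_b
    return no_worse and strictly_better


# Keep the models that no other model dominates.
pareto = [
    name
    for name, criteria in models.items()
    if not any(dominates(other, criteria) for other in models.values())
]

for name in pareto:
    print(name)
```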

Total running time of the script: (0 minutes 2.647 seconds)
