Catalog for segmented parameters

Investigate the segmentations of parameters.

We consider 4 specifications for the constants:

Not segmented

Segmented by GA (yearly subscription to public transport)

Segmented by luggage

Segmented both by GA and luggage

We consider 3 specifications for the time coefficients:

Not segmented

Segmented by first class

Segmented by trip purpose

We obtain a total of 12 specifications. See Bierlaire and Ortelli (2023).
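The count of 12 can be checked by enumerating the combinations directly. The sketch below is not part of Biogeme: the helper `segmentation_choices` is a hypothetical name, and the labels only mimic the specification names printed in the glossary of this example. It assumes that the constants may use any subset of the two segmentations (GA, luggage) and that the time coefficient may use at most one of its two segmentations (first class, trip purpose).

```python
from itertools import combinations, product


def segmentation_choices(names, maximum_number):
    """Return labels for all subsets of `names` with at most `maximum_number` elements."""
    choices = []
    for size in range(maximum_number + 1):
        for subset in combinations(names, size):
            # The empty subset corresponds to the non-segmented parameter.
            choices.append('-'.join(subset) if subset else 'no_seg')
    return choices


# Constants: none, GA, LUGGAGE, or both (4 choices).
asc_choices = segmentation_choices(['GA', 'LUGGAGE'], maximum_number=2)
# Time coefficient: none, FIRST, or COMMUTERS (3 choices).
time_choices = segmentation_choices(['FIRST', 'COMMUTERS'], maximum_number=1)

# Every pairing of one choice for the constants with one for the time coefficient.
all_specs = [f'ASC:{a};B_TIME:{t}' for a, t in product(asc_choices, time_choices)]
print(len(asc_choices), len(time_choices), len(all_specs))
```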

Author: Michel Bierlaire, EPFL
Date: Thu Jul 13 16:18:10 2023

import numpy as np
import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import Beta
from biogeme.catalog import segmentation_catalogs
from biogeme.results import compile_estimation_results, pareto_optimal

The database and the variables are imported from swissmetro_data. See Data preparation for Swissmetro.

from swissmetro_data import (
    database,
    CHOICE,
    SM_AV,
    CAR_AV_SP,
    TRAIN_AV_SP,
    TRAIN_TT_SCALED,
    TRAIN_COST_SCALED,
    SM_TT_SCALED,
    SM_COST_SCALED,
    CAR_TT_SCALED,
    CAR_CO_SCALED,
)

Definition of the segmentations.

segmentation_ga = database.generate_segmentation(
    variable='GA', mapping={0: 'noGA', 1: 'GA'}
)

segmentation_luggage = database.generate_segmentation(
    variable='LUGGAGE', mapping={0: 'no_lugg', 1: 'one_lugg', 3: 'several_lugg'}
)

segmentation_first = database.generate_segmentation(
    variable='FIRST', mapping={0: '2nd_class', 1: '1st_class'}
)

We distinguish two trip purposes: commuting (PURPOSE equal to 1) and everything else. Because the segmentation needs one variable with one value per segment, we first define a binary variable COMMUTERS.

database.data['COMMUTERS'] = np.where(database.data['PURPOSE'] == 1, 1, 0)

segmentation_purpose = database.generate_segmentation(
    variable='COMMUTERS', mapping={0: 'non_commuters', 1: 'commuters'}
)
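As a minimal illustration of the recoding step, the sketch below applies the same `np.where` pattern to a small made-up array instead of the Swissmetro database. The values in `purpose` are invented for the example; only the convention that 1 denotes commuters is taken from the code above.

```python
import numpy as np

# Made-up trip purposes; 1 is treated as "commuter", anything else as "non-commuter".
purpose = np.array([1, 2, 3, 1, 4])

# np.where(condition, value_if_true, value_if_false) is applied element by element.
commuters = np.where(purpose == 1, 1, 0)
print(commuters.tolist())
```

The result is a 0/1 indicator with the same length as the input, which is the form expected by the mapping `{0: 'non_commuters', 1: 'commuters'}`.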

Parameters to be estimated.

ASC_CAR = Beta('ASC_CAR', 0, None, None, 0)
ASC_TRAIN = Beta('ASC_TRAIN', 0, None, None, 0)
B_TIME = Beta('B_TIME', 0, None, None, 0)
B_COST = Beta('B_COST', 0, None, None, 0)

Catalogs for the alternative specific constants.

ASC_TRAIN_catalog, ASC_CAR_catalog = segmentation_catalogs(
    generic_name='ASC',
    beta_parameters=[ASC_TRAIN, ASC_CAR],
    potential_segmentations=(
        segmentation_ga,
        segmentation_luggage,
    ),
    maximum_number=2,
)
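The parameter names that appear in the estimation results of this example (for instance ASC_TRAIN, ASC_TRAIN_GA, ASC_TRAIN_one_lugg) suggest that a segmented parameter is a reference value plus one offset per non-reference segment. The sketch below illustrates that reading in plain Python. It is an assumption about the structure, not Biogeme's implementation, and `segmented_value` is a hypothetical helper. The numbers are the rounded ASC_TRAIN estimates reported in the results table for the specification ASC:GA-LUGGAGE;B_TIME:no_seg.

```python
def segmented_value(reference, offsets, individual):
    """Reference value plus the offsets of the segments the individual belongs to.

    `offsets` maps (variable, value) pairs to an additive offset.
    """
    total = reference
    for (variable, value), offset in offsets.items():
        if individual[variable] == value:
            total += offset
    return total


# Rounded estimates copied from the results table (reference: no GA, no luggage).
asc_train_reference = -1.75
asc_train_offsets = {
    ('GA', 1): 1.78,        # ASC_TRAIN_GA
    ('LUGGAGE', 1): 0.712,  # ASC_TRAIN_one_lugg
    ('LUGGAGE', 3): 0.593,  # ASC_TRAIN_several_lugg
}

# A GA holder travelling with one piece of luggage.
traveller = {'GA': 1, 'LUGGAGE': 1}
asc_train = segmented_value(asc_train_reference, asc_train_offsets, traveller)

# A traveller in the reference segment of both variables.
baseline = segmented_value(asc_train_reference, asc_train_offsets, {'GA': 0, 'LUGGAGE': 0})
print(round(asc_train, 3), baseline)
```

Under this reading, the train constant of the GA holder with one piece of luggage is the sum of the reference value and two offsets, while the reference traveller keeps the reference value.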

Catalog for the travel time coefficient. The function returns a list of catalogs, one per entry of beta_parameters. Here, the list contains only one catalog, which is why the left-hand side is a one-element tuple with a comma after “B_TIME_catalog”.

(B_TIME_catalog,) = segmentation_catalogs(
    generic_name='B_TIME',
    beta_parameters=[B_TIME],
    potential_segmentations=(
        segmentation_first,
        segmentation_purpose,
    ),
    maximum_number=1,
)
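The trailing comma is ordinary Python sequence unpacking. The sketch below shows the difference with a hypothetical stand-in function, `make_catalogs`, that returns a list of strings instead of real catalogs.

```python
def make_catalogs(names):
    """Hypothetical stand-in: return one object per requested name, as a list."""
    return [f'catalog({name})' for name in names]


# With the comma, the single element of the list is unpacked.
(only_catalog,) = make_catalogs(['B_TIME'])

# Without the comma, the variable is bound to the whole list.
as_list = make_catalogs(['B_TIME'])

print(only_catalog)
print(as_list)
```

Unpacking also fails loudly if the list does not contain exactly one element, which is a useful check in this context.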

Definition of the utility functions.

V1 = ASC_TRAIN_catalog + B_TIME_catalog * TRAIN_TT_SCALED + B_COST * TRAIN_COST_SCALED
V2 = B_TIME_catalog * SM_TT_SCALED + B_COST * SM_COST_SCALED
V3 = ASC_CAR_catalog + B_TIME_catalog * CAR_TT_SCALED + B_COST * CAR_CO_SCALED

Associate utility functions with the numbering of alternatives.

V = {1: V1, 2: V2, 3: V3}

Associate the availability conditions with the alternatives.

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

Definition of the model. This is the contribution of each observation to the log likelihood function.

logprob = models.loglogit(V, av, CHOICE)
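To make explicit what this contribution is, the sketch below computes the log of a logit choice probability for a single observation in plain Python. It is an illustration under the standard logit formula with availabilities, not Biogeme's implementation. The function `log_logit` and the numerical utilities are made up for the example.

```python
import math


def log_logit(utilities, availability, chosen):
    """Log probability of `chosen` under a logit model restricted to available alternatives."""
    if not availability[chosen]:
        raise ValueError('The chosen alternative is not available.')
    available = [v for alt, v in utilities.items() if availability[alt]]
    # Shift by the maximum utility before exponentiating, for numerical stability.
    shift = max(available)
    log_denominator = shift + math.log(sum(math.exp(v - shift) for v in available))
    return utilities[chosen] - log_denominator


# Made-up utilities for train (1), Swissmetro (2) and car (3); car is unavailable.
toy_utilities = {1: -1.0, 2: 0.0, 3: -0.5}
toy_availability = {1: 1, 2: 1, 3: 0}

toy_logprob = log_logit(toy_utilities, toy_availability, chosen=2)
print(toy_logprob)
```

Summing such terms over all observations gives the log likelihood that is maximized during estimation.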

Create the Biogeme object.

the_biogeme = bio.BIOGEME(database, logprob)
the_biogeme.modelName = 'b04segmentation'
the_biogeme.generate_html = False
the_biogeme.generate_pickle = False

Estimate the parameters of every specification in the catalog.

dict_of_results = the_biogeme.estimate_catalog()

Number of estimated models.

print(f'A total of {len(dict_of_results)} models have been estimated')

A total of 12 models have been estimated

All estimation results.

compiled_results, specs = compile_estimation_results(
    dict_of_results, use_short_names=True
)

compiled_results

	Model_000000	Model_000001	Model_000002	Model_000003	Model_000004	Model_000005	Model_000006	Model_000007	Model_000008	Model_000009	Model_000010	Model_000011
Number of estimated parameters	7	7	5	10	4	5	6	11	9	11	9	8
Sample size	6768	6768	6768	6768	6768	6768	6768	6768	6768	6768	6768	6768
Final log likelihood	-4976.118642	-5048.818199	-5331.250708	-5022.276564	-5331.252007	-5234.708233	-5050.677696	-5020.027091	-5160.079285	-4952.546476	-5240.921463	-5241.011928
Akaike Information Criterion	9966.237283	10111.636399	10672.501415	10064.553128	10670.504014	10479.416466	10113.355392	10062.054183	10338.158569	9927.092951	10499.842927	10498.023855
Bayesian Information Criterion	10013.97701	10159.376125	10706.60122	10132.752737	10697.783857	10513.51627	10154.275157	10137.073753	10399.538218	10002.112521	10561.222575	10552.583543
ASC_CAR (t-test)	-0.281 (-4.53)	-0.246 (-3.77)	-0.155 (-2.53)	-0.293 (-3.93)	-0.155 (-2.66)	-0.187 (-3.23)	-0.249 (-3.97)	-0.29 (-3.77)	-0.24 (-3.36)	-0.298 (-4.12)	-0.237 (-3.13)	-0.238 (-3.26)
ASC_CAR_GA (t-test)	-0.231 (-1.19)	-0.298 (-1.55)		-0.291 (-1.49)			-0.301 (-1.56)	-0.287 (-1.48)		-0.206 (-1.05)
ASC_TRAIN (t-test)	-1.37 (-14.7)	-1.28 (-13)	-0.701 (-7.69)	-1.75 (-15.1)	-0.701 (-8.49)	-0.814 (-9.45)	-1.28 (-14)	-1.74 (-14.6)	-1.58 (-13.8)	-1.79 (-15.4)	-1.54 (-12.8)	-1.54 (-13.5)
ASC_TRAIN_GA (t-test)	1.91 (21.5)	1.99 (22.6)		1.78 (19.4)			1.97 (22.3)	1.8 (19.6)		1.75 (19.1)
B_COST (t-test)	-1.26 (-15.3)	-1.1 (-14.9)	-1.08 (-16)	-1.1 (-14.8)	-1.08 (-15.9)	-1.23 (-16.6)	-1.1 (-14.8)	-1.1 (-14.8)	-1.22 (-16.3)	-1.25 (-15.3)	-1.09 (-15.8)	-1.09 (-15.7)
B_TIME (t-test)	-0.621 (-4.46)	-1.16 (-13.6)	-1.28 (-15.1)	-1.17 (-11.2)	-1.28 (-12.3)	-0.647 (-4.69)	-1.18 (-11.3)	-1.14 (-13.5)	-0.656 (-4.64)	-0.622 (-4.42)	-1.24 (-14.6)	-1.24 (-11.9)
B_TIME_1st_class (t-test)	-0.914 (-8.6)					-1.02 (-9.87)			-0.943 (-8.88)	-0.891 (-8.26)
B_TIME_commuters (t-test)		-0.183 (-0.799)	-0.00469 (-0.0222)					-0.202 (-0.874)			-0.0396 (-0.184)
ASC_CAR_one_lugg (t-test)				0.0744 (1.13)				0.0749 (1.14)	0.0616 (0.923)	0.0324 (0.486)	0.104 (1.57)	0.103 (1.56)
ASC_CAR_several_lugg (t-test)				-0.252 (-1.06)				-0.261 (-1.1)	-0.432 (-1.83)	-0.437 (-1.82)	-0.252 (-1.07)	-0.25 (-1.06)
ASC_TRAIN_one_lugg (t-test)				0.712 (7.23)				0.717 (7.3)	1.05 (11.1)	0.635 (6.4)	1.15 (12.3)	1.15 (12.3)
ASC_TRAIN_several_lugg (t-test)				0.593 (2.67)				0.584 (2.65)	0.799 (3.74)	0.431 (2)	0.976 (4.43)	0.978 (4.47)
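The two information criteria in the table can be recomputed from the number of parameters, the sample size and the final log likelihood. The sketch below uses the usual definitions, AIC = 2k - 2LL and BIC = k ln(n) - 2LL, and checks them against the first column of the table (7 parameters, 6768 observations, final log likelihood -4976.118642). The function names are chosen for the example.

```python
import math


def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: 2k - 2LL."""
    return 2 * n_parameters - 2 * log_likelihood


def bic(log_likelihood, n_parameters, sample_size):
    """Bayesian Information Criterion: k ln(n) - 2LL."""
    return n_parameters * math.log(sample_size) - 2 * log_likelihood


# Values copied from the first column of the table.
model_aic = aic(-4976.118642, 7)
model_bic = bic(-4976.118642, 7, 6768)
print(model_aic, model_bic)
```

Both values agree with the reported 9966.237283 and 10013.97701 up to rounding of the log likelihood.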

Glossary.

for short_name, spec in specs.items():
    print(f'{short_name}\t{spec}')

Model_000000    ASC:GA;B_TIME:FIRST
Model_000001    ASC:GA;B_TIME:COMMUTERS
Model_000002    ASC:no_seg;B_TIME:COMMUTERS
Model_000003    ASC:GA-LUGGAGE;B_TIME:no_seg
Model_000004    ASC:no_seg;B_TIME:no_seg
Model_000005    ASC:no_seg;B_TIME:FIRST
Model_000006    ASC:GA;B_TIME:no_seg
Model_000007    ASC:GA-LUGGAGE;B_TIME:COMMUTERS
Model_000008    ASC:LUGGAGE;B_TIME:FIRST
Model_000009    ASC:GA-LUGGAGE;B_TIME:FIRST
Model_000010    ASC:LUGGAGE;B_TIME:COMMUTERS
Model_000011    ASC:LUGGAGE;B_TIME:no_seg

Estimation results of the Pareto optimal models.

pareto_results = pareto_optimal(dict_of_results)
compiled_pareto_results, pareto_specs = compile_estimation_results(
    pareto_results, use_short_names=True
)

compiled_pareto_results

	Model_000000	Model_000001	Model_000002	Model_000003	Model_000004
Number of estimated parameters	11	4	5	6	7
Sample size	6768	6768	6768	6768	6768
Final log likelihood	-4952.546476	-5331.252007	-5234.708233	-5050.677696	-4976.118642
Akaike Information Criterion	9927.092951	10670.504014	10479.416466	10113.355392	9966.237283
Bayesian Information Criterion	10002.112521	10697.783857	10513.51627	10154.275157	10013.97701
ASC_CAR (t-test)	-0.298 (-4.12)	-0.155 (-2.66)	-0.187 (-3.23)	-0.249 (-3.97)	-0.281 (-4.53)
ASC_CAR_GA (t-test)	-0.206 (-1.05)			-0.301 (-1.56)	-0.231 (-1.19)
ASC_CAR_one_lugg (t-test)	0.0324 (0.486)
ASC_CAR_several_lugg (t-test)	-0.437 (-1.82)
ASC_TRAIN (t-test)	-1.79 (-15.4)	-0.701 (-8.49)	-0.814 (-9.45)	-1.28 (-14)	-1.37 (-14.7)
ASC_TRAIN_GA (t-test)	1.75 (19.1)			1.97 (22.3)	1.91 (21.5)
ASC_TRAIN_one_lugg (t-test)	0.635 (6.4)
ASC_TRAIN_several_lugg (t-test)	0.431 (2)
B_COST (t-test)	-1.25 (-15.3)	-1.08 (-15.9)	-1.23 (-16.6)	-1.1 (-14.8)	-1.26 (-15.3)
B_TIME (t-test)	-0.622 (-4.42)	-1.28 (-12.3)	-0.647 (-4.69)	-1.18 (-11.3)	-0.621 (-4.46)
B_TIME_1st_class (t-test)	-0.891 (-8.26)		-1.02 (-9.87)		-0.914 (-8.6)

Glossary.

for short_name, spec in pareto_specs.items():
    print(f'{short_name}\t{spec}')

Model_000000    ASC:GA-LUGGAGE;B_TIME:FIRST
Model_000001    ASC:no_seg;B_TIME:no_seg
Model_000002    ASC:no_seg;B_TIME:FIRST
Model_000003    ASC:GA;B_TIME:no_seg
Model_000004    ASC:GA;B_TIME:FIRST
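The selection can be reproduced by hand from the table of all estimation results. The sketch below assumes that the two objectives are the number of estimated parameters (to be minimized) and the final log likelihood (to be maximized), which is consistent with the five models listed above. It is a plain Python illustration of non-dominated filtering, not the code used by `pareto_optimal`. The function `dominates` is a name chosen for the example.

```python
# (number of parameters, final log likelihood), copied from the table of all results.
all_models = {
    'ASC:GA;B_TIME:FIRST': (7, -4976.118642),
    'ASC:GA;B_TIME:COMMUTERS': (7, -5048.818199),
    'ASC:no_seg;B_TIME:COMMUTERS': (5, -5331.250708),
    'ASC:GA-LUGGAGE;B_TIME:no_seg': (10, -5022.276564),
    'ASC:no_seg;B_TIME:no_seg': (4, -5331.252007),
    'ASC:no_seg;B_TIME:FIRST': (5, -5234.708233),
    'ASC:GA;B_TIME:no_seg': (6, -5050.677696),
    'ASC:GA-LUGGAGE;B_TIME:COMMUTERS': (11, -5020.027091),
    'ASC:LUGGAGE;B_TIME:FIRST': (9, -5160.079285),
    'ASC:GA-LUGGAGE;B_TIME:FIRST': (11, -4952.546476),
    'ASC:LUGGAGE;B_TIME:COMMUTERS': (9, -5240.921463),
    'ASC:LUGGAGE;B_TIME:no_seg': (8, -5241.011928),
}


def dominates(a, b):
    """True if `a` is at least as good as `b` on both objectives and strictly better on one."""
    (k_a, ll_a), (k_b, ll_b) = a, b
    return k_a <= k_b and ll_a >= ll_b and (k_a < k_b or ll_a > ll_b)


# Keep the models that no other model dominates, ordered by number of parameters.
non_dominated = sorted(
    (name for name, point in all_models.items()
     if not any(dominates(other, point) for other in all_models.values())),
    key=lambda name: all_models[name][0],
)
for name in non_dominated:
    print(name, all_models[name])
```

For example, ASC:GA;B_TIME:COMMUTERS is discarded because ASC:GA;B_TIME:FIRST reaches a higher log likelihood with the same number of parameters.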

Total running time of the script: (0 minutes 2.647 seconds)
