Catalog for segmented parameters
Investigate the segmentations of parameters.
We consider 4 specifications for the constants:
Not segmented
Segmented by GA (yearly subscription to public transport)
Segmented by luggage
Segmented both by GA and luggage
We consider 3 specifications for the time coefficients:
Not segmented
Segmented by first class
Segmented by trip purpose
Combining the 4 specifications for the constants with the 3 specifications for the time coefficient gives 4 × 3 = 12 specifications in total. See Bierlaire and Ortelli (2023).
Michel Bierlaire, EPFL Sun Apr 27 2025, 15:52:48
import numpy as np
from IPython.core.display_functions import display
from biogeme.biogeme import BIOGEME
from biogeme.catalog import segmentation_catalogs
from biogeme.data.swissmetro import (
CAR_AV_SP,
CAR_CO_SCALED,
CAR_TT_SCALED,
CHOICE,
SM_AV,
SM_COST_SCALED,
SM_TT_SCALED,
TRAIN_AV_SP,
TRAIN_COST_SCALED,
TRAIN_TT_SCALED,
read_data,
)
from biogeme.expressions import Beta
from biogeme.models import loglogit
from biogeme.results_processing import compile_estimation_results, pareto_optimal
Read the data
database = read_data()
Definition of the segmentations.
segmentation_ga = database.generate_segmentation(
variable='GA', mapping={0: 'noGA', 1: 'GA'}
)
segmentation_luggage = database.generate_segmentation(
variable='LUGGAGE', mapping={0: 'no_lugg', 1: 'one_lugg', 3: 'several_lugg'}
)
segmentation_first = database.generate_segmentation(
variable='FIRST', mapping={0: '2nd_class', 1: '1st_class'}
)
We consider two trip purposes: ‘commuters’ and anything else. We need to define a binary variable first.
database.dataframe['COMMUTERS'] = np.where(database.dataframe['PURPOSE'] == 1, 1, 0)
segmentation_purpose = database.generate_segmentation(
variable='COMMUTERS', mapping={0: 'non_commuters', 1: 'commuters'}
)
Parameters to be estimated.
asc_car = Beta('asc_car', 0, None, None, 0)
asc_train = Beta('asc_train', 0, None, None, 0)
b_time = Beta('b_time', 0, None, None, 0)
b_cost = Beta('b_cost', 0, None, None, 0)
Catalogs for the alternative specific constants.
asc_train_catalog, asc_car_catalog = segmentation_catalogs(
generic_name='asc',
beta_parameters=[asc_train, asc_car],
potential_segmentations=(
segmentation_ga,
segmentation_luggage,
),
maximum_number=2,
)
Catalog for the travel time coefficient. Note that the function returns a list of catalogs. Here, the list contains only one catalog, which is why there is a comma after “b_time_catalog”: the trailing comma unpacks the single element of the list.
(b_time_catalog,) = segmentation_catalogs(
generic_name='b_time',
beta_parameters=[b_time],
potential_segmentations=(
segmentation_first,
segmentation_purpose,
),
maximum_number=1,
)
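The sizes of the two catalogs can be checked by hand. The sketch below is plain Python, not Biogeme code, and it assumes that maximum_number caps how many of the potential segmentations may be combined in a single specification. The helper candidate_segmentations is hypothetical; the labels it builds imitate the specification names printed in the glossary of this example.

```python
from itertools import combinations, product


def candidate_segmentations(names, maximum_number):
    """Return a label for every subset of `names` with at most `maximum_number` elements."""
    labels = []
    for size in range(maximum_number + 1):
        for combo in combinations(names, size):
            # The empty subset corresponds to the non-segmented parameter.
            labels.append('-'.join(combo) if combo else 'no_seg')
    return labels


# Constants: GA and LUGGAGE may be combined (maximum_number=2).
asc_options = candidate_segmentations(['GA', 'LUGGAGE'], maximum_number=2)
# Time coefficient: at most one segmentation at a time (maximum_number=1).
b_time_options = candidate_segmentations(['FIRST', 'COMMUTERS'], maximum_number=1)

specifications = [
    f'asc:{asc};b_time:{b_time}'
    for asc, b_time in product(asc_options, b_time_options)
]
print(len(asc_options), len(b_time_options), len(specifications))
```

Under this assumption the constants have 4 candidate specifications and the time coefficient has 3, so the product contains 12 entries.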
Definition of the utility functions.
v_train = (
asc_train_catalog + b_time_catalog * TRAIN_TT_SCALED + b_cost * TRAIN_COST_SCALED
)
v_swissmetro = b_time_catalog * SM_TT_SCALED + b_cost * SM_COST_SCALED
v_car = asc_car_catalog + b_time_catalog * CAR_TT_SCALED + b_cost * CAR_CO_SCALED
Associate utility functions with the numbering of alternatives.
v = {1: v_train, 2: v_swissmetro, 3: v_car}
Associate the availability conditions with the alternatives.
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
Definition of the model. This is the contribution of each observation to the log likelihood function.
log_probability = loglogit(v, av, CHOICE)
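To make the role of the availability conditions concrete, here is a minimal sketch in plain Python of the quantity a logit model assigns to one observation: the log probability of the chosen alternative, computed over the available alternatives only. The function log_logit and the utility values are hypothetical illustrations and are not the Biogeme implementation; raising an error when the chosen alternative is unavailable is a choice made for this sketch.

```python
import math


def log_logit(utilities, availability, chosen):
    """Log probability of `chosen` under a logit model restricted to available alternatives."""
    available = [alt for alt, is_available in availability.items() if is_available]
    if chosen not in available:
        raise ValueError('The chosen alternative must be available.')
    # Log-sum-exp with the maximum subtracted for numerical stability.
    v_max = max(utilities[alt] for alt in available)
    log_denominator = v_max + math.log(
        sum(math.exp(utilities[alt] - v_max) for alt in available)
    )
    return utilities[chosen] - log_denominator


# Hypothetical utilities for train (1), Swissmetro (2) and car (3).
example_utilities = {1: -0.5, 2: 0.0, 3: -1.0}
# Same observation with and without the car alternative.
with_car = log_logit(example_utilities, {1: 1, 2: 1, 3: 1}, chosen=2)
without_car = log_logit(example_utilities, {1: 1, 2: 1, 3: 0}, chosen=2)
print(with_car, without_car)
```

Removing an alternative from the choice set shrinks the denominator, so the log probability of the chosen alternative cannot decrease.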
Create the Biogeme object.
the_biogeme = BIOGEME(
database, log_probability, generate_html=False, generate_yaml=False
)
the_biogeme.model_name = 'b04segmentation'
Estimate the parameters
dict_of_results = the_biogeme.estimate_catalog()
Number of estimated models.
print(f'A total of {len(dict_of_results)} models have been estimated')
A total of 12 models have been estimated
All estimation results
compiled_results, specs = compile_estimation_results(
dict_of_results, use_short_names=True
)
display('All estimated models')
display(compiled_results)
All estimated models
                                      Model_000000      ...  Model_000011
Number of estimated parameters        11                ...  7
Sample size                           10719             ...  10719
Final log likelihood                  -8280.199         ...  -8311.858
Akaike Information Criterion          16582.4           ...  16637.72
Bayesian Information Criterion        16662.47          ...  16688.68
asc_train_ref (t-test)                -1.5 (-17.4)      ...  -1.12 (-17.3)
asc_train_diff_GA (t-test)            1.37 (19.2)       ...  1.53 (22.3)
asc_train_diff_one_lugg (t-test)      0.562 (7.06)      ...
asc_train_diff_several_lugg (t-test)  0.643 (3.72)      ...
b_time_ref (t-test)                   -1.17 (-21.5)     ...  -1.18 (-21.6)
b_time_diff_commuters (t-test)        -0.17 (-0.792)    ...  -0.168 (-0.784)
b_cost (t-test)                       -0.702 (-13.5)    ...  -0.702 (-13.3)
asc_car_ref (t-test)                  0.03 (0.583)      ...  0.0163 (0.396)
asc_car_diff_GA (t-test)              -1.22 (-7.84)     ...  -1.26 (-8.18)
asc_car_diff_one_lugg (t-test)        -0.0306 (-0.608)  ...
asc_car_diff_several_lugg (t-test)    -0.46 (-2.11)     ...
asc_train (t-test)                                      ...
b_time_diff_1st_class (t-test)                          ...
asc_car (t-test)                                        ...
b_time (t-test)                                         ...
[20 rows x 12 columns]
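The parameter names in the table (a “_ref” value and one “_diff_” term per segment) suggest how a segmented parameter is read. The sketch below assumes the usual additive convention: the value for an individual is the reference value plus the shift of every segment the individual belongs to, with “noGA” and “no_lugg” as reference categories. The helper segmented_value is hypothetical; the numbers are copied from the Model_000000 column.

```python
def segmented_value(reference, differences, active_segments):
    """Reference value plus the shift of each segment the individual belongs to."""
    return reference + sum(differences[segment] for segment in active_segments)


# Estimates of the train constant in Model_000000 (asc:GA-LUGGAGE).
asc_train_ref = -1.5
asc_train_diff = {'GA': 1.37, 'one_lugg': 0.562, 'several_lugg': 0.643}

# Traveler without GA and without luggage: the reference value applies.
asc_no_ga_no_lugg = segmented_value(asc_train_ref, asc_train_diff, [])
# Traveler holding a GA and carrying one piece of luggage.
asc_ga_one_lugg = segmented_value(asc_train_ref, asc_train_diff, ['GA', 'one_lugg'])
print(f'{asc_no_ga_no_lugg:.3f} {asc_ga_one_lugg:.3f}')
```

Under this reading, the train constant moves from negative to positive for a GA holder with one piece of luggage.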
Glossary
for short_name, spec in specs.items():
print(f'{short_name}\t{spec}')
Model_000000 asc:GA-LUGGAGE;b_time:COMMUTERS
Model_000001 asc:no_seg;b_time:FIRST
Model_000002 asc:GA;b_time:FIRST
Model_000003 asc:LUGGAGE;b_time:FIRST
Model_000004 asc:GA-LUGGAGE;b_time:FIRST
Model_000005 asc:GA;b_time:no_seg
Model_000006 asc:LUGGAGE;b_time:no_seg
Model_000007 asc:no_seg;b_time:no_seg
Model_000008 asc:GA-LUGGAGE;b_time:no_seg
Model_000009 asc:LUGGAGE;b_time:COMMUTERS
Model_000010 asc:no_seg;b_time:COMMUTERS
Model_000011 asc:GA;b_time:COMMUTERS
Estimation results of the Pareto optimal models.
pareto_results = pareto_optimal(dict_of_results)
compiled_pareto_results, pareto_specs = compile_estimation_results(
pareto_results, use_short_names=True
)
display('Non dominated models')
display(compiled_pareto_results)
Non dominated models
                                      Model_000000      ...  Model_000004
Number of estimated parameters        4                 ...  6
Sample size                           10719             ...  10719
Final log likelihood                  -8670.163         ...  -8313.613
Akaike Information Criterion          17348.33          ...  16639.23
Bayesian Information Criterion        17377.45          ...  16682.9
asc_train (t-test)                    -0.652 (-12)      ...
b_time (t-test)                       -1.28 (-19.5)     ...  -1.19 (-18.3)
b_cost (t-test)                       -0.79 (-15.5)     ...  -0.704 (-13.3)
asc_car (t-test)                      0.0162 (0.438)    ...
asc_train_ref (t-test)                                  ...  -1.12 (-18.2)
asc_train_diff_GA (t-test)                              ...  1.52 (22.1)
asc_train_diff_one_lugg (t-test)                        ...
asc_train_diff_several_lugg (t-test)                    ...
b_time_ref (t-test)                                     ...
b_time_diff_1st_class (t-test)                          ...
asc_car_ref (t-test)                                    ...  0.0143 (0.361)
asc_car_diff_GA (t-test)                                ...  -1.26 (-8.18)
asc_car_diff_one_lugg (t-test)                          ...
asc_car_diff_several_lugg (t-test)                      ...
[19 rows x 5 columns]
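The information criteria in the table can be recomputed from the final log likelihood, the number of estimated parameters and the sample size. The sketch below uses the standard definitions AIC = 2K − 2 ln L and BIC = K ln N − 2 ln L; the function names are hypothetical, and the inputs are the rounded values reported for Model_000000 and Model_000004, so agreement is only expected up to rounding.

```python
import math


def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: 2K - 2 ln L."""
    return 2 * n_parameters - 2 * log_likelihood


def bic(log_likelihood, n_parameters, sample_size):
    """Bayesian Information Criterion: K ln N - 2 ln L."""
    return n_parameters * math.log(sample_size) - 2 * log_likelihood


sample_size = 10719
# (final log likelihood, number of parameters) as reported in the table.
reported = {'Model_000000': (-8670.163, 4), 'Model_000004': (-8313.613, 6)}
for name, (log_likelihood, n_parameters) in reported.items():
    print(
        name,
        round(aic(log_likelihood, n_parameters), 2),
        round(bic(log_likelihood, n_parameters, sample_size), 2),
    )
```

Both criteria penalize additional parameters, BIC more heavily here because ln(10719) is well above 2.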
Glossary
for short_name, spec in pareto_specs.items():
print(f'{short_name}\t{spec}')
Model_000000 asc:no_seg;b_time:no_seg
Model_000001 asc:GA-LUGGAGE;b_time:FIRST
Model_000002 asc:GA;b_time:FIRST
Model_000003 asc:no_seg;b_time:FIRST
Model_000004 asc:GA;b_time:no_seg
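The notion of a non-dominated model can be illustrated with a small sketch. It assumes two objectives, the final log likelihood (higher is better) and the number of parameters (lower is better); the objectives actually used by pareto_optimal are not stated in this example. The functions and the toy values below are hypothetical and are not taken from the estimation results.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a['loglike'] >= b['loglike'] and a['n_params'] <= b['n_params']
    strictly_better = a['loglike'] > b['loglike'] or a['n_params'] < b['n_params']
    return no_worse and strictly_better


def non_dominated(models):
    """Keep the models that no other model dominates."""
    return {
        name: model
        for name, model in models.items()
        if not any(dominates(other, model) for other in models.values())
    }


# Hypothetical values, for illustration only.
toy_models = {
    'A': {'loglike': -900.0, 'n_params': 4},
    'B': {'loglike': -850.0, 'n_params': 6},
    'C': {'loglike': -860.0, 'n_params': 7},  # worse fit and more parameters than B
    'D': {'loglike': -840.0, 'n_params': 9},
}
print(sorted(non_dominated(toy_models)))  # ['A', 'B', 'D']
```

Model C is discarded because B fits better with fewer parameters; A, B and D each represent a different trade-off between fit and parsimony.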
Total running time of the script: (0 minutes 14.706 seconds)