Specification of a catalog of models
Specification of the catalogs used by the assisted specification algorithm. Note that this script does not perform any estimation. It is imported by other scripts: Assisted specification, Re-estimate the Pareto optimal models.
- author: Michel Bierlaire, EPFL
- date: Fri Jul 21 17:46:09 2023
import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import Beta, logzero
from biogeme.catalog import Catalog, segmentation_catalogs
See the data processing script: Data preparation for Swissmetro.
from swissmetro_data import (
    database,
    CHOICE,
    SM_AV,
    CAR_AV_SP,
    TRAIN_AV_SP,
    TRAIN_TT_SCALED,
    TRAIN_COST_SCALED,
    SM_TT_SCALED,
    SM_COST_SCALED,
    CAR_TT_SCALED,
    CAR_CO_SCALED,
    MALE,
    INCOME,
    GA,
)
Parameters to be estimated.
ASC_CAR = Beta('ASC_CAR', 0, None, None, 0)
ASC_TRAIN = Beta('ASC_TRAIN', 0, None, None, 0)
B_TIME = Beta('B_TIME', 0, None, None, 0)
B_COST = Beta('B_COST', 0, None, None, 0)
Segmentations.
gender_segmentation = database.generate_segmentation(
    variable=MALE,
    mapping={
        0: 'female',
        1: 'male',
    },
)
income_segmentation = database.generate_segmentation(
    variable=INCOME,
    mapping={
        0: 'inc-zero',
        1: 'inc-under50',
        2: 'inc-50-100',
        3: 'inc-100+',
        4: 'inc-unknown',
    },
)
print(f'{income_segmentation=}')
income_segmentation=INCOME: [{np.int64(0): 'inc-zero', np.int64(1): 'inc-under50', np.int64(2): 'inc-50-100', np.int64(3): 'inc-100+', np.int64(4): 'inc-unknown'}] ref: inc-zero
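The printed output names `inc-zero` as the reference category. To make the role of a reference category concrete, here is a minimal plain-Python sketch of how a segmented parameter could be evaluated. It assumes the common convention that the reference segment carries the base value and every other segment adds its own offset. The function `segmented_value`, the offsets, and all numbers are hypothetical and are not part of Biogeme's API.

```python
def segmented_value(base, offsets, category, reference):
    """Value of a segmented parameter for one observation.

    Assumption: the reference category uses the base value alone, and
    every other category adds a category-specific offset to it.
    """
    if category == reference:
        return base
    return base + offsets[category]


# Labels taken from the income mapping above; numbers are made up.
income_labels = ['inc-zero', 'inc-under50', 'inc-50-100', 'inc-100+', 'inc-unknown']
offsets = {
    'inc-under50': 0.25,
    'inc-50-100': 0.5,
    'inc-100+': 0.75,
    'inc-unknown': 0.125,
}
b_cost_base = -1.0

for label in income_labels:
    value = segmented_value(b_cost_base, offsets, label, reference='inc-zero')
    print(f'{label}: {value}')
```

Under this convention, a segmentation with five categories adds four offsets to the model, one per non-reference category.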
ga_segmentation = database.generate_segmentation(
    variable=GA, mapping={1: 'GA', 0: 'noGA'}
)
asc_segmentations = (
    gender_segmentation,
    ga_segmentation,
)
ASC_CAR_catalog, ASC_TRAIN_catalog = segmentation_catalogs(
    generic_name='ASC',
    beta_parameters=[ASC_CAR, ASC_TRAIN],
    potential_segmentations=asc_segmentations,
    maximum_number=2,
)
cost_segmentations = (
    ga_segmentation,
    income_segmentation,
)
Note that the function returns a list. Here the list contains only one element, which is why B_COST_catalog is followed by a comma: the left-hand side unpacks the single element.
(B_COST_catalog,) = segmentation_catalogs(
    generic_name='B_COST',
    beta_parameters=[B_COST],
    potential_segmentations=cost_segmentations,
    maximum_number=1,
)
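The single-element unpacking used for `B_COST_catalog` is plain Python and can be tried without Biogeme. In the sketch below, `make_catalogs` is a hypothetical stand-in that returns one item per input name; it only illustrates the syntax.

```python
def make_catalogs(names):
    """Hypothetical stand-in: returns a list with one item per input name."""
    return [f'{name}_catalog' for name in names]


# Without unpacking, the result is the list itself.
as_list = make_catalogs(['B_COST'])

# With a one-element target, Python extracts the single item. It raises
# ValueError if the list does not contain exactly one element.
(only_item,) = make_catalogs(['B_COST'])

print(as_list)
print(only_item)
```

Writing the target as `(only_item,)` rather than indexing with `[0]` also acts as a check that exactly one element was returned.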
Parameter for Box-Cox transforms
ell_time = Beta('lambda_time', 1, None, 10, 0)
Potential non-linear specifications of travel time.
TRAIN_TT_catalog = Catalog.from_dict(
    catalog_name='TRAIN_TT',
    dict_of_expressions={
        'linear': TRAIN_TT_SCALED,
        'log': logzero(TRAIN_TT_SCALED),
        'boxcox': models.boxcox(TRAIN_TT_SCALED, ell_time),
    },
)
SM_TT_catalog = Catalog.from_dict(
    catalog_name='SM_TT',
    dict_of_expressions={
        'linear': SM_TT_SCALED,
        'log': logzero(SM_TT_SCALED),
        'boxcox': models.boxcox(SM_TT_SCALED, ell_time),
    },
    controlled_by=TRAIN_TT_catalog.controlled_by,
)
CAR_TT_catalog = Catalog.from_dict(
    catalog_name='CAR_TT',
    dict_of_expressions={
        'linear': CAR_TT_SCALED,
        'log': logzero(CAR_TT_SCALED),
        'boxcox': models.boxcox(CAR_TT_SCALED, ell_time),
    },
    controlled_by=TRAIN_TT_catalog.controlled_by,
)
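The three candidate transforms of travel time can be compared numerically with a small plain-Python sketch. The functions below are textbook versions written for illustration and are not Biogeme's implementations: `logzero_sketch` assumes that the logarithm is replaced by 0 when the argument is 0, and `boxcox_sketch` assumes the standard Box-Cox form with the logarithm as the limiting case when the parameter approaches 0.

```python
import math


def logzero_sketch(x):
    """Logarithm that returns 0 when x is 0 (assumed convention)."""
    return 0.0 if x == 0 else math.log(x)


def boxcox_sketch(x, ell, tol=1e-8):
    """Textbook Box-Cox transform: (x**ell - 1) / ell, log(x) as ell -> 0."""
    if x == 0:
        return 0.0  # assumption: zero input maps to zero
    if abs(ell) < tol:
        return math.log(x)
    return (x**ell - 1.0) / ell


travel_time = 2.5  # hypothetical scaled travel time

print('linear:', travel_time)
print('log:', logzero_sketch(travel_time))
print('boxcox, ell=1:', boxcox_sketch(travel_time, 1.0))
print('boxcox, ell=0.5:', boxcox_sketch(travel_time, 0.5))
print('boxcox, ell near 0:', boxcox_sketch(travel_time, 1e-12))
```

With a parameter of 1, the Box-Cox transform equals the linear specification shifted by a constant, and as the parameter approaches 0 it tends to the logarithm. This nesting makes 1, the initial value of `lambda_time`, a natural starting point.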
Definition of the utility functions with linear cost.
V1 = ASC_TRAIN_catalog + B_TIME * TRAIN_TT_catalog + B_COST_catalog * TRAIN_COST_SCALED
V2 = B_TIME * SM_TT_catalog + B_COST_catalog * SM_COST_SCALED
V3 = ASC_CAR_catalog + B_TIME * CAR_TT_catalog + B_COST_catalog * CAR_CO_SCALED
Associate utility functions with the numbering of alternatives.
V = {1: V1, 2: V2, 3: V3}
Associate the availability conditions with the alternatives.
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
Definition of the model. This is the contribution of each observation to the log likelihood function.
logprob = models.loglogit(V, av, CHOICE)
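For intuition about what the log-logit expression computes for a single observation, here is a minimal numeric sketch in plain Python. It assumes the standard logit formula, with unavailable alternatives excluded from the denominator. The function `loglogit_sketch` and the utility values are hypothetical; the sketch also assumes that the chosen alternative is available.

```python
import math


def loglogit_sketch(utilities, availability, chosen):
    """Log of the logit probability of the chosen alternative.

    Alternatives with availability 0 are left out of the denominator.
    """
    available = [alt for alt, flag in availability.items() if flag != 0]
    # Subtract the largest utility before exponentiating, for numerical stability.
    max_v = max(utilities[alt] for alt in available)
    log_denominator = max_v + math.log(
        sum(math.exp(utilities[alt] - max_v) for alt in available)
    )
    return utilities[chosen] - log_denominator


# Hypothetical utilities for train (1), Swissmetro (2) and car (3).
V_values = {1: -1.0, 2: -0.5, 3: -0.8}

# Car is not available for this observation.
av_values = {1: 1, 2: 1, 3: 0}

print(loglogit_sketch(V_values, av_values, chosen=2))
```

Summing this quantity over all observations gives the log likelihood of the sample.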
print(
    f'Total number of possible specifications: '
    f'{logprob.number_of_multiple_expressions()}'
)
Total number of possible specifications: 36
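The count of 36 can be reproduced by hand under three assumptions that are consistent with the reported total: a segmentation catalog offers every subset of its potential segmentations up to `maximum_number`, including no segmentation at all; the two ASC catalogs created by the same call are selected jointly; and the three travel time catalogs, which share one controller, also change together. The helper `count_segmentation_options` is hypothetical and only does the arithmetic.

```python
from itertools import combinations
from math import prod


def count_segmentation_options(n_potential, maximum_number):
    """Number of subsets of the potential segmentations with at most
    maximum_number elements, the empty subset included."""
    return sum(
        len(list(combinations(range(n_potential), size)))
        for size in range(maximum_number + 1)
    )


# ASC: none, gender, GA, or both.
asc_options = count_segmentation_options(n_potential=2, maximum_number=2)

# B_COST: none, GA, or income.
cost_options = count_segmentation_options(n_potential=2, maximum_number=1)

# Travel time: linear, log, or Box-Cox, applied to all alternatives at once.
travel_time_options = 3

total = prod([asc_options, cost_options, travel_time_options])
print(asc_options, cost_options, travel_time_options, total)
```

If the travel time catalogs were not synchronized, each alternative would choose its transform independently, and the last factor would be 27 instead of 3.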
Create the biogeme object.
the_biogeme = bio.BIOGEME(database, logprob)
the_biogeme.modelName = 'b21multiple_models'
Name of the Pareto file.
PARETO_FILE_NAME = 'b21multiple_models.pareto'