Specification of a catalog of models

Specification of the catalogs used by the assisted specification algorithm. Note that this script does not perform any estimation. It is imported by other scripts: Assisted specification, Re-estimate the Pareto optimal models.

author:: Michel Bierlaire, EPFL
date:: Fri Jul 21 17:46:09 2023

import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import Beta, logzero
from biogeme.catalog import Catalog, segmentation_catalogs

See the data processing script: Data preparation for Swissmetro.

from swissmetro_data import (
    database,
    CHOICE,
    SM_AV,
    CAR_AV_SP,
    TRAIN_AV_SP,
    TRAIN_TT_SCALED,
    TRAIN_COST_SCALED,
    SM_TT_SCALED,
    SM_COST_SCALED,
    CAR_TT_SCALED,
    CAR_CO_SCALED,
    MALE,
    INCOME,
    GA,
)

Parameters to be estimated.

ASC_CAR = Beta('ASC_CAR', 0, None, None, 0)
ASC_TRAIN = Beta('ASC_TRAIN', 0, None, None, 0)
B_TIME = Beta('B_TIME', 0, None, None, 0)
B_COST = Beta('B_COST', 0, None, None, 0)

Segmentations.

gender_segmentation = database.generate_segmentation(
    variable=MALE,
    mapping={
        0: 'female',
        1: 'male',
    },
)

income_segmentation = database.generate_segmentation(
    variable=INCOME,
    mapping={
        0: 'inc-zero',
        1: 'inc-under50',
        2: 'inc-50-100',
        3: 'inc-100+',
        4: 'inc-unknown',
    },
)

print(f'{income_segmentation=}')

income_segmentation=INCOME: [{np.int64(0): 'inc-zero', np.int64(1): 'inc-under50', np.int64(2): 'inc-50-100', np.int64(3): 'inc-100+', np.int64(4): 'inc-unknown'}] ref: inc-zero

ga_segmentation = database.generate_segmentation(
    variable=GA, mapping={1: 'GA', 0: 'noGA'}
)

asc_segmentations = (
    gender_segmentation,
    ga_segmentation,
)

ASC_CAR_catalog, ASC_TRAIN_catalog = segmentation_catalogs(
    generic_name='ASC',
    beta_parameters=[ASC_CAR, ASC_TRAIN],
    potential_segmentations=asc_segmentations,
    maximum_number=2,
)

cost_segmentations = (
    ga_segmentation,
    income_segmentation,
)

Note that the function returns a list. In this case, it contains only one element. This is the reason of the presence of the comma after B_COST_catalog

(B_COST_catalog,) = segmentation_catalogs(
    generic_name='B_COST',
    beta_parameters=[B_COST],
    potential_segmentations=cost_segmentations,
    maximum_number=1,
)

Parameter for Box-Cox transforms

ell_time = Beta('lambda_time', 1, None, 10, 0)

Potential non-linear specifications of travel time.

TRAIN_TT_catalog = Catalog.from_dict(
    catalog_name='TRAIN_TT',
    dict_of_expressions={
        'linear': TRAIN_TT_SCALED,
        'log': logzero(TRAIN_TT_SCALED),
        'boxcox': models.boxcox(TRAIN_TT_SCALED, ell_time),
    },
)

SM_TT_catalog = Catalog.from_dict(
    catalog_name='SM_TT',
    dict_of_expressions={
        'linear': SM_TT_SCALED,
        'log': logzero(SM_TT_SCALED),
        'boxcox': models.boxcox(SM_TT_SCALED, ell_time),
    },
    controlled_by=TRAIN_TT_catalog.controlled_by,
)

CAR_TT_catalog = Catalog.from_dict(
    catalog_name='CAR_TT',
    dict_of_expressions={
        'linear': CAR_TT_SCALED,
        'log': logzero(CAR_TT_SCALED),
        'boxcox': models.boxcox(CAR_TT_SCALED, ell_time),
    },
    controlled_by=TRAIN_TT_catalog.controlled_by,
)

Definition of the utility functions with linear cost.

V1 = ASC_TRAIN_catalog + B_TIME * TRAIN_TT_catalog + B_COST_catalog * TRAIN_COST_SCALED
V2 = B_TIME * SM_TT_catalog + B_COST_catalog * SM_COST_SCALED
V3 = ASC_CAR_catalog + B_TIME * CAR_TT_catalog + B_COST_catalog * CAR_CO_SCALED

Associate utility functions with the numbering of alternatives.

V = {1: V1, 2: V2, 3: V3}

Associate the availability conditions with the alternatives.

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

Definition of the model. This is the contribution of each observation to the log likelihood function.

logprob = models.loglogit(V, av, CHOICE)

print(
    f'Total number of possible specifications: '
    f'{logprob.number_of_multiple_expressions()}'
)

Total number of possible specifications: 36

Create the biogeme object.

the_biogeme = bio.BIOGEME(database, logprob)
the_biogeme.modelName = 'b21multiple_models'

Name of the Pareto file.

PARETO_FILE_NAME = 'b21multiple_models.pareto'

Total running time of the script: (0 minutes 0.097 seconds)

Gallery generated by Sphinx-Gallery