Mixture of logit models

Example of a mixture of logit models, using numerical integration. The mixing distribution is uniform.

author: Michel Bierlaire, EPFL

date: Sun Apr 9 17:52:52 2023

import biogeme.biogeme_logging as blog
import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import (
    Beta,
    Integrate,
    RandomVariable,
    exp,
    log,
)

See the data processing script: Data preparation for Swissmetro.

from swissmetro_data import (
    database,
    CHOICE,
    SM_AV,
    CAR_AV_SP,
    TRAIN_AV_SP,
    TRAIN_TT_SCALED,
    TRAIN_COST_SCALED,
    SM_TT_SCALED,
    SM_COST_SCALED,
    CAR_TT_SCALED,
    CAR_CO_SCALED,
)

logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b06unif_mixture_integral.py')
Example b06unif_mixture_integral.py

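Parameters to be estimated. ASC_SM is fixed to 0 (its last argument is 1) in order to normalize the alternative specific constants.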

ASC_CAR = Beta('ASC_CAR', 0, None, None, 0)
ASC_TRAIN = Beta('ASC_TRAIN', 0, None, None, 0)
ASC_SM = Beta('ASC_SM', 0, None, None, 1)
B_COST = Beta('B_COST', 0, None, None, 0)

Define a random parameter, uniformly distributed, designed to be used for numerical integration. B_TIME is the center of the distribution and B_TIME_S scales its half-width.

B_TIME = Beta('B_TIME', 0, None, None, 0)
B_TIME_S = Beta('B_TIME_S', 1, None, None, 0)
omega = RandomVariable('omega')

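The numerical integration over omega ranges from -∞ to +∞, while the uniform variable x lies between -1 and 1. We therefore perform a change of variable that maps omega onto x in that interval. The quantity dx is the derivative of x with respect to omega.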

LOWER_BND = -1
UPPER_BND = 1
x = LOWER_BND + (UPPER_BND - LOWER_BND) / (1 + exp(-omega))
dx = (UPPER_BND - LOWER_BND) * exp(-omega) * (1 + exp(-omega)) ** (-2)
B_TIME_RND = B_TIME + B_TIME_S * x
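The change of variable can be checked numerically. The following is a minimal sketch in plain Python, independent of Biogeme: it integrates dx / (UPPER_BND - LOWER_BND) over a truncated range of omega with a trapezoid rule and checks that the result is close to 1, as required for a density. The helper names, the truncation range and the number of steps are assumptions of this sketch, not part of the original script.

```python
import math

LOWER_BND = -1.0
UPPER_BND = 1.0


def x_of_omega(omega):
    """Logistic map from the real line onto (LOWER_BND, UPPER_BND)."""
    return LOWER_BND + (UPPER_BND - LOWER_BND) / (1.0 + math.exp(-omega))


def dx_of_omega(omega):
    """Derivative of x_of_omega with respect to omega."""
    e = math.exp(-omega)
    return (UPPER_BND - LOWER_BND) * e / (1.0 + e) ** 2


def trapezoid(func, start, stop, steps):
    """Composite trapezoid rule on [start, stop] with `steps` intervals."""
    width = (stop - start) / steps
    total = 0.5 * (func(start) + func(stop))
    for i in range(1, steps):
        total += func(start + i * width)
    return total * width


# Truncating the infinite range at +/-40 is an assumption of this sketch;
# the integrand is negligible beyond that point.
density_mass = trapezoid(
    lambda w: dx_of_omega(w) / (UPPER_BND - LOWER_BND), -40.0, 40.0, 8000
)
print(density_mass)
```

The printed value should be very close to 1, which suggests that the transformed integrand is a proper density over omega.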

Definition of the utility functions.

V1 = ASC_TRAIN + B_TIME_RND * TRAIN_TT_SCALED + B_COST * TRAIN_COST_SCALED
V2 = ASC_SM + B_TIME_RND * SM_TT_SCALED + B_COST * SM_COST_SCALED
V3 = ASC_CAR + B_TIME_RND * CAR_TT_SCALED + B_COST * CAR_CO_SCALED

Associate utility functions with the numbering of alternatives.

V = {1: V1, 2: V2, 3: V3}

Associate the availability conditions with the alternatives.

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

Conditional on omega, we have a logit model (called the kernel).

condprob = models.logit(V, av, CHOICE)
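To illustrate what the kernel computes, here is a minimal sketch of a logit probability with availability conditions in plain Python. It is not Biogeme's implementation of models.logit; the function name and the numerical values are hypothetical and serve only as an illustration.

```python
import math


def logit_probability(utilities, availability, chosen):
    """Logit probability of `chosen` among the available alternatives."""
    if not availability[chosen]:
        return 0.0
    # Subtract the largest available utility for numerical stability.
    v_max = max(v for alt, v in utilities.items() if availability[alt])
    denominator = sum(
        math.exp(v - v_max) for alt, v in utilities.items() if availability[alt]
    )
    return math.exp(utilities[chosen] - v_max) / denominator


# Hypothetical utilities and availabilities for one observation:
# alternative 3 is not available.
example_v = {1: -1.0, 2: 0.5, 3: 0.0}
example_av = {1: 1, 2: 1, 3: 0}
example_probs = {
    alt: logit_probability(example_v, example_av, alt) for alt in example_v
}
print(example_probs)
```

The probabilities of the available alternatives sum to one, and the unavailable alternative receives probability zero.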

We integrate over omega using numerical integration. The factor 1 / (UPPER_BND - LOWER_BND) is the density of the uniform distribution.

logprob = log(Integrate(condprob * dx / (UPPER_BND - LOWER_BND), 'omega'))
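The integral can be illustrated for a single observation with a sketch in plain Python. It evaluates the mixture probability in two ways: by integrating over omega after the change of variable, and by integrating directly over x between -1 and 1 with the uniform density. Both should agree. All coefficient and attribute values are hypothetical, all alternatives are assumed to be available, and the trapezoid rule is a stand-in chosen for this sketch, not necessarily the method used by Biogeme's Integrate.

```python
import math

LOWER_BND, UPPER_BND = -1.0, 1.0

# Hypothetical coefficients and attributes for one observation.
B_TIME, B_TIME_S, B_COST = -2.0, 1.5, -1.0
ASC = {1: -0.4, 2: 0.0, 3: 0.1}
TRAVEL_TIME = {1: 1.2, 2: 0.6, 3: 1.0}
COST = {1: 0.5, 2: 0.6, 3: 0.4}
CHOSEN = 2


def kernel(x):
    """Logit probability of the chosen alternative, conditional on x."""
    b_time_rnd = B_TIME + B_TIME_S * x
    v = {
        alt: ASC[alt] + b_time_rnd * TRAVEL_TIME[alt] + B_COST * COST[alt]
        for alt in ASC
    }
    v_max = max(v.values())
    return math.exp(v[CHOSEN] - v_max) / sum(
        math.exp(u - v_max) for u in v.values()
    )


def integrand_omega(omega):
    """Kernel times dx / (UPPER_BND - LOWER_BND), as a function of omega."""
    e = math.exp(-omega)
    x = LOWER_BND + (UPPER_BND - LOWER_BND) / (1.0 + e)
    dx = (UPPER_BND - LOWER_BND) * e / (1.0 + e) ** 2
    return kernel(x) * dx / (UPPER_BND - LOWER_BND)


def trapezoid(func, start, stop, steps):
    """Composite trapezoid rule on [start, stop] with `steps` intervals."""
    width = (stop - start) / steps
    total = 0.5 * (func(start) + func(stop))
    for i in range(1, steps):
        total += func(start + i * width)
    return total * width


# Integration over omega (infinite range truncated at +/-40, an assumption).
prob_via_omega = trapezoid(integrand_omega, -40.0, 40.0, 8000)
# Direct integration over x with the uniform density.
prob_via_x = trapezoid(
    lambda x: kernel(x) / (UPPER_BND - LOWER_BND), LOWER_BND, UPPER_BND, 8000
)
log_prob = math.log(prob_via_omega)
print(prob_via_omega, prob_via_x, log_prob)
```

The agreement between the two values indicates that the change of variable leaves the mixture probability unchanged; the logarithm is the contribution of this observation to the log likelihood.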

Create the Biogeme object.

the_biogeme = bio.BIOGEME(database, logprob)
the_biogeme.modelName = '06unif_mixture_integral'
File biogeme.toml has been parsed.

Estimate the parameters.

results = the_biogeme.estimate()
*** Initial values of the parameters are obtained from the file __06unif_mixture_integral.iter
Cannot read file __06unif_mixture_integral.iter. Statement is ignored.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.         ASC_CAR       ASC_TRAIN          B_COST          B_TIME        B_TIME_S     Function    Relgrad   Radius      Rho
    0           -0.18           -0.69           -0.37              -1            0.87      5.4e+03      0.045       10        1   ++
    1          0.0036           -0.56           -0.99            -1.6             1.9      5.2e+03      0.021    1e+02      1.1   ++
    2           0.095           -0.44            -1.2            -2.1             2.5      5.2e+03     0.0067    1e+03      1.2   ++
    3            0.14           -0.39            -1.3            -2.3             2.8      5.2e+03      0.001    1e+04      1.1   ++
    4            0.14           -0.39            -1.3            -2.3             2.9      5.2e+03    2.8e-05    1e+05        1   ++
    5            0.14           -0.39            -1.3            -2.3             2.9      5.2e+03    2.2e-08    1e+05        1   ++
Results saved in file 06unif_mixture_integral.html
Results saved in file 06unif_mixture_integral.pickle
print(results.short_summary())
Results for model 06unif_mixture_integral
Nbr of parameters:              5
Sample size:                    6768
Excluded data:                  3960
Final log likelihood:           -5215.072
Akaike Information Criterion:   10440.14
Bayesian Information Criterion: 10474.24
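The two information criteria in this summary can be reproduced from the number of parameters, the sample size and the final log likelihood. The sketch below assumes the standard definitions, AIC = 2K - 2LL and BIC = K ln(N) - 2LL; the variable names are illustrative.

```python
import math

# Values copied from the summary above.
n_parameters = 5
sample_size = 6768
final_loglike = -5215.072

# Standard definitions, assumed here to be the ones reported.
aic = 2 * n_parameters - 2 * final_loglike
bic = n_parameters * math.log(sample_size) - 2 * final_loglike
print(round(aic, 2), round(bic, 2))
```

Both values match the reported criteria after rounding to two decimals.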
pandas_results = results.getEstimatedParameters()
pandas_results
               Value  Rob. Std err  Rob. t-test  Rob. p-value
ASC_CAR     0.144949      0.053305     2.719231  6.543384e-03
ASC_TRAIN  -0.385033      0.066018    -5.832246  5.468613e-09
B_COST     -1.277902      0.086617   -14.753425  0.000000e+00
B_TIME     -2.320416      0.126068   -18.405991  0.000000e+00
B_TIME_S    2.875314      0.199949    14.380228  0.000000e+00
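Since x is uniform between -1 and 1, the estimates of B_TIME and B_TIME_S imply a range for the random time coefficient B_TIME_RND. The sketch below computes that range and the share of the distribution lying above zero. It is simple arithmetic on the estimates reported above, not an output of Biogeme, and the variable names are illustrative.

```python
# Estimates copied from the table above.
b_time = -2.320416
b_time_s = 2.875314

# B_TIME_RND = B_TIME + B_TIME_S * x, with x uniform on [-1, 1].
lower = b_time - abs(b_time_s)
upper = b_time + abs(b_time_s)

# Share of a uniform distribution on [lower, upper] that lies above zero,
# clamped to [0, 1] in case zero falls outside the range.
share_positive = max(0.0, min(1.0, upper / (upper - lower)))
print(lower, upper, share_positive)
```

With these estimates, the range includes positive values, so a small part of the distribution (a little under 10%) corresponds to a positive time coefficient.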


Total running time of the script: (0 minutes 11.420 seconds)
