Discrete mixture with panel data

Example of a discrete mixture of logit models, also called latent class model. The class membership model includes socio-economic variables. The datafile is organized as panel data.

author:

Michel Bierlaire, EPFL

date:

Mon Apr 10 12:07:15 2023

import biogeme.biogeme_logging as blog
import biogeme.biogeme as bio
from biogeme import models

from biogeme.expressions import (
    Beta,
    Variable,
    bioDraws,
    MonteCarlo,
    log,
    exp,
    bioMultSum,
)

See the data processing script: Panel data preparation for Swissmetro.

from swissmetro_panel import (
    flat_database,
    SM_AV,
    CAR_AV_SP,
    TRAIN_AV_SP,
    INCOME,
)

logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b16panel_discrete_socio_eco.py')
Example b16panel_discrete_socio_eco.py

Parameters to be estimated. One version for each latent class.

NUMBER_OF_CLASSES = 2
B_COST = [Beta(f'B_COST_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]

Define a random parameter, normally distributed across individuals, designed to be used for Monte-Carlo simulation.

B_TIME = [Beta(f'B_TIME_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]

It is advised not to use 0 as starting value for the following parameter.

B_TIME_S = [
    Beta(f'B_TIME_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
B_TIME_RND = [
    B_TIME[i] + B_TIME_S[i] * bioDraws(f'B_TIME_RND_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

We do the same for the constants, to address serial correlation.

ASC_CAR = [
    Beta(f'ASC_CAR_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_CAR_S = [
    Beta(f'ASC_CAR_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_CAR_RND = [
    ASC_CAR[i] + ASC_CAR_S[i] * bioDraws(f'ASC_CAR_RND_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

ASC_TRAIN = [
    Beta(f'ASC_TRAIN_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_TRAIN_S = [
    Beta(f'ASC_TRAIN_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_TRAIN_RND = [
    ASC_TRAIN[i] + ASC_TRAIN_S[i] * bioDraws(f'ASC_TRAIN_RND_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

ASC_SM = [Beta(f'ASC_SM_class{i}', 0, None, None, 1) for i in range(NUMBER_OF_CLASSES)]
ASC_SM_S = [
    Beta(f'ASC_SM_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_SM_RND = [
    ASC_SM[i] + ASC_SM_S[i] * bioDraws(f'ASC_SM_RND_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

Parameters for the class membership model.

CLASS_CTE = Beta('CLASS_CTE', 0, None, None, 0)
CLASS_INC = Beta('CLASS_INC', 0, None, None, 0)

In class 0, it is assumed that the time coefficient is zero

B_TIME_RND[0] = 0

Utility functions

V1 = [
    [
        ASC_TRAIN_RND[i]
        + B_TIME_RND[i] * Variable(f'{t}_TRAIN_TT_SCALED')
        + B_COST[i] * Variable(f'{t}_TRAIN_COST_SCALED')
        for t in range(1, 10)
    ]
    for i in range(NUMBER_OF_CLASSES)
]
V2 = [
    [
        ASC_SM_RND[i]
        + B_TIME_RND[i] * Variable(f'{t}_SM_TT_SCALED')
        + B_COST[i] * Variable(f'{t}_SM_COST_SCALED')
        for t in range(1, 10)
    ]
    for i in range(NUMBER_OF_CLASSES)
]
V3 = [
    [
        ASC_CAR_RND[i]
        + B_TIME_RND[i] * Variable(f'{t}_CAR_TT_SCALED')
        + B_COST[i] * Variable(f'{t}_CAR_CO_SCALED')
        for t in range(1, 10)
    ]
    for i in range(NUMBER_OF_CLASSES)
]
V = [
    [{1: V1[i][t], 2: V2[i][t], 3: V3[i][t]} for t in range(9)]
    for i in range(NUMBER_OF_CLASSES)
]

Associate the availability conditions with the alternatives

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

The choice model is a discrete mixture of logit, with availability conditions We calculate the conditional probability for each class.

prob = [
    exp(
        bioMultSum(
            [models.loglogit(V[i][t], av, Variable(f'{t+1}_CHOICE')) for t in range(9)]
        )
    )
    for i in range(NUMBER_OF_CLASSES)
]

Class membership model.

W = CLASS_CTE + CLASS_INC * INCOME
PROB_class0 = models.logit({0: W, 1: 0}, None, 0)
PROB_class1 = models.logit({0: W, 1: 0}, None, 1)

Conditional on the random variables, likelihood for the individual.

probIndiv = PROB_class0 * prob[0] + PROB_class1 * prob[1]

We integrate over the random variables using Monte-Carlo

logprob = log(MonteCarlo(probIndiv))

Create the Biogeme object. As the objective is to illustrate the syntax, we calculate the Monte-Carlo approximation with a small number of draws. To achieve that, we provide a parameter file different from the default one.

the_biogeme = bio.BIOGEME(flat_database, logprob, parameter_file='few_draws.toml')
the_biogeme.modelName = 'b16panel_discrete_socio_eco'
File few_draws.toml has been parsed.

Estimate the parameters.

results = the_biogeme.estimate()
*** Initial values of the parameters are obtained from the file __b16panel_discrete_socio_eco.iter
Cannot read file __b16panel_discrete_socio_eco.iter. Statement is ignored.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.     Function    Relgrad   Radius      Rho
    0      4.1e+03      0.031       10     0.95   ++
    1      4.1e+03      0.031        5   -0.059    -
    2      3.7e+03      0.037        5     0.54    +
    3      3.7e+03      0.037      2.5     -1.8    -
    4      3.7e+03      0.037      1.2     -2.7    -
    5      3.7e+03      0.037     0.62    -0.88    -
    6      3.7e+03      0.035     0.62     0.68    +
    7      3.6e+03      0.006      6.2        1   ++
    8      3.6e+03      0.006      3.1     -1.7    -
    9      3.6e+03      0.006      1.6   -0.018    -
   10      3.6e+03     0.0081      1.6     0.63    +
   11      3.6e+03     0.0091      1.6     0.27    +
   12      3.6e+03     0.0091     0.78    -0.94    -
   13      3.6e+03     0.0091     0.39    -0.22    -
   14      3.6e+03      0.015     0.39     0.35    +
   15      3.6e+03     0.0032     0.39     0.88    +
   16      3.5e+03     0.0043      3.9      1.4   ++
   17      3.5e+03     0.0043        2     -4.1    -
   18      3.5e+03     0.0043     0.98       -2    -
   19      3.5e+03     0.0052     0.98     0.31    +
   20      3.5e+03    0.00056      9.8        1   ++
   21      3.5e+03    0.00077      9.8     0.75    +
   22      3.5e+03    9.4e-05       98      1.1   ++
   23      3.5e+03    5.8e-06       98        1   ++
Results saved in file b16panel_discrete_socio_eco.html
Results saved in file b16panel_discrete_socio_eco.pickle
print(results.short_summary())
Results for model b16panel_discrete_socio_eco
Nbr of parameters:              16
Sample size:                    752
Excluded data:                  0
Final log likelihood:           -3545.441
Akaike Information Criterion:   7122.881
Bayesian Information Criterion: 7196.845
pandas_results = results.getEstimatedParameters()
pandas_results
Value Rob. Std err Rob. t-test Rob. p-value
ASC_CAR_S_class0 8.101085 1.335196 6.067338 1.300476e-09
ASC_CAR_S_class1 2.208464 0.436902 5.054822 4.307931e-07
ASC_CAR_class0 -4.080267 0.761219 -5.360173 8.314207e-08
ASC_CAR_class1 0.991371 0.353991 2.800552 5.101529e-03
ASC_SM_S_class0 1.493494 0.613791 2.433230 1.496477e-02
ASC_SM_S_class1 1.960720 0.520392 3.767774 1.647097e-04
ASC_TRAIN_S_class0 2.649277 0.754258 3.512428 4.440320e-04
ASC_TRAIN_S_class1 2.032921 0.555649 3.658644 2.535530e-04
ASC_TRAIN_class0 -1.139183 0.629709 -1.809062 7.044130e-02
ASC_TRAIN_class1 -0.565211 0.435419 -1.298086 1.942579e-01
B_COST_class0 -1.785143 0.335799 -5.316112 1.060079e-07
B_COST_class1 -4.262454 0.323113 -13.191856 0.000000e+00
B_TIME_S_class1 2.153461 0.160134 13.447906 0.000000e+00
B_TIME_class1 -6.513136 0.449996 -14.473773 0.000000e+00
CLASS_CTE -0.790250 0.426295 -1.853764 6.377290e-02
CLASS_INC -0.237020 0.191202 -1.239633 2.151112e-01


Total running time of the script: (1 minutes 14.091 seconds)

Gallery generated by Sphinx-Gallery