Discrete mixture with panel data

Example of a discrete mixture of logit models, also called latent class model. The class membership model includes socio-economic variables. The datafile is organized as panel data.

author:: Michel Bierlaire, EPFL
date:: Mon Apr 10 12:07:15 2023

import biogeme.biogeme_logging as blog
import biogeme.biogeme as bio
from biogeme import models

from biogeme.expressions import (
    Beta,
    Variable,
    bioDraws,
    MonteCarlo,
    log,
    exp,
    bioMultSum,
    ExpressionOrNumeric,
)
from biogeme.parameters import Parameters

See the data processing script: Panel data preparation for Swissmetro.

from swissmetro_panel import (
    flat_database,
    SM_AV,
    CAR_AV_SP,
    TRAIN_AV_SP,
    INCOME,
)

logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b16panel_discrete_socio_eco.py')

Example b16panel_discrete_socio_eco.py

Parameters to be estimated. One version for each latent class.

NUMBER_OF_CLASSES = 2
B_COST = [Beta(f'B_COST_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]

Define a random parameter, normally distributed across individuals, designed to be used for Monte-Carlo simulation.

B_TIME = [Beta(f'B_TIME_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]

It is advised not to use 0 as starting value for the following parameter.

B_TIME_S = [
    Beta(f'B_TIME_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
B_TIME_RND: list[ExpressionOrNumeric] = [
    B_TIME[i] + B_TIME_S[i] * bioDraws(f'B_TIME_RND_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

We do the same for the constants, to address serial correlation.

ASC_CAR = [
    Beta(f'ASC_CAR_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_CAR_S = [
    Beta(f'ASC_CAR_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_CAR_RND = [
    ASC_CAR[i] + ASC_CAR_S[i] * bioDraws(f'ASC_CAR_RND_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

ASC_TRAIN = [
    Beta(f'ASC_TRAIN_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_TRAIN_S = [
    Beta(f'ASC_TRAIN_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_TRAIN_RND = [
    ASC_TRAIN[i] + ASC_TRAIN_S[i] * bioDraws(f'ASC_TRAIN_RND_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

ASC_SM = [Beta(f'ASC_SM_class{i}', 0, None, None, 1) for i in range(NUMBER_OF_CLASSES)]
ASC_SM_S = [
    Beta(f'ASC_SM_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_SM_RND = [
    ASC_SM[i] + ASC_SM_S[i] * bioDraws(f'ASC_SM_RND_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

Parameters for the class membership model.

CLASS_CTE = Beta('CLASS_CTE', 0, None, None, 0)
CLASS_INC = Beta('CLASS_INC', 0, None, None, 0)

In class 0, it is assumed that the time coefficient is zero

B_TIME_RND[0] = 0

Utility functions

V1 = [
    [
        ASC_TRAIN_RND[i]
        + B_TIME_RND[i] * Variable(f'{t}_TRAIN_TT_SCALED')
        + B_COST[i] * Variable(f'{t}_TRAIN_COST_SCALED')
        for t in range(1, 10)
    ]
    for i in range(NUMBER_OF_CLASSES)
]
V2 = [
    [
        ASC_SM_RND[i]
        + B_TIME_RND[i] * Variable(f'{t}_SM_TT_SCALED')
        + B_COST[i] * Variable(f'{t}_SM_COST_SCALED')
        for t in range(1, 10)
    ]
    for i in range(NUMBER_OF_CLASSES)
]
V3 = [
    [
        ASC_CAR_RND[i]
        + B_TIME_RND[i] * Variable(f'{t}_CAR_TT_SCALED')
        + B_COST[i] * Variable(f'{t}_CAR_CO_SCALED')
        for t in range(1, 10)
    ]
    for i in range(NUMBER_OF_CLASSES)
]
V = [
    [{1: V1[i][t], 2: V2[i][t], 3: V3[i][t]} for t in range(9)]
    for i in range(NUMBER_OF_CLASSES)
]

Associate the availability conditions with the alternatives

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

The choice model is a discrete mixture of logit, with availability conditions We calculate the conditional probability for each class.

prob = [
    exp(
        bioMultSum(
            [models.loglogit(V[i][t], av, Variable(f'{t+1}_CHOICE')) for t in range(9)]
        )
    )
    for i in range(NUMBER_OF_CLASSES)
]

Class membership model.

W = CLASS_CTE + CLASS_INC * INCOME
PROB_class0 = models.logit({0: W, 1: 0}, None, 0)
PROB_class1 = models.logit({0: W, 1: 0}, None, 1)

Conditional on the random variables, likelihood for the individual.

probIndiv = PROB_class0 * prob[0] + PROB_class1 * prob[1]

We integrate over the random variables using Monte-Carlo

logprob = log(MonteCarlo(probIndiv))

As the objective is to illustrate the syntax, we calculate the Monte-Carlo approximation with a small number of draws.

the_biogeme = bio.BIOGEME(flat_database, logprob, number_of_draws=100, seed=1223)
the_biogeme.modelName = 'b16panel_discrete_socio_eco'

Biogeme parameters read from biogeme.toml.

Estimate the parameters.

results = the_biogeme.estimate()

As the model is rather complex, we cancel the calculation of second derivatives. If you want to control the parameters, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __b16panel_discrete_socio_eco.iter
Cannot read file __b16panel_discrete_socio_eco.iter. Statement is ignored.
The number of draws (100) is low. The results may not be meaningful.
As the model is rather complex, we cancel the calculation of second derivatives. If you want to control the parameters, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: BFGS with trust region for simple bounds
Iter.     Function    Relgrad   Radius      Rho
      4e+03      0.034        1     0.44    +
    3.8e+03      0.024        1     0.63    +
    3.8e+03      0.024      0.5    0.052    -
    3.7e+03      0.029      0.5     0.52    +
    3.7e+03      0.028      0.5     0.39    +
    3.7e+03      0.032        5     0.97   ++
    3.7e+03      0.032      2.5       -2    -
    3.6e+03      0.018      2.5     0.51    +
    3.6e+03      0.018      1.2     -2.6    -
    3.6e+03      0.018     0.62     -2.7    -
    3.6e+03      0.018     0.31       -1    -
    3.6e+03      0.018     0.16    0.057    -
    3.6e+03     0.0069     0.16     0.59    +
    3.6e+03      0.011     0.16     0.19    +
    3.6e+03     0.0036     0.16     0.68    +
    3.6e+03     0.0073     0.16     0.58    +
    3.6e+03     0.0029     0.16     0.81    +
    3.6e+03     0.0055     0.16     0.65    +
    3.6e+03     0.0039     0.16     0.36    +
    3.6e+03     0.0045      1.6        1   ++
    3.6e+03     0.0045     0.78     -2.7    -
    3.6e+03     0.0045     0.39    -0.61    -
    3.6e+03     0.0099     0.39     0.15    +
    3.6e+03     0.0099      0.2   -0.024    -
    3.6e+03     0.0045      0.2     0.46    +
    3.6e+03     0.0038      0.2     0.24    +
    3.5e+03     0.0013      0.2     0.69    +
    3.5e+03     0.0025      0.2     0.39    +
    3.5e+03     0.0015        2     0.96   ++
    3.5e+03     0.0015     0.98      -25    -
    3.5e+03     0.0015     0.49      -14    -
    3.5e+03     0.0015     0.24       -5    -
    3.5e+03     0.0015     0.12     -2.3    -
    3.5e+03     0.0015    0.061     -1.6    -
    3.5e+03     0.0015    0.031    -0.38    -
    3.5e+03     0.0028    0.031     0.43    +
    3.5e+03     0.0028    0.015     0.02    -
    3.5e+03    0.00093    0.015     0.54    +
    3.5e+03    0.00092    0.015     0.88    +
    3.5e+03    0.00087    0.015     0.89    +
    3.5e+03    0.00091    0.015     0.89    +
    3.5e+03    0.00087     0.15     0.97   ++
    3.5e+03    0.00092      1.5     0.99   ++
    3.5e+03    0.00092     0.76    -0.67    -
    3.5e+03     0.0022      7.6      1.1   ++
    3.5e+03     0.0022      3.1      -26    -
    3.5e+03     0.0022      1.6      -12    -
    3.5e+03     0.0022     0.78     -9.8    -
    3.5e+03     0.0022     0.39     -2.9    -
    3.5e+03     0.0022      0.2    -0.81    -
    3.5e+03     0.0022    0.098    -0.17    -
    3.5e+03      0.002    0.098     0.63    +
    3.5e+03      0.003    0.098      0.4    +
    3.5e+03     0.0013    0.098     0.88    +
    3.5e+03     0.0024    0.098     0.73    +
    3.5e+03    0.00096    0.098      0.9    +
    3.5e+03     0.0013     0.98     0.93   ++
    3.5e+03     0.0014     0.98     0.49    +
    3.5e+03     0.0014     0.49       -4    -
    3.5e+03     0.0014     0.24       -2    -
    3.5e+03     0.0014     0.12     -1.3    -
    3.5e+03     0.0015     0.12     0.14    +
    3.5e+03     0.0013     0.12     0.43    +
    3.5e+03     0.0013    0.061    -0.83    -
    3.5e+03     0.0011    0.061     0.65    +
    3.5e+03     0.0011    0.031    -0.19    -
    3.5e+03    0.00053    0.031     0.44    +
    3.5e+03    0.00054    0.031     0.27    +
    3.5e+03     0.0004    0.031     0.84    +
    3.5e+03    0.00037    0.031     0.46    +
    3.5e+03    0.00044    0.031     0.26    +
    3.5e+03    0.00033    0.031     0.69    +
    3.5e+03    0.00033    0.015   -0.036    -
    3.5e+03    0.00034    0.015     0.51    +
    3.5e+03    0.00023    0.015     0.89    +
    3.5e+03    0.00025     0.15     0.92   ++
    3.5e+03    0.00027     0.15     0.77    +
    3.5e+03    0.00027    0.077     -2.9    -
    3.5e+03    0.00027    0.038     -1.8    -
    3.5e+03    0.00027    0.019    -0.59    -
    3.5e+03    0.00027   0.0096    0.083    -
    3.5e+03    0.00022   0.0096      0.8    +
    3.5e+03    0.00022   0.0048    -0.25    -
    3.5e+03    0.00021   0.0048     0.49    +
    3.5e+03    0.00013   0.0048     0.81    +
    3.5e+03    0.00013    0.048     0.96   ++
    3.5e+03    0.00013     0.48     0.96   ++
    3.5e+03    0.00013     0.16     -3.2    -
    3.5e+03    0.00013    0.079     -2.2    -
    3.5e+03    0.00013     0.04    -0.84    -
    3.5e+03     0.0001     0.04     0.37    -
Results saved in file b16panel_discrete_socio_eco.html
Results saved in file b16panel_discrete_socio_eco.pickle

print(results.short_summary())

Results for model b16panel_discrete_socio_eco
Nbr of parameters:              16
Sample size:                    752
Excluded data:                  0
Final log likelihood:           -3545.388
Akaike Information Criterion:   7122.776
Bayesian Information Criterion: 7196.74

pandas_results = results.get_estimated_parameters()
pandas_results

	Value	Rob. Std err	Rob. t-test	Rob. p-value
ASC_CAR_S_class0	7.897693	1.267046	6.233152	4.571428e-10
ASC_CAR_S_class1	2.770215	0.383546	7.222647	5.098144e-13
ASC_CAR_class0	-3.932036	0.778884	-5.048296	4.457687e-07
ASC_CAR_class1	0.928194	0.303702	3.056265	2.241133e-03
ASC_SM_S_class0	1.721648	0.641441	2.684031	7.274033e-03
ASC_SM_S_class1	0.787746	0.346705	2.272093	2.308088e-02
ASC_TRAIN_S_class0	2.551066	0.831508	3.068001	2.154959e-03
ASC_TRAIN_S_class1	2.114956	0.275435	7.678595	1.598721e-14
ASC_TRAIN_class0	-1.062234	0.784735	-1.353622	1.758571e-01
ASC_TRAIN_class1	-0.619074	0.328846	-1.882567	5.975908e-02
B_COST_class0	-1.923801	0.515813	-3.729646	1.917486e-04
B_COST_class1	-4.255385	0.319924	-13.301230	0.000000e+00
B_TIME_S_class1	2.165545	0.163520	13.243302	0.000000e+00
B_TIME_class1	-6.682184	0.478450	-13.966314	0.000000e+00
CLASS_CTE	-0.639407	0.382361	-1.672259	9.447322e-02
CLASS_INC	-0.260331	0.172839	-1.506204	1.320149e-01

Total running time of the script: (12 minutes 21.457 seconds)

Gallery generated by Sphinx-Gallery