16. Discrete mixture with panel data

Example of a discrete mixture of logit models, also called a latent class model. The class membership model includes socio-economic variables. The data file is organized as panel data.

Michel Bierlaire, EPFL. Mon Jun 23 2025, 16:29:45

import biogeme.biogeme_logging as blog
from IPython.core.display_functions import display
from biogeme.biogeme import BIOGEME
from biogeme.expressions import (
    Beta,
    Draws,
    ExpressionOrNumeric,
    MonteCarlo,
    PanelLikelihoodTrajectory,
    log,
)
from biogeme.models import logit
from biogeme.results_processing import (
    EstimationResults,
    get_pandas_estimated_parameters,
)

See the data processing script: Panel data preparation for Swissmetro.

from swissmetro_panel import (
    CAR_AV_SP,
    CAR_CO_SCALED,
    CAR_TT_SCALED,
    CHOICE,
    INCOME,
    SM_AV,
    SM_COST_SCALED,
    SM_TT_SCALED,
    TRAIN_AV_SP,
    TRAIN_COST_SCALED,
    TRAIN_TT_SCALED,
    database,
)

logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b16_panel_discrete_socio_eco.py')
Example b16_panel_discrete_socio_eco.py

Parameters to be estimated. One version for each latent class.

NUMBER_OF_CLASSES = 2
b_cost = [Beta(f'b_cost_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]

Define a random parameter, normally distributed across individuals, designed to be used for Monte-Carlo simulation.

b_time = [Beta(f'b_time_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]

It is advised not to use 0 as a starting value for the following parameter.

b_time_s = [
    Beta(f'b_time_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
b_time_rnd: list[ExpressionOrNumeric] = [
    b_time[i] + b_time_s[i] * Draws(f'b_time_rnd_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]
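To illustrate what the `'NORMAL_ANTI'` draws above produce, here is a minimal plain-NumPy sketch (not Biogeme's `Draws` API; the coefficient values are made up): antithetic draws come in pairs (z, -z), which reduces Monte-Carlo variance and centers the sample exactly on the mean.

```python
import numpy as np

# Illustration only: antithetic standard normal draws, paired as (z, -z).
rng = np.random.default_rng(1223)
half = rng.standard_normal(500)
draws = np.concatenate([half, -half])  # 1000 antithetic normal draws

# A random time coefficient, distributed across individuals as
# b_time + b_time_s * draw (hypothetical values for b_time and b_time_s).
b_time, b_time_s = -1.0, 0.5
b_time_rnd = b_time + b_time_s * draws

# The antithetic pairing makes the sample mean of the draws numerically zero,
# so the simulated coefficient is centered on b_time.
```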

We do the same for the constants, to address serial correlation.

asc_car = [
    Beta(f'asc_car_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_car_s = [
    Beta(f'asc_car_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_car_rnd = [
    asc_car[i] + asc_car_s[i] * Draws(f'asc_car_rnd_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

asc_train = [
    Beta(f'asc_train_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_train_s = [
    Beta(f'asc_train_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_train_rnd = [
    asc_train[i] + asc_train_s[i] * Draws(f'asc_train_rnd_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

asc_sm = [Beta(f'asc_sm_class{i}', 0, None, None, 1) for i in range(NUMBER_OF_CLASSES)]
asc_sm_s = [
    Beta(f'asc_sm_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_sm_rnd = [
    asc_sm[i] + asc_sm_s[i] * Draws(f'asc_sm_rnd_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

Parameters for the class membership model.

class_cte = Beta('class_cte', 0, None, None, 0)
class_inc = Beta('class_inc', 0, None, None, 0)

In class 0, it is assumed that the time coefficient is zero.

b_time_rnd[0] = 0

Utility functions.

v_train_per_class = [
    asc_train_rnd[i] + b_time_rnd[i] * TRAIN_TT_SCALED + b_cost[i] * TRAIN_COST_SCALED
    for i in range(NUMBER_OF_CLASSES)
]
v_swissmetro_per_class = [
    asc_sm_rnd[i] + b_time_rnd[i] * SM_TT_SCALED + b_cost[i] * SM_COST_SCALED
    for i in range(NUMBER_OF_CLASSES)
]
v_car_per_class = [
    asc_car_rnd[i] + b_time_rnd[i] * CAR_TT_SCALED + b_cost[i] * CAR_CO_SCALED
    for i in range(NUMBER_OF_CLASSES)
]
v = [
    {1: v_train_per_class[i], 2: v_swissmetro_per_class[i], 3: v_car_per_class[i]}
    for i in range(NUMBER_OF_CLASSES)
]

Associate the availability conditions with the alternatives.

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

The choice model is a discrete mixture of logit models with availability conditions. We calculate the conditional probability of the observed trajectory for each class.

choice_probability_per_class = [
    PanelLikelihoodTrajectory(logit(v[i], av, CHOICE)) for i in range(NUMBER_OF_CLASSES)
]
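`PanelLikelihoodTrajectory` handles the panel structure: conditional on a class (and on the draws), the probability of one individual's whole sequence of choices is the product of the per-observation logit probabilities. A minimal plain-NumPy sketch, with made-up utilities for one individual observed 9 times:

```python
import numpy as np

# Hypothetical utilities for one individual: 9 observations, 3 alternatives,
# all alternatives available (illustration only, not the Biogeme API).
rng = np.random.default_rng(1223)
n_obs, n_alternatives = 9, 3
v = rng.standard_normal((n_obs, n_alternatives))
chosen = rng.integers(0, n_alternatives, n_obs)  # index of the chosen alternative

# Logit probability of each alternative at each observation.
expv = np.exp(v)
probs = expv / expv.sum(axis=1, keepdims=True)

# Trajectory probability: product over the individual's observed choices.
trajectory_probability = probs[np.arange(n_obs), chosen].prod()
```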

Class membership model.

score_class_0 = class_cte + class_inc * INCOME
prob_class0 = logit({0: score_class_0, 1: 0}, None, 0)
prob_class1 = logit({0: score_class_0, 1: 0}, None, 1)
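The class membership model above is a binary logit on the score: the probability of class 0 is exp(score) / (exp(score) + 1), and the probability of class 1 is the complement. A minimal numeric sketch with made-up parameter and income values:

```python
import numpy as np

# Hypothetical values for illustration only.
class_cte, class_inc = -1.3, -0.2
income = 2.0  # income category of one individual

score_class_0 = class_cte + class_inc * income
prob_class_0 = np.exp(score_class_0) / (np.exp(score_class_0) + 1.0)
prob_class_1 = 1.0 / (np.exp(score_class_0) + 1.0)
# The two class membership probabilities sum to one.
```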

Conditional on the random variables, this is the likelihood for the individual.

conditional_choice_probability = (
    prob_class0 * choice_probability_per_class[0]
    + prob_class1 * choice_probability_per_class[1]
)

We integrate over the random variables using Monte-Carlo simulation.

log_probability = log(MonteCarlo(conditional_choice_probability))
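The key point in the expression above is that the logarithm is taken after averaging over the draws, not before. A minimal plain-NumPy sketch of the simulated log likelihood for one individual, with made-up membership probabilities and per-draw conditional trajectory probabilities:

```python
import numpy as np

# Hypothetical per-draw class-conditional trajectory probabilities
# (illustration only; in the actual model these come from the logit kernel).
rng = np.random.default_rng(1223)
n_draws = 5_000
prob_class_0, prob_class_1 = 0.3, 0.7  # class membership probabilities

p_given_class_0 = rng.uniform(0.1, 0.9, n_draws)
p_given_class_1 = rng.uniform(0.1, 0.9, n_draws)

# Mixture over classes, one value per draw.
conditional = prob_class_0 * p_given_class_0 + prob_class_1 * p_given_class_1

# Monte-Carlo integration: average over the draws FIRST, then take the log.
log_probability = np.log(conditional.mean())
```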

The model is complex, and there are numerical issues when calculating the second derivatives. Therefore, we instruct Biogeme not to evaluate them. As a consequence, the statistics reported after estimation are based on the BHHH matrix instead of the Cramér-Rao bound.

the_biogeme = BIOGEME(
    database,
    log_probability,
    number_of_draws=5_000,
    seed=1223,
    calculating_second_derivatives='never',
)
the_biogeme.model_name = 'b16_panel_discrete_socio_eco'
Biogeme parameters read from biogeme.toml.
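The BHHH approximation mentioned above avoids second derivatives entirely: the information matrix is estimated by the sum of outer products of the per-individual score vectors, and its inverse approximates the parameter covariance. A minimal plain-NumPy sketch with made-up scores (illustration only, not Biogeme internals):

```python
import numpy as np

# Hypothetical per-individual scores d log L_n / d beta
# (752 individuals, 16 parameters, as in this model; values are random).
rng = np.random.default_rng(1223)
n_individuals, n_parameters = 752, 16
scores = rng.standard_normal((n_individuals, n_parameters))

# BHHH: sum over individuals of the outer product g_n g_n'.
bhhh = scores.T @ scores

# BHHH estimate of the parameter covariance and the standard errors.
covariance = np.linalg.inv(bhhh)
std_err = np.sqrt(np.diag(covariance))
```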

Estimate the parameters.

try:
    results = EstimationResults.from_yaml_file(
        filename=f'saved_results/{the_biogeme.model_name}.yaml'
    )
except FileNotFoundError:
    results = the_biogeme.estimate()
Flattening database [(6768, 38)].
Database flattened [(752, 362)]
*** Initial values of the parameters are obtained from the file __b16_panel_discrete_socio_eco.iter
Cannot read file __b16_panel_discrete_socio_eco.iter. Statement is ignored.
Starting values for the algorithm: {}
As the model is rather complex, we cancel the calculation of second derivatives. If you want to control the parameters, change the algorithm from "automatic" to "simple_bounds" in the TOML file.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: BFGS with trust region for simple bounds
Iter.     Function    Relgrad   Radius      Rho
    0        4e+03      0.035        1     0.45    +
    1      3.9e+03      0.034        1      0.3    +
    2      3.7e+03      0.036        1     0.39    +
    3      3.6e+03      0.019        1      0.4    +
    4      3.6e+03       0.03        1     0.48    +
    5      3.6e+03      0.011        1     0.37    +
    6      3.6e+03      0.025        1     0.11    +
    7      3.5e+03     0.0068        1     0.59    +
    8      3.5e+03     0.0068      0.5     -1.8    -
    9      3.5e+03     0.0068     0.25    -0.71    -
   10      3.5e+03     0.0068     0.12    -0.22    -
   11      3.5e+03     0.0038     0.12     0.43    +
   12      3.5e+03     0.0068     0.12     0.82    +
   13      3.5e+03     0.0081     0.12     0.68    +
   14      3.5e+03     0.0029      1.2     0.95   ++
   15      3.5e+03     0.0029     0.62     -1.6    -
   16      3.5e+03     0.0029     0.31    -0.15    -
   17      3.5e+03     0.0033     0.31     0.51    +
   18      3.5e+03     0.0058     0.31     0.28    +
   19      3.5e+03     0.0017     0.31     0.25    +
   20      3.5e+03     0.0017     0.16    -0.84    -
   21      3.5e+03     0.0034     0.16      0.2    +
   22      3.5e+03     0.0015     0.16      0.8    +
   23      3.5e+03     0.0011     0.16     0.38    +
   24      3.5e+03     0.0013      1.6        1   ++
   25      3.5e+03     0.0013     0.78     -2.6    -
   26      3.5e+03     0.0013     0.39     -2.1    -
   27      3.5e+03     0.0013      0.2     -0.7    -
   28      3.5e+03     0.0013    0.098    -0.28    -
   29      3.5e+03     0.0018    0.098     0.23    +
   30      3.5e+03      0.001    0.098     0.84    +
   31      3.5e+03     0.0011    0.098     0.33    +
   32      3.5e+03    0.00047     0.98     0.97   ++
   33      3.5e+03     0.0033     0.98     0.72    +
   34      3.5e+03     0.0033     0.49    -0.21    -
   35      3.5e+03     0.0018     0.49     0.15    +
   36      3.5e+03     0.0018     0.24    -0.82    -
   37      3.5e+03     0.0016     0.24     0.67    +
   38      3.5e+03     0.0016     0.12     -1.3    -
   39      3.5e+03     0.0012     0.12     0.42    +
   40      3.5e+03     0.0012    0.061    -0.79    -
   41      3.5e+03     0.0022    0.061     0.31    +
   42      3.5e+03     0.0014    0.061     0.31    +
   43      3.5e+03    0.00047    0.061     0.62    +
   44      3.5e+03     0.0004    0.061     0.81    +
   45      3.5e+03     0.0004    0.031    -0.12    -
   46      3.5e+03     0.0004    0.015     -0.3    -
   47      3.5e+03    0.00053    0.015     0.19    +
   48      3.5e+03    0.00014     0.15     0.93   ++
   49      3.5e+03    0.00024     0.15     0.75    +
   50      3.5e+03    0.00024    0.076    -0.11    -
   51      3.5e+03    0.00017    0.076      0.3    +
   52      3.5e+03    0.00034    0.076     0.21    +
   53      3.5e+03    0.00014    0.076     0.26    +
   54      3.5e+03    0.00034    0.076     0.25    +
   55      3.5e+03    0.00034    0.076     0.22    +
   56      3.5e+03    0.00034    0.038    -0.42    -
   57      3.5e+03    0.00033    0.038     0.44    +
   58      3.5e+03    0.00019    0.038     0.63    +
   59      3.5e+03    4.7e-05     0.38     0.92   ++
   60      3.5e+03    5.9e-05      3.8      1.4   ++
   61      3.5e+03    9.6e-05      3.8     0.21    +
   62      3.5e+03    7.6e-05      3.8      0.8    +
   63      3.5e+03      4e-05      3.8     0.75    +
   64      3.5e+03    4.7e-05       38      1.7   ++
   65      3.5e+03    4.7e-05      0.3     -0.1    -
   66      3.5e+03    7.3e-05        3      1.3   ++
   67      3.5e+03    7.3e-05     0.33     -7.2    -
   68      3.5e+03    7.3e-05     0.16    -0.74    -
   69      3.5e+03    0.00016     0.16     0.67    +
   70      3.5e+03    7.4e-05     0.16     0.58    +
   71      3.5e+03    7.2e-05     0.16     0.38    +
   72      3.5e+03    5.2e-05     0.16     0.79    +
   73      3.5e+03    5.2e-05    0.033    -0.41    -
   74      3.5e+03    5.2e-05    0.016    -0.31    -
   75      3.5e+03    0.00011    0.016      0.6    +
   76      3.5e+03    3.6e-05    0.016     0.77    +
   77      3.5e+03    4.7e-06    0.016     0.64    +
Optimization algorithm has converged.
Relative gradient: 4.749699240944466e-06
Cause of termination: Relative gradient = 4.7e-06 <= 6.1e-06
Number of function evaluations: 189
Number of gradient evaluations: 111
Number of hessian evaluations: 0
Algorithm: BFGS with trust region for simple bound constraints
Number of iterations: 78
Proportion of Hessian calculation: 0/55 = 0.0%
Optimization time: 0:02:38.021463
Calculate BHHH
File b16_panel_discrete_socio_eco.html has been generated.
File b16_panel_discrete_socio_eco.yaml has been generated.
print(results.short_summary())
Results for model b16_panel_discrete_socio_eco
Nbr of parameters:              16
Sample size:                    752
Observations:                   6768
Excluded data:                  0
Final log likelihood:           -3524.399
Akaike Information Criterion:   7080.798
Bayesian Information Criterion: 7154.762
pandas_results = get_pandas_estimated_parameters(estimation_results=results)
display(pandas_results)
                  Name     Value  BHHH std err.  BHHH t-stat.  BHHH p-value
0            class_cte -1.276124       0.483823     -2.637584  8.349885e-03
1            class_inc -0.201190       0.179175     -1.122870  2.614927e-01
2     asc_train_class0 -0.956264       0.677309     -1.411859  1.579914e-01
3   asc_train_s_class0  3.039756       0.646170      4.704268  2.547778e-06
4        b_cost_class0 -1.174641       0.549715     -2.136820  3.261266e-02
5      asc_sm_s_class0  0.078896       6.116803      0.012898  9.897090e-01
6       asc_car_class0 -4.871382       1.645420     -2.960571  3.070692e-03
7     asc_car_s_class0  6.157855       1.884002      3.268497  1.081203e-03
8     asc_train_class1 -0.351970       0.287517     -1.224170  2.208879e-01
9   asc_train_s_class1  1.754739       0.458400      3.827965  1.292071e-04
10       b_time_class1 -6.952903       0.365772    -19.008831  0.000000e+00
11     b_time_s_class1  3.277018       0.369304      8.873506  0.000000e+00
12       b_cost_class1 -4.747042       0.252347    -18.811558  0.000000e+00
13     asc_sm_s_class1  1.707763       0.332193      5.140873  2.734650e-07
14      asc_car_class1  1.013071       0.216166      4.686542  2.778594e-06
15    asc_car_s_class1  2.874308       0.261210     11.003840  0.000000e+00

Total running time of the script: (2 minutes 51.277 seconds)

Gallery generated by Sphinx-Gallery